Closed ghannum closed 6 years ago
I see RG:Z:foo
in a toy example I just ran with ngmlr 0.2.7 (installed from bioconda). I confirm that the help text contains the described bug.
I get help text in the header, but not for each read individually. Did you get both?
Hi Ghannum,
could you please send me the command you used to run ngmlr?
Thanks, Philipp
ngmlr 0.2.7 (build: Jun 25 2018 10:17:59, start: 2018-07-17.15:49:31)
ngmlr -r ref.fasta -q reads.fastq.gz --rg-id SAMPLEID --rg-sm SAMPLEID -t 32 -x pacbio | samtools view -bS - | samtools sort -@ 32 - -o SAMPLEID.pacbio.bam
I get the same issue without all of the samtools piping...
Thanks! Using this parameters, ngmlr 0.2.7 gives me the following:
@HD VN:1.0 SO:unsorted
@SQ SN:NC_001416.1 LN:48502
@PG ID:ngmlr PN:nextgenmap-lr VN:0.2.7 CL:ngmlr -r ref.fa -q read.fastq --rg-id SAMPLEID --rg-sm SAMPLEID -t 1 -x pacbio
@RG ID:SAMPLEID SM:SAMPLEID
5a68e600-dcdd-4ef1-8a1f-0fa11dc7fffb 0 NC_001416.1 26539 60 32S20M1I1M1I40M3I5M1D47M1I59M2D8M1D1M1D13M2D12M2D34M1I28M1D10M2D12M1I31M2I46M4I63M1D6M1I16M1I22M1D9M2I21M1I9M1D5M2D23M2D24M1D2M1D21M3D54M1I44M2D10M1D37M1D1M1D10M1D11M1D75M2I3M1D39M3D15M1D31M2D21M1D21M1I14M1I4M1D23M1D6M1D42M20S * 0 0 TTGTTATTGCTTCGTTCAGTTACAGTATTGCTCTGTCCATGAATTTCTGAAAGGAAGTTACCCCTCTAAGTAATGAGGTGGTAAGGACGCTTTCATTTTTTTCGTGTCGGCTAATCGATTTGGCCATACTACTAAATCCTGAATAGCTTTAAAGAAGGTTATGTTTAAAACCATCGCTTAATTTGCTGAGATTAACATAGTAGTCAATGCTCACCTAAGAAAAAACATTTCAGAGTTGACTGAATTTTATCTATTAATGAATAAGTGCTTACTTCTTCTTTTTTGACCTACAAAACCAATTTTAACATTCCGATATCATTTTTCACCGTAACTCATCGAAGACAGTAAGATAAAACATTATGGCACAAAGGAATAGTCATTTCAACCATCTGCTCGTAGGAATGCCTTATTTTTTTTTTTCTACTGCAGGAATATACCCGCCTCTTTCAATAACACTAAACTCCAACATATAGTAACCTTAATTTTTATTAAAATAACCACCAATTTATTTGGCGGCAACACGGATCTCTCTTTTTTAAGTTACTCTCTATTACGCCACGTTTTCATCTAAAGTGGTAGTATTGAACTTAACGGAATCGTATTGTAGTTTTCCATATTGCTTTCTGCTTCCTTTTGGATGTACTGTTATTCATGTTGCATGGTGCACTGTTTATACCAACGATATAGTCTATTAAATGCATATATGGTATCGCCGAACGATTAGCTCTTCAGGCTTCTAGAAGCGTTTAAGTACTAATAAGCCGATAGATAGCCACGGACTTCATGTATTTTTCATAGTCTTAACTTCGCTCCTCGCTCATAACAGACATTCACTACAGTTATGGCGGAAAGGTATGCATGCTGGGTGTGGGGAAGTCGTGAGAAAGAAAGAAGTCAACTGCGTCGTTTGACATCACCGCTATCTTACTGGTTATGCAGGTGTAGTGGGTAGCACACAAAGCTTTGCACTGGTGCGAGGCTTTGTGCTTCTCTGAGTGCGACAGGTTTGATGACAAAAAAATTAGCGCAGAGGAACAAAAATCACCTTGCGCTAATGTAAATTTGGGTCACTAATACCATCTAAGTAGTTGATTCATAGTGACTGGATGGCAATACGTAACTACC &$+*+(5'%(@+=669>375A,0*$+&*+&).=??>07//(1+7G//@393(-,*,+/H22FG8-934>4-;-,-527-./0.71>30-+AA/.=B;893-**$--774<<8;DDA>@0E5-45?9EBFB<AAAG>=:630),&,&*/874@++2633.3.449G?;FJDCBAAE1.</9-A03+.0C22272491A@:7797:D;G/,,,'(40)'/*2>EEF712?J8:0*(/8>6?A9751:0-83??C9:0;104HIFEA<861=112;GAAFAE<IHJ,/0<>98-97D><CC8AFF:ACD:C2/+()/,/299@HH;1;>@*7(%*-@=A7&.*&,)&*,7B>A?AGG5/-7.%++.+&)>F7273>=<CEC7?D-90C/;2/062-00AB?FE?E?@F4F00GBCBHEDF@A-80',)*4464:64A<79C739A<0DF0,@/.AF>BCDFGD653641037;?A40;0/(''60;FJF3;4H7JHD/8558.636.51.'8B):*,:3/9.1-0.4166656*;6;>@3510/<00248EDDBE25''&(,26@F753:@==>4*2'),007%1&&+;40D2F22--()+::13../019IJ83,1B5=?1.2D:251-6A9=8IH<@G96%*'+*)11&0255E9AE:?DD@5724(.00@G635=:C<F45549I7?B<>;?<07)-,**/15CA*=<;?EBCEBBF?860A)./ADE96&4=6;<5+''2+))0<A031)*.328739B3@/-/-,'/05*-4)..7>>=A.%,*184IJJG3;;150%++)(+013/48=9.6/71<-05=6&+,?2>>>=@A<D(3171/C11@B9C05<=??B?AA?BBAF;4+6'-2*1,-456<)*&2<3)-85',0-/-4(.//2467:@F@7;6974/3,-.9621.7)22>7G689&A+<58*-.%6.3..'58(9;==A3.22:;,0.74*'9,.3-4714;30>56>@>>=0*)+,8;DDF)?/=>.*1-,,-4=?FEA820/.019://1-5-/>GII5/34ED<E78547210.25<,51(.5+++5766C;:=DB@@>?C9<:<>B;9>?:23/.-.'(%'+(/-602A:9;><>+0)('% RG:Z:SAMPLEID AS:i:1568 NM:i:95 XI:f:0.9148 XS:i:0 XE:i:1568 XR:i:1073 MD:Z:46T19^A0A105^TT8^G1^A13^GG12^TT62^T10^GC10A1G6A21G1A17C90^C22G21^A30A0T7^C5^AA3T1A17^GG2C21^T2^A21^AAC0C0C63A32^GA10^C35G1^A1^C0C9^A3G7^C78^A10G19T8^CTT15^C9G21^AT21^G37A1^G23^C1C0T0G2^A0C0A40 SV:i:2 QS:i:32 QE:i:1105 CV:f:95.377777
The header contains the RG-ID and the additional information (which in this case is only RG-SM) and the reads have the RG tag with the RG-ID referencing the information in the header. This is what the SAM specification suggest and as far as I know what other read aligners do as well.
Are you not getting the RG:Z tags in the alignments?
Thanks, Philipp
Just to clarify @ghannum do you see any RG:Z:
Tag in the entries of the reads?
You can check with samtools view your.bam | less
Thanks Fritz
Correct - I am not getting any RG:Z tags in the alignments.
My header has: @RG ID:SAMPLEID SM:SAMPLEID
But my reads look like this: m54114_180515_083632/6684916/6689_6758 16 ref_chr_4 2040455 0 39S3M1I19M7S * 0 0 ACCACACAGATCTGGGAGATTCTCTGCAAAAAACAACAACGGAGGAGAGGAAAAGCGGGGGGAAGAGTC !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! AS:i:32 NM:i:2 XI:f:0.913 XS:i:0 XE:i:32 XR:i:23 MD:Z:7G14 SV:i:0 SA:Z:ref_chr_1,4831832,-,41S19M1D3M6S,0,2; QS:i:39 QE:i:62 CV:f:33.333332
Could you please try to either pull the newest commits from github and rebuilding from source or using ngmlr 0.2.7 installed from conda and confirm you still see the same problem.
Unfortunately, I can't reproduce it myself.
I'm running from the binary compiled as part of the 0.2.7 release: https://github.com/philres/ngmlr/releases/download/v0.2.7/ngmlr-0.2.7-linux-x86_64.tar.gz
According to git, the only changes from here are in the README.
Another issue you might look at (and might be related) is in the help text. Here is what I see at the top... notice the wrong description for the '-o' option.
ngmlr --help ngmlr 0.2.7 (build: Jun 25 2018 10:17:59, start: 2018-07-17.10:06:00) Contact: philipp.rescheneder@univie.ac.at
Usage: ngmlr [options] -r
Input/Output:
-r
I tried it again with the version from here (https://github.com/philres/ngmlr/releases/download/v0.2.7/ngmlr-0.2.7-linux-x86_64.tar.gz) and still get RG:Z tags for all reads.
Would it be possible to send me a read file with a couple of reads that cause the problem + the reference or information on what your reference genome is? I would be very surprised if this differed between datasets, but it is the last thing I can think of.
Thanks for pointing out the error in the help text. I just corrected it in the source code. However, this is unrelated to the behavior you are seeing.
The data and reference are proprietary - and I'm not sure if I can anonymize it without compromising the test. Do you have a public ref & small fastq I could try to match your end?
here is a sample that I just generated and tested. I reinstalled ngmlr over conda and run it like this:
ngmlr -r ref.fa -q tmp.fa --rg-id SAMPLEID --rg-sm SAMPLEID -o tmp.sam
The tmp.sam included the header and entry for the read groups per read.
Please let us know if that works for your installation. Thx Fritz ref.fa.gz tmp.fa.gz
aha! Your data and command line produce the expected output. After playing with it a bit, I found the way to reproduce the bug is with the multithreaded '-t 32' option. When it is off I get the RG code, when it is on I do not.
Ok thanks. We will look into this. Cheers Frits
Ok this was a tricky one. Thanks a lot for reporting this. I think I found the problem and it works for me now. I have attached a binary to this comment. Could you please try running the binary with your data and verify that it works now as expected?
Thanks, Philipp
Works great for me. Thanks for your help!
Hi Philipp,
Are you planning on making a new release with these changes?
I'll take care of the bioconda recipe then.
Wouter
Was there ever a new release with this fix? The most recent tag is 0.2.7 and predates this bug fix.
Not that I am aware of.
yeah sorry Philipp was the one that developed this and I currently dont have the resources.. Hopefully in the near future.
The help menu describes the use of --rg-id as "Adds RG:Z: to all alignments in SAM/BAM".
This is not the case - no information is added to the alignments themselves. This would be a useful option if it worked properly.
Also, the help menu appears to have this text for the description of '-o', which is a bug.