lbcb-sci / graphmap2

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html https://www.biorxiv.org/content/10.1101/720458v1
MIT License

ERROR: CIGAR and query sequence are of different length #1

Open michieitel opened 5 years ago

michieitel commented 5 years ago

Hi!

I am getting an error when converting the graphmap2 (v 0.6.0) generated sam file (of nanopore cDNA reads mapped to a reference assembly) to bam.

These are the logs of the run:

[22:32:53 BuildIndexes] Loading reference sequences.
[22:32:54 SetupIndex_] Loading index from file: '/home/cgarcia/cbas_cDNA/data/CBAS_MASURCA-2_final.genome.scf.fasta.gmidx'.
[22:33:00 Index] Memory consumption: [currentRSS = 4364 MB, peakRSS = 4804 MB]
[22:33:00 Run] Hits will be thresholded at the percentil value (percentil: 99.000000%, frequency: 139).
[22:33:00 Run] Minimizers will be used. Minimizer window length: 5
[22:33:00 Run] Reference genome is assumed to be linear.
[22:33:00 Run] One or more similarly good alignments will be output per mapped read. Will be marked secondary.
[22:33:00 ProcessReads] All reads will be loaded in memory.
[22:33:53 ProcessReads] All reads loaded in 48.67 sec (size around 11231 MB). (9526475634 bases)
[22:33:53 ProcessReads] Memory consumption: [currentRSS = 19168 MB, peakRSS = 19168 MB]
[01:39:48 ProcessReads] [CPU time: 890914.56 sec, RSS: 501212 MB] Read: 11313268/11313268 (100.00%) [m: 9022586, u: 2290682]
[01:40:08 ProcessReads] Memory consumption: [currentRSS = 497484 MB, peakRSS = 502752 MB]
[01:40:08 ProcessReads] All reads processed in 890935.00 sec (or 14848.92 CPU min).

[W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 14814
[main_samview] truncated file.
[bam_sort_core] merging from 0 files and 80 in-memory blocks...

My commands were as follows:

Read alignment

graphmap-not_release align -t 80 -x rnaseq -K fasta -L sam --extcigar \
  -r /home/cgarcia/cbas_cDNA/data/CBAS_MASURCA-2_final.genome.scf.fasta \
  -d /home/cgarcia/cbas_cDNA/data/cbas_CANU_cDNA_correction-2_combined.correctedReads_100bp.fasta \
  -o /home/cgarcia/analysis/mapping/graphmap2/corrected2/CBAS_MASURCA-2_final.genome.scf._ONT_cdna_graphmap2_combined_corrected-2_100bp.sam

Convert sam to bam and sort

samtools view -@ 80 -b -S CBAS_MASURCA-2_final.genome.scf._ONT_cdna_graphmap2_combined_corrected-2_100bp.sam | samtools sort -@ 80 \
  CBAS_MASURCA-2_final.genome.scf._ONT_cdna_graphmap2_combined_corrected-2_100bp.sorted.bam

The line it complains about is this one (in the sam):

33ef8047-1b3c-4389-aad6-9110103981f5 0 scf7180000042097 35520 0 4=3D3=1X1=44N2D1=1X6=1X3=1I1=1I4=23N1X1=1X1=1X5=1D1=1X4=7D4=1D3I33N1I2=2D1=3D5=2D3=2X2=2I3=1D4=1X2=3I1=1I1=1X2=1D5=1D1=1D1=1D1=1X1=1X1=1X81N1=1I1=1X2=1X3=1X2=2D1X1=1X2=1X1=1X2=1X2=1I1=2D3=1X1=1X1=5D1X1=1X4=1D4=1X4=1I2=1I1=1X1=1X2=2D1X1=1X1=1D1X2=1X6=2D3=48N3=1X1=1X7=1I3=2X1=1X1=1D2=1X1=3X3=1X2=1I1=1X2=148N5=1X6=1D1=2X2=3D3=2D4=133N2=1X2=1D3=3D1=2D2=1X3=2X4=1D1=1X5=214N3=1X6=1I6=1D1=1D2=1X1=3X5=1D2=1X4=1X2=3I1=1X4=2D1=2X5=172N2I1=1X1=1I2=5I1=1X6=1X1=2D1=1D5=1D2=1X3=1X1=17N1=1X4=6D5=3D1X3=4D6=443N 0 0 TTTTTTTTTTTTTTTTTTTTTTTCTTTTTTTTTTTTCACCGTGAAACAAAAACAGATTCAGCTATTCAGAAAATAACATGTATACCTAAATTTTATATTTCTATGCAGTTGCTATGCTAATACTACAGCTACTAAACAATATATGTTTTTACACAAAATTCTGTGATCATTCCAGCTTGCTTAGAATAACCTTCTCCAGTTTGACTGTGTCAGCTGCAAATTTGCAGTTCCTTCAGCTAGTTTCTCAGTGGCCATTTGGTCCTCATTTAGCCCACCTGGTGCTTCTCGTTGAGAGATATTTTCTCAATATCAAGTGCCTTTGCTTTATCAGCATCAAATGCTTAGTAACTGATTCAGTTGATTCTTTAGCTTATCCAGAAGGGCAGGGGGAGATAGTAAGGTTATCACAACCCAGCCAATGCAGATTTGGATGACACTAAAGATCCCAGCGAATCGTACTAACATTTGGATAACGATTGAAAGCAAACGTGTCAGGAAATATCTCTCGGCGGAGAAGCAATTCAGGTGGCTGCTTAAATGAAAGGACAAATGGCCGCGCTGAGAAACAACTGAAGAAATCCACCAAGATTTGCTGACAGTGAAGCTGAGAAAATTATTCTAAGCAATAGAATAACATACTCCAGTTTGACCTTGTCTGGCTACTGGTACGGATTCCTTCATTAGTTTCTCGGTGGCCATTTGTGGTCACTCATTTAGCAGCCACTGAATTTGCTTCTCGTTGAGAGATATTTTCTCAATATCAAGTGCCTTTGCTTTATCAGCATCCAAATGTATTAACTGGTTCAGTTGATTCTTTTGAAACTTATCCAGAAAGGGGCAGGAGATAGTGAGTTATCACAACCAGCCAGTACCACCCAAAATTGCCCTGTATTACGGATGCTCCACATGACAATGGTTTTATAGTCAAGCTTCTTGTAATAATTATATGTTTTGGTGACTGATTGAACACCGGGGTCTTCCGGTTGGCTCAAAAGTTTTCTTATCAGTGTTCTTGACATACCAGATCATAGATACGACCAACAAATGGGAAATCAAGTGACTTGTGGCACATGAAACTGCCCCGCAGGCAAACAAAAGCCAAGAAGTCAGATTATGATGGATACCATATTCGCTCTCCAGAATCTGGCTGCTTCATGCCCTCCCATGTTGAACTAAGTTGATTAGTACTCTCTCCTTGCTGATACCTGCCTTTTCATATAGTTCAATAAACCTCTTTGCTTCAGCAATAGGTTCTTCTTTATCAAGCCAGTAACCTTGCATCCGCTAGAGCCCGACCGAGTGCTATCTTCCAAGTCTACAACCCCAAAGTTTACATGGCTTGTCAATAGCAGCGTTACTTGATCGTCAAACTGCCACCCAGCGTCTTTGGCAAACTTAATAGCATCGCAGCAAATGCTGATATTCCGGCAT
CTGAGCGGCCTGGCCTGGCAAGGATGGGTTAGTAGTTGCATCCGTTGGCTTGTATTGATCAATGGATTTGATATCTCCAGTGTCAGCGACTACCGTAGTAAATTTATTCAATTGGTCAAGCGCTTTCCATAATCTTGCAAGGTGTACGCTTGAAAAGATGTATTGACCTTAATAGGCACTGGAGCGGTCAACACAAAGTTAGATAACGAGAATCAAGGCAGGAATGCAATTTTACCAAAATCATACATCCCCGTAATATATTTCAACAAGAAACACAAAAGAAGACACAAACAACTTTATTGTT MD:Z:4^TTG3T1 NM:i:347 AS:i:-1134 H0:i:0 ZE:f:0 ZF:f:0 ZQ:i:1735 ZR:i:93581

Can you please help me to figure out what went wrong? I specified to include extended CIGARs with

--extcigar

thanks Michael
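
For readers hitting the same message: samtools raises it when the read length implied by the CIGAR differs from the length of SEQ. Per the SAM specification, only the operations M, I, S, = and X consume read bases, so their lengths must sum to len(SEQ). Below is a minimal Python sketch of that check for a single SAM line. The function names `cigar_query_length` and `check_record` are made up for illustration and are not part of graphmap2 or samtools.

```python
import re

# CIGAR operations that consume query (read) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the number of read bases implied by a CIGAR string."""
    if cigar == "*":
        raise ValueError("CIGAR is unavailable ('*')")
    ops = CIGAR_OP.findall(cigar)
    # If the matched pieces do not cover the whole string, it is malformed.
    if sum(len(n) + 1 for n, _ in ops) != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def check_record(sam_line):
    """Return (CIGAR-implied read length, len(SEQ)) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    return cigar_query_length(cigar), len(seq)


# Synthetic record for illustration only (not taken from the data above).
record = "read1\t0\tchr1\t100\t0\t4=1I2D3=10N2X\t*\t0\t0\tACGTACGTAC\t*"
print(check_record(record))  # (10, 10): 4 + 1 + 3 + 2 read bases, SEQ has 10
```

If the two numbers differ for the line samtools reports, the record itself is inconsistent and the conversion command is not at fault.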

jmaricb commented 5 years ago

Hi @michieitel ,

is there a way to send us the data you used so we can try to recreate this issue on our side?

Thanks

ghelman91 commented 5 years ago

Hi @jmaricb - I am currently having the same issue when trying to convert the sam output to a bam. Was there any luck in finding what the issue was?

Cheers, Guy

jmaricb commented 5 years ago

Hi @ghelman91 ,

I am currently looking into it. What dataset and reference were you using?

Thanks

ghelman91 commented 5 years ago

Hi @jmaricb - I was trying to convert the sam of aligned nanopore cDNA reads mapped to the human reference genome to a bam file, similar to @michieitel above. The alignment runs fine but converting the output to a bam is the problem:

This is the command and output:

samtools view -b 191007_testNGMLR/191008_graphmap.sam > 191008_graphmap.bam
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 6305
[main_samview] truncated file.

I had also used the extend cigar option just to see if I could figure out what was going on but no luck. Here is my problem line: 5f4df548-7bb6-427c-80ab-2e05238df0fc 16 chr11 6615448 0 13S9=2D14=1I3=1X21=2X1=2X10=1X15=1D24=1I9=3I442N12=1I42=1I16=586N11=1X1I68=1D60=1I55=1I3=115N1=1I1=1I19=1X1I36=6D16=1I2=1X7=1I31=1X22=1X13=2D19=147N17=1D12=1D13=8D4=1I53=1D18=197N2X5=1X55=1D28=1I4=1D6=3D10=1X1D20=1D12=999N14=1X1=2X1I23=1D1=1X11=1D18=1D5=2X1I11=2I2=1X8=1I15=1D4=1I5=1X10=2I280N1I1D5=1X15=2D1=1X12=1I9=1D17=1X2=1D3=116N36= 0 0 GGTTACGTATTGCTGAAGGCCGGGAACACATTGCTGAAGCCACCACCACTGATATAGTCAACAATTTCATTTGTGATGAGGAAAGGTTCCTGGAAGGATGCCTCCCACTGTGGTGACATAGGGCTGAGGCAGTGTGGGAAGGTAGGGATGGAACTGGTGTCTTCCCCCAGAGACAGACCAACACCTGGCCCCAGCCTCGTCACCAGGGCTACTAGACCCAGGTGAGATGTTGGCACCAGCACTCATCAGGTACTGCACACCCAGACTGGCCTCAATCCCGGCCCGGCCCCGGCCCTGTTGTCCAACCACACGGGCTACTGATGCCCGATGTGCAAAGTTGCCACTGAAGAAAGTGCATGAACTGAGCCAGGTCTGAGTCATGGAAATACTGCTCCAGGAACTGGGCACAGGCTTGGCTGTTATTGCTGGTGCCAGAGCCCACGTCTTGTGAGGTCGTTAGCCGCTTACGGATCACAGAGGGGGCTGGCTTCCCAGATGCAGGCCTACAGTCCCTGTCACCTGCGGCTCAGACGTTGCCTCAGGGATGATGTTGGGGAAATCGGTGCAGTCCCCCCACAAAGTCCACATGGGGGCCAAGGCCTGTGGAAGCTGGTAGAGGGGGACCTTACAACATGGGTTTCCTCTGTAGGTCCTCCCACACATAGTGATGAAATCAGCCCCAGGAGCAGCAGCTCTGCTTGTCGATGCTCAGCCAGCAAGTCAGAAAGTCCTGTGTGATCACAGAACACTTCTGGGCTCCTCGGGCTGCCAAGAGCCATTTTTGCACCGTGTGGAGGGTTTGCAGGGGATGGCCTCACCAGATCAGCCACATTCTCTAGGGTCAGGTATTTCCTCCTGCATTGAAAGAGAGCTGGGATCCGACACAGCCTGCACCAGCTCCGAGAGTCTTTCCACATTCTGCTGTCCCAGGGCAAAAGGTGCAGAGACTCAGTCCTTTCTCAGGGTCCTAATGGCCCAGGGATCTGTAGCCTGGGGCAGCGTCCTCCGCTGGTCGGCTCCGGGCTGTAACTGCATTTGCCAGAGAGATGAGGGCAAAAGAGCCCCTAGGAGGCAGGCTTGGAGTCCCATTCTGCCCTTCCGCGGGATCTGTGAAGGCCGGGAACACATTGCTGGAAGTCACCACCACTGATATAGTCAAATACCTCATTTGTGACGAGGAAAGGTTCCTGAAGGATGTGCCTCCCACTGTGGTGGACATAGGGGCTGCTGGAGGCAGGGGAAGGTAGGGCGGAACTGGTGTCTTCCAGAGACAGACCAACACTCCGGCCCCACTGTCACCAGGGCTACTGAGAGACCCAGGTGGAGATGTTGGCACCAGCACTCATCAGGTACTGCACATCTAGACTGGCCTCAATCCCGCCCGGCCCCGGCCCTGTTGTCCAACCACACGGGCTACTGATGCCTGATGTGCAAAGTTGCTCACCGAAGAGGCGCATGAACTGAGCCAGGTC
TGAGTCATGGAAATACTGCTCCAGTGAACTTCGGGCACAGGCTTGGCTGTTGGTTGCTGGTGCCAGAGCCCACGTCTTGTGAGGTCAAGCGCTTACGGATCACAGCAGAGGGTTACTCCCCAGATGCAGGCCTACAGTCCCTGTCACCCGCGGCTCAGGACGTTGCCTCAGTGATGATGTTGGGGAAACGGTGCAGTCCCCCCACAAAGTCCACATGGGGGCCAAGGCCTGTGAAGCTGGTAGGGACCTTTACAACATGGGTTTCCGTAGGTCCTCCCACATAGTGATGAAACTCAGCCCCAGGAGCAGCAGCTCTGCTTGTTTGATGCCCAGCCAGCAAGTCAGAAAGTCCTGTGTGATCACAGAATGGCACTTCTGGGCTCCGCTGCCAAGAGCCATTTTTGCACCGTGTGTGAGGTCAGTGATGGCCTCACTGATCAGCCACATTCTCTAGGTCAGGTATTTTCCGTATTGAGGAGAGTTTTCGATCCGACACAGCCTGCACCAGCCTGAGAGTCTTTCACATTCTGCTGTCTCAGGCAAAGCCAGAGACTCAGCTTTCTCCCTCAGGGGTCCGCACGGCCCAGGACACTCCAGCTTGGGGGCAGCGTCTCCTCTGCTGGTCGGGCTCCGCCGTAACTGCATTTTGCCAGAGAGATGAGGGCAAAGAGCCCCAGAGGCAGGCTTGGAGTCCCATTCTGCCCTTCCGCGGATCT %$(()&#((./48,:>=:899,55C<<57767%6/-,,459000599<2@<;=A@@581.'''').:;DEEDCABC@;3IDB?>:48:8564:33=.%1E211,)((%,<+(5120,+-(2,30(,$%$&>;>B88,-++%$$0';;603(-.8<4100/25<>@><<=;:2/;;AAB2%<<EF322)-*$&':A:2769<AC992(,D:?<?E@B97-4257&%?7'(26A@AA=*./24/2/((..-''$$'((04./&:=@/0;>,444?F7@12B;>8809B00.-4>CGA>>?(C@?A1362272>7;A4-,A;:9060::,,.47117(94/*)''$$#%%*4<=<0@/9869<8;;;:=B6985=3CFF>80=,,--7=:DECDG@A=B;514+0-5303::6CD@5B:?BA;9,)+(-&&&33;:971//=98.-&&&%%%$$%0.-052795;=>88:98;%1..)(9-,,D06<.,++4:82/5,(,217@82200500-$*'2&*--+-+($$$8;77?A8:6,+878576($#('%&)0++42-.*,1)=CE6;0/.**/-,55663?>=A./.1:4>;-&7250(-(,>>;:=.)&(++-453=A@.(#''*(,08@;@;93632/',&&&(6=C>@@BC8&7>86/-5:@4,HF>A353@:9.@@7402%/?62;AA?E<9622685660.-3,2***'(<))*'%$&%#'/-)'&535.+#&&#'*/+-.03545788-006<B311+:)&'.+.;(>9A@**)&$',8:))0/.//$,:;A?>>;8>1'')+&$($+,0:<6779966;-+%%+','((),,%$''94:/009&%9427%328?=>877:@667;B(:<>1=?>79/22752))66:651.,-(,6;:<=;)98754%%/%/076;:674&'76+'')41%346-+&$##,(''''%1+'%%%%)/54258782=..485?==742,-)'.2,+&//673+/.).+6&&''12/33289+('''(,0201'689:6722'&''4%589@>@?81.<$6;:E=<;<-%&%.04193217>G;::.--11%//40+AB;D9>;42:;06%%&%$24)194@=6/21939*302,)(&*&()++587))0&%,17/B)%,*.-7;/.,:@<:>$:;43=4268227896:008874(125&9$$-9$;==;8:A>;895558@@@88??648588>??:EC;@<<>>;98=>;89%,-*)47BDBC=;=>:.,<7892+)**)&$)0)+;
=?>?:06C>???97?>9?:=A744(+./13/34&&,-.(,%&%&%'/&&#%''+(,0/.24&1-.5;<:033C@>34))&$68=@4/@=6>@,@?6434;7>BDFC9@197'%(&%%$1''&&5373ABB6:'035?8E:==BB3/.32$-):9A>=<>B@?=;;3852&.)+&(65:2$$$22387<A@A91/9:794,+'%$#&%&+./0//,79<=,++87<CC?9;?,-+2:332%%&-//-5++21-2-'%--,871//0&233568888846951/$$'#,249++(,2('(+,-)%(')$$53%$%'-%%$&&;'-==00))+,,2:;61/,,,00+6787989681/%,-.$-10420/0+589;;4001?=CBA<=9+5)),3---'.6&&)032)'$'(&'$%&4470280-1//097==47<;27:=388'&-6%:&+A<?BA?544'&.,1'('='.21./5-/4@9<CMGDG5@5AA@9A@/>;C;<=;=07;>@:9:?9-%%6:<6+/',-&&/>:86895865::4,',01/.0''(%9(3728))):(087;4/59.&+,68129979:@?9B99/7>16162,,6<64,''-/22>782%($'**)-(.768'/19<;?=431746A?5&&834/,(((102$/054370,62>>80/-+')&$#%#$('+-+++&%%'%#.&&%32/001/2$&')'(&1--5;4443,&--31+$$%%01339(&-4'',+-((+..,37;@B/(,%/'%)+3))3>C>99.0/)7)8/=+2>>?>=/GBA?>@?@'%1B6AA=:;=223@0A@C@/..84:41<9=2,,++8?110/1 MD:Z:9^TT17T21T0T1T0T10T15^T33 NM:i:92 AS:i:4535 H0:i:0 ZE:f:0 ZF:f:0 ZQ:i:2188 ZR:i:135086622

I might be able to directly send files if that would be helpful.

Cheers, Guy
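
Since samtools stops at the first bad record, it can help to list every inconsistent line in one pass and, as a stopgap, set those lines aside so the rest of the file converts. The following Python sketch does that under the assumption that the SAM file is tab-separated with the usual 11 mandatory fields. `classify` and `filter_stream` are hypothetical helper names, and dropping records is only a workaround, not a fix for the underlying bug.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_consistent(fields):
    """True when the CIGAR-implied read length equals len(SEQ), or either is '*'."""
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True
    # Only M, I, S, = and X consume read bases.
    implied = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")
    return implied == len(seq)


def classify(lines):
    """Yield (line_no, line, ok) for each SAM line; header lines are always ok."""
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("@"):
            yield line_no, line, True
            continue
        fields = line.rstrip("\n").split("\t")
        ok = len(fields) >= 11 and is_consistent(fields)
        yield line_no, line, ok


def filter_stream(src, dst, log):
    """Copy consistent lines from src to dst, report the others to log.

    Returns the number of rejected lines.
    """
    rejected = 0
    for line_no, line, ok in classify(src):
        if ok:
            dst.write(line)
        else:
            rejected += 1
            name = line.split("\t", 1)[0]
            log.write(f"line {line_no}: {name}\n")
    return rejected
```

One possible way to use it is to call `filter_stream(sys.stdin, sys.stdout, sys.stderr)` from a small script, redirect the SAM file into it, and send the filtered output on to samtools. The read names written to the log are the ones worth sharing with the developers.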

HegedusB commented 5 years ago

The same problem here! "CIGAR and query sequence are of different length" (graphmap2 (v.0.6.01))

[bhegedus@node2 map_all_03]$ samtools view -Sb -@60 full_length_barcode01_05_scaffold_1.sam > full_length_barcode01_05_scaffold_1.bam
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 13867
[main_samview] truncated file.

jmaricb commented 5 years ago

@ghelman91 @michieitel @HegedusB Could you try with the newest commit? I have fixed the cigar length bug.

menickname commented 4 years ago

Dear @jmaricb, I am currently encountering the same issue with ONT reads using the Graphmap2 0.6.4 release. I cannot convert my graphmap2 .sam output into sorted .bam files, either with samtools sort or by following the comments above.

Thank you in advance.

jmaricb commented 4 years ago

@menickname

Can you share the dataset and the reference with which you are still getting different CIGAR and query lengths? I would like to run it and try to find the error.

Thank you.

menickname commented 4 years ago

Dear Josip

Thank you for your quick response. Can you tell me the best way to share this dataset with you? (Preferably kept between us.)

Thank you in advance. Best regards, Nick Vereecke

jmaricb commented 4 years ago

@menickname Well, if you can, just send me one or several reads that have the wrong CIGAR length. You don't need to send me the whole dataset. Then just tell me which reference you are using.
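
For anyone who wants to share only the affected reads, here is a small Python sketch that pulls selected records out of a FASTQ file by read ID. It assumes plain four-line FASTQ records (no wrapped sequence lines) and takes the ID to be the header text up to the first whitespace. `extract_reads` is an illustrative name, not an existing tool.

```python
def extract_reads(fastq_lines, wanted_ids):
    """Yield four-line FASTQ records whose read ID is in wanted_ids.

    The ID is the header line without the leading '@', cut at the first
    whitespace, which is how read names usually appear in the SAM QNAME field.
    """
    wanted = set(wanted_ids)
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            parts = record[0][1:].split()
            if parts and parts[0] in wanted:
                yield "".join(record)
            record = []


# Tiny synthetic example: keep only read "b".
fastq = ["@a runid=1\n", "ACGT\n", "+\n", "!!!!\n",
         "@b\n", "GG\n", "+\n", "##\n"]
for rec in extract_reads(fastq, ["b"]):
    print(rec, end="")
```

The read IDs can be taken from the first column of the SAM lines that samtools complains about.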

menickname commented 4 years ago

Dear Josip

Thank you for your quick response. Below is an example of the filtered/trimmed ONT reads. The reference genome used can be found at https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2 .

@fd5bbd49-dcc4-492b-b19c-e11e853e33a5 runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=1319 ch=195 start_time=2020-03-27T16:38:58Z AATTACCTGAAACTTACTTTACTCAGGTAAAAATTTACAAGAATTTCGCCCAGGAGTCAAATGGAAATTGATTTCTTAGAATTAGCTATGGATGAATTCATTGAACGGTATAGAAGGCTATGCCCAACTGAACATATCTTTATGGAGGTGCGTCATAGTCAGTTAGGTGGTTTTACATCTACTGATTGGACTAGCTAAACGTTTAAGGAATCACCTTTTGAATTAGAAGATTTTATTCCTATGGACAGTACAGTTAAAAACTATTTCATAACAGATGCACCAAACAGGTTCATCTAAGTGTGTGTTCTGTTATTGATTTATTACTTGATGATTTTGTTGTTGGAAATAATAAAATCCCAAGATTATCTGTAGTTTCTAAGGTTGAG + 4648*;;309<;8:7)@GGE?@7693.712/21@DEB;./:1334/#$.0143*-51ABCA<<1.4&&-9:9KJ>?7A000;BF?@EB?A>@)****,5=?;,,/4+01/**)5,@=?7.-)%%%%%)3(0362,+&$)38?@A;(&&&#$&4:A>@92?<;<1$1&)).37;53>C=A=;@DJDG>6A??573<>BFH:E6@F:7)FHA9?>5558?:4783;=:9766FHJFFKC>?DF@ABB43>790/5::?@<8455>?DC)>CAF>E?6.)$$')357/.0<GCD746867=?.64FHDEAC>6;335><;=6443'035;69A=>HJM4++&<<()A@C?@:979D><?DB23<0.0650<@@AGHCG>:9:@FIEA?= @2efc38c4-3d9c-491f-8772-db3fdeb8749a runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=1324 ch=150 start_time=2020-03-27T16:39:01Z GGTCAATTTCTGTACAAACAACACCATCCAATTTATAGAAGTAACTGGTTTATGGTTGTTGTGTAACTGTTTTCTTTGTAGAAAACATCAATAGGACCTTTGTATTCTGAGGGACTTTGTAAGTAAAGCACCGTCTATGCAATACAAAGTTTCTTTAGAAGTTATATGTTTATAGTGACCACACTGGTAATTACCAGTGTACTCCTGCTTACAAATGTACCATGTAGAAAGTTCATACTGAGCAGGTGGTGCTGACATCATAACAAAGGTGACTCCTGTTGTACTAGATATTTTTGTAGCTTGTTTACCACACGTACAAGGTATCTGAACACCTTTCTTAAATTGTTCATAAGAGAAGTGTGCCCATGTATAACAGCAG + *(&(13.5DB@>=7:03*/224768>89?CCCG;6,9%''1:4<>@?@ID/+++.7.2<+,62))%$&%/0013;;?6444EEHD>:3?./245;?;=8>F;>BA:9?:?A<;:389?8822+:@B3:75)*)11*566@@@CA=898=>DE9A@2,+(%3)2=9997CAD?>>CEFB<>=(9559@<@C=?035:.753666&-*(%%'/3=@B@B<;45>;/)-%&&''8CE>==AH<,655=5D8966)38>==77)9;;'670746<;87-23244=><94.1710-9=A?9/5,7766680,12/.32-+,//.4,.+(3+2.3050023655/,---6:AECB?++6:*++3>?G<;;96986>>>=AFD@C@ @1d9bbe11-c3e5-4d6e-ae33-2883200d8816 runid=674eb540e8d54b342a1e172ff5484ba559454959 
sampleid=200327_Covid-19_Artic_Protocol_restart read=1103 ch=454 start_time=2020-03-27T16:38:59Z TAGCGGCGTGCCTTTATGGCACGAAACACAGTGGTACGAACTTATGTACTCATTCGTTTCCGGAAAGACAGGCACGTTATTAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTGCGCTTCGATTGTGTGCGTACTGCTGCAATATTGTTAACGTGAGTCTTGTAAAACCTTCTTTTTACGTTTACTCTCGTGTTAAAATCTAATTCTTCTAGAGTTCCTGATCTTCTGGTCTAAACGAACTAAATATTATAGTTTTCTGTTTGGAACTTTAATTTTATATATGGTTTAACAGTACTATTACCGTTGAAGAGCTAAGCTCCTTGAACAATGGAACCTAGTAG + *+*%&'&.1)/--.)%#%(%$$$$$#%%%%%42.<<=...++,?<<75/67&&?@=IED$?03/.$&*%'('%,34540'*-=8AA800G9C;<AEE@JGFGH;><,''):52''-77679>=8;=<B;.5&4589<99>22637/+1.3323832/993:57=69?<=GGFLEC;93323985@B90:19;5A@>:6;=H=162;;=>BG.+58DEGCCG.?7<;>7:>@7:58.'*')/79:<@>5HFI26C?;79?>@D<82A<ADD;''*996774439;9<:.1?@3:6264&8DAAKK9CDC>8)&#$#$%%((0%'(*+2--4(0***+,0;973)%0$#&..&976:DBCD@?EC>;;9=:;&,*080 @c46650e5-daa0-4235-8b9b-ce49c9966837 runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=973 ch=467 start_time=2020-03-27T16:39:02Z AAGGTAAGAACAAGTCCTGAGTTGAATGTAAAACTGAGGATCTGAAAACTGTCAGAATTAATAAACACCACGTGTGAAAGAATTAGTGTATGCAGGGGTAATTGAGTTCTGGTTGTAAGATTAACACACTGACTAGAGACTAGTGGCAATAAAACAAGAAAAACAAACATTGTTCGTTTAGAGAACAGATCTACAAGAGATCGAAAGTTGGTTGGTA + ('33):)AABA?@@++--,.2JJCHIABB@=;<4114?;>8??))BM-(=3*.:())*-327;?;8ADAD=?B?**))'+-,-;35;<;A:978ADA;E=BCDC??G<@DCC:<>@BC=:8D@@8B/@?GID><@7DMA>;::>;:<EECFGC?<A@0LHO2+.4)&(=<GDGD>>HC-()(3=D?B;;5=5'.:;'1/.,686(((3E7:;;7=?D @e4af2786-708a-46ac-9687-b4484e6c68f7 runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=1003 ch=476 start_time=2020-03-27T16:39:01Z 
ATGTCTTGTGCTGCCGGTACTACACACAAACTGCTTGCACTGATGACAATGCGTTAGCTTACTACAACACAACAAAGGGAGGTAGGTTTGTACTTGCACTGTTATCGATTTACAGGATTTGAAATGAAACAGATTCCCTAAGAGTGATGGAACTGGTACTATCTATGCAGAACTGGAACCACCTTGTAGGTTTGTTACGAACAATGCTTTTTCACTATGTAGAAAGTTGGATAATGATGCACTCAACAACATTATCAACAACAATGCAGAGATGGTTTATTCGCCCAGACTTTCAACCATCTCGCTGCATTATTGATAACTCATTGAGATGCATCGTGCTTCCAGCAGCATAGTGAAAAGCATTGTCTGTAACAAAAACCTACGAGGTTCCCGGTGCTGTATGGATGAGTACCATTCCATCGCTCTTGGGAATCTGGCCATTTCAAATCTGTAAATCGGATAACGTGCGTGACCTACTCCTTTTTGTTGTGTTGGCAATGGCTGGCGCATTGTCTCATCGGTACTGGCGCGATTTGTGACGCACCGGCACAGACAG + ::<??>3-80CG;@?;?>:=B<&3,.15;CFGB==<((47?;67D?>660/46<+*(*+-*,+(/77444((-::,B>>C==2>4@GD@355///%"%+,&92-)-906961008.,5?4:264+*%$#$'0''(/4$%')*$309:=<CF=8:3/))--*32/.&&5847157<?@8132558:.((:9EEB=?:..,./42300:<=;=98+)08,'$%%(((719483-)//;;>0('&27---..)()*11:()+,,*..62+++9<,%$)22*$$&,$$$$#$'$$$$)()+,*)('&$$1%&&%&$$)&%''((*(%$)3+,.%%126.%(#"#$)+%''+%&%$$33200.())())-)0-&&&&+-'%(%()$$$%#.+%0-34''($$('*,,,'*&&#**(+*))+54*+/&'+353/35=833++-+/++-23770))%%%*+)*((*&$&)/,.-(+'54//)(&+-0*0111%*+.072/*''**))'$)#%*****+&&&'*1/0$&(&$%&%##&'%'))&*$'&''--,+/..).'-$#&

I used the following command:

for READS in $(ls nanofilt*.fastq); do
  graphmap2 align -r ../part3_Reference_mapping/Covid-19_Ref_genome.fasta -d $READS -o ../part3_Reference_mapping/graphmap2_$READS.sam -t 36
done

The .sam file is created without any issue and looks like a normal .sam file to me. However, when I proceed to sorting and indexing, I do not get correct .bam files (only one line is present in the file) and the .bai files are empty.

for samfiles in $(ls graphmap2_*.sam); do
  samtools sort $samfiles > $samfiles.bam
  samtools index $samfiles.bam
done

I have used the same dataset with the Graphmap pipeline and this did not give any problems in further sorting and indexing. The Graphmap2 issue appeared both on our local computer as well as our HPC infrastructure.

Thank you in advance. Best regards, Nick Vereecke

jmaricb commented 4 years ago

Are you aligning RNA reads or DNA? If you are aligning RNA reads you should use '-x rnaseq' option. If you are aligning DNA reads then actually there is no difference between Graphmap and Graphmap2, so you could continue using Graphmap.

I will take a look anyway.

menickname commented 4 years ago

@jmaricb we are aligning DNA sequences. We like to work with the most up-to-date versions of the available software. Please keep me posted once the issue is solved; then I can try running it with Graphmap2 as well.

jmaricb commented 4 years ago

I will let you know. It should work with Graphmap2 too. Graphmap2 only adds updates for RNA reads, so it should behave the same as Graphmap for DNA reads.

1053286838 commented 3 years ago

@HegedusB Did you solve the issue? I have the same problem:

[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 16323139
[main_samview] truncated file.