lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

SamFixCigar and GraphMap issues #74

Closed Tintest closed 7 years ago

Tintest commented 7 years ago

Subject of the issue

Hello Pierre ,

I want to use SamFixCigar on mapping data from MinION R9.4 datasets (https://github.com/nanopore-wgs-consortium/NA12878), to get .sam files with a CIGAR notation close to the extended version. With files from a BWA-MEM mapping there is no problem, but when I run SamFixCigar on mapping data from GraphMap, I get multiple errors for a few seconds and then the tool stops.

Can you help me?

Regards.

Your environment

Actual behaviour

Here are 3 error messages, all reported for the same read (line 1324 of the file).

Error parsing text SAM file. MAPQ must be zero if RNAME is not specified; File rel3-nanopore-wgs-3709921973-FAB39043_GraphMap.sam; Line 1324 Line: 0e70570b-9054-4b47-bea5-6736547b593f_Basecall_Alignment_template 4 0 255 0 0 ACCGGCCCTTAATTCGTGCATGTCCTTTCTCGGCGCTCGATGCAATTAGGCGAGGGTGGCGGTGAGAATAAACATCATTGTACTTTATTGATGCGGAAATATCTAAAATTATTGCCGAGCCGAGAAATTGGCTTGTGAAGCAATACTCGACTTGAAGTATTTCATCTTGGCGACATTGTCATTGGCCTTGGCAGTAGCGCTCATGATGTAACTCGGCGCCTCATCAAACTAAGCTGTAAGCTCAAGCTTTCGCTCTATCAGCTTATATCAGCTAGTAATGACTTCATGTCTCATAAGAGGCGAGAATATGTATGTCTGTCATGAAACCTCAACCGATGAAACATCAGCCATATCAAAGCTTGAAGTAAAAATGTACTCTTCTTCGCCAGGCAAATATTCGAAAATGTATTCACTTGATGATCATTAAAACCTTCGAAGCTGCGCGGTAGCCCTTGCTCTTGTGTATGGGGCTCGGCATGTCTGTACAACATGCTCAGTAAATTACTTTTATGGCCGTCTTTTTAGGGCCGTGAAATAAACATTTCGAAAACGGTATAGAATATTCGAAAGCTATTATTCAATAATCAACAATAAAATAACGTGGCATGAAATGTGCGTAGAAATATTATCAATATAAAGCTTGGCAAGCCCTCAAATATCAGTCTCAAAGGCCTAAAATGTCAAAGC "#&$%'(%%)(+'($$&&)$)$&##$))%+)$)''&%'&'&'))&#$&&'&%)$%%%%''(.-/)$#&&%(('+()$##%%$%&%&#%('&')+(,%&)0&-%)&(&-+(%$%%%&'')+'&&%#%%%$"##&'$''%%%&(+#$'%%#&(&$((&(('(((((&()%'$'%&)*'%'&,)'(+((),+$&$($'$#%&$'&&%(&')(&$'&&''()%&&%)$%%'(',)))'$%,,+(%%&'(+''&()%+)$'$$$$$$(+(&(+()(&((&$&&)+''))+(%$#%%"'#'&)&$&$%$#'%())(''%)-/+('$&#$&).)&%&)%%%&+-,')&,-&'+&))))(&(#&()(+-+)&'&,()%(&'%&',.(),)'$$&&&(%+,,&$)(*&+&&%$$&#$'&&'(&%%%'&#$#$%#$$#%(###$$%%$%%$&).*#&&%&#'%((.+-/,'%&"#'&(")%'(',%$&$"'($$"#&(%..-.')((((((&&$'&&((&#%&&$$$$$#%('&)--/.2.&$''+$$&%&((+-+++(++'')''-(&#',-)((%&%%&%)+,))$(%(#(%#'&&&+((%'(+)*))(%$&'$'$'+()0,'%(++++&%&"'#''),((&)%%%('''$)'&%'#" NM:i:-1 AS:i:-687 H0:i:0 ZE:f:inf ZF:f:0 ZQ:i:687 ZR:i:0

Ignoring SAM validation error due to lenient parsing: Error parsing text SAM file. Tag of type f should have single-precision floating point value; File rel3-nanopore-wgs-3709921973-FAB39043_GraphMap.sam; Line 1324 Line: 0e70570b-9054-4b47-bea5-6736547b593f_Basecall_Alignment_template 4 0 255 0 0 ACCGGCCCTTAATTCGTGCATGTCCTTTCTCGGCGCTCGATGCAATTAGGCGAGGGTGGCGGTGAGAATAAACATCATTGTACTTTATTGATGCGGAAATATCTAAAATTATTGCCGAGCCGAGAAATTGGCTTGTGAAGCAATACTCGACTTGAAGTATTTCATCTTGGCGACATTGTCATTGGCCTTGGCAGTAGCGCTCATGATGTAACTCGGCGCCTCATCAAACTAAGCTGTAAGCTCAAGCTTTCGCTCTATCAGCTTATATCAGCTAGTAATGACTTCATGTCTCATAAGAGGCGAGAATATGTATGTCTGTCATGAAACCTCAACCGATGAAACATCAGCCATATCAAAGCTTGAAGTAAAAATGTACTCTTCTTCGCCAGGCAAATATTCGAAAATGTATTCACTTGATGATCATTAAAACCTTCGAAGCTGCGCGGTAGCCCTTGCTCTTGTGTATGGGGCTCGGCATGTCTGTACAACATGCTCAGTAAATTACTTTTATGGCCGTCTTTTTAGGGCCGTGAAATAAACATTTCGAAAACGGTATAGAATATTCGAAAGCTATTATTCAATAATCAACAATAAAATAACGTGGCATGAAATGTGCGTAGAAATATTATCAATATAAAGCTTGGCAAGCCCTCAAATATCAGTCTCAAAGGCCTAAAATGTCAAAGC "#&$%'(%%)(+'($$&&)$)$&##$))%+)$)''&%'&'&'))&#$&&'&%)$%%%%''(.-/)$#&&%(('+()$##%%$%&%&#%('&')+(,%&)0&-%)&(&-+(%$%%%&'')+'&&%#%%%$"##&'$''%%%&(+#$'%%#&(&$((&(('(((((&()%'$'%&)*'%'&,)'(+((),+$&$($'$#%&$'&&%(&')(&$'&&''()%&&%)$%%'(',)))'$%,,+(%%&'(+''&()%+)$'$$$$$$(+(&(+()(&((&$&&)+''))+(%$#%%"'#'&)&$&$%$#'%())(''%)-/+('$&#$&).)&%&)%%%&+-,')&,-&'+&))))(&(#&()(+-+)&'&,()%(&'%&',.(),)'$$&&&(%+,,&$)(*&+&&%$$&#$'&&'(&%%%'&#$#$%#$$#%(###$$%%$%%$&).*#&&%&#'%((.+-/,'%&"#'&(")%'(',%$&$"'($$"#&(%..-.')((((((&&$'&&((&#%&&$$$$$#%('&)--/.2.&$''+$$&%&((+-+++(++'')''-(&#',-)((%&%%&%)+,))$(%(#(%#'&&&+((%'(+)*))(%$&'$'$'+()0,'%(++++&%&"'#''),((&)%%%('''$)'&%'#" NM:i:-1 AS:i:-687 H0:i:0 ZE:f:inf ZF:f:0 ZQ:i:687 ZR:i:0

Ignoring SAM validation error due to lenient parsing: Error parsing text SAM file. MAPQ should be 0 for unmapped read.; File rel3-nanopore-wgs-3709921973-FAB39043_GraphMap.sam; Line 1324 Line: 0e70570b-9054-4b47-bea5-6736547b593f_Basecall_Alignment_template 4 0 255 0 0 ACCGGCCCTTAATTCGTGCATGTCCTTTCTCGGCGCTCGATGCAATTAGGCGAGGGTGGCGGTGAGAATAAACATCATTGTACTTTATTGATGCGGAAATATCTAAAATTATTGCCGAGCCGAGAAATTGGCTTGTGAAGCAATACTCGACTTGAAGTATTTCATCTTGGCGACATTGTCATTGGCCTTGGCAGTAGCGCTCATGATGTAACTCGGCGCCTCATCAAACTAAGCTGTAAGCTCAAGCTTTCGCTCTATCAGCTTATATCAGCTAGTAATGACTTCATGTCTCATAAGAGGCGAGAATATGTATGTCTGTCATGAAACCTCAACCGATGAAACATCAGCCATATCAAAGCTTGAAGTAAAAATGTACTCTTCTTCGCCAGGCAAATATTCGAAAATGTATTCACTTGATGATCATTAAAACCTTCGAAGCTGCGCGGTAGCCCTTGCTCTTGTGTATGGGGCTCGGCATGTCTGTACAACATGCTCAGTAAATTACTTTTATGGCCGTCTTTTTAGGGCCGTGAAATAAACATTTCGAAAACGGTATAGAATATTCGAAAGCTATTATTCAATAATCAACAATAAAATAACGTGGCATGAAATGTGCGTAGAAATATTATCAATATAAAGCTTGGCAAGCCCTCAAATATCAGTCTCAAAGGCCTAAAATGTCAAAGC "#&$%'(%%)(+'($$&&)$)$&##$))%+)$)''&%'&'&'))&#$&&'&%)$%%%%''(.-/)$#&&%(('+()$##%%$%&%&#%('&')+(,%&)0&-%)&(&-+(%$%%%&'')+'&&%#%%%$"##&'$''%%%&(+#$'%%#&(&$((&(('(((((&()%'$'%&)*'%'&,)'(+((),+$&$($'$#%&$'&&%(&')(&$'&&''()%&&%)$%%'(',)))'$%,,+(%%&'(+''&()%+)$'$$$$$$(+(&(+()(&((&$&&)+''))+(%$#%%"'#'&)&$&$%$#'%())(''%)-/+('$&#$&).)&%&)%%%&+-,')&,-&'+&))))(&(#&()(+-+)&'&,()%(&'%&',.(),)'$$&&&(%+,,&$)(*&+&&%$$&#$'&&'(&%%%'&#$#$%#$$#%(###$$%%$%%$&).*#&&%&#'%((.+-/,'%&"#'&(")%'(',%$&$"'($$"#&(%..-.')((((((&&$'&&((&#%&&$$$$$#%('&)--/.2.&$''+$$&%&((+-+++(++'')''-(&#',-)((%&%%&%)+,))$(%(#(%#'&&&+((%'(+)*))(%$&'$'$'+()0,'%(++++&%&"'#''),((&)%%%('''$)'&%'#" NM:i:-1 AS:i:-687 H0:i:0 ZE:f:inf ZF:f:0 ZQ:i:687 ZR:i:0

lindenb commented 7 years ago

the problem comes from the metadata tag

ZE:f:inf

where, I suppose, 'inf' means infinity, whereas a SAM tag of type 'f' expects a floating-point number. It's an error in the BAM itself. To fix this problem, you should use sed to remove this kind of metadata and open an issue at https://github.com/nanopore-wgs-consortium/NA12878/issues

P.
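As an alternative to sed, the same clean-up can be sketched in Python. This is only a minimal sketch under assumptions: the function name `drop_invalid_float_tags` is made up for illustration, and the regular expression is my reading of what the SAM specification allows for an 'f' value. It drops any optional field of type 'f' whose value is not a plain number (such as `ZE:f:inf`) and leaves everything else untouched. It does not address the MAPQ errors shown above.

```python
import re

# Assumption: pattern for a valid 'f' (float) tag value, as I read the SAM spec.
FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?")


def drop_invalid_float_tags(sam_line):
    """Return the SAM line (without trailing newline) minus malformed 'f' tags.

    Hypothetical helper, not part of jvarkit or htsjdk.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        # Header lines are passed through unchanged.
        return line
    fields = line.split("\t")
    mandatory, optional = fields[:11], fields[11:]
    kept = []
    for tag in optional:
        parts = tag.split(":", 2)  # TAG:TYPE:VALUE
        is_float_tag = len(parts) == 3 and parts[1] == "f"
        if is_float_tag and not FLOAT_RE.fullmatch(parts[2]):
            continue  # e.g. ZE:f:inf
        kept.append(tag)
    return "\t".join(mandatory + kept)
```

Applied line by line to the GraphMap output, this should leave a file in which every remaining 'f' tag holds a numeric value; whether the parser accepts the rest of the record is a separate question.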

nickloman commented 7 years ago

I think this is a GraphMap issue. Our BAM files are generated from BWA-MEM.

Tintest commented 7 years ago

Hello,

Thank you for your quick and accurate answers. Indeed, GraphMap seems to generate incorrect .sam files. Even after correcting this metadata tag, other errors appear, so I'll report my problems on the GraphMap GitHub. Nevertheless, can you perhaps suggest another method to “convert” standard CIGAR to extended CIGAR notation? In short, I’m trying to distinguish matched bases from mismatched bases within the “M” operations of a standard CIGAR string, in order to compute, for example, the percentage of identity in those datasets. Thank you, Regards.

lindenb commented 7 years ago

@Tintest

Nevertheless, can you perhaps suggest another method to “convert” standard CIGAR to extended CIGAR notation?

I don't know of such a tool. Have a look at GATK? Or ask on biostars.org?
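For what it's worth, the conversion itself can be sketched in plain Python if the reference bases under the alignment are available. This is a rough sketch under assumptions, not an existing tool: `extend_cigar` and `percent_identity` are hypothetical names, `ref_seq` is assumed to be the reference sequence starting at the alignment's POS, and the identity formula used here (matches divided by matches + mismatches + inserted + deleted bases) is only one of several definitions in use.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def extend_cigar(cigar, read_seq, ref_seq):
    """Split each 'M' run into '=' (match) and 'X' (mismatch) runs.

    ref_seq is assumed to start at the alignment's leftmost reference position.
    """
    out = []  # list of [length, op], adjacent identical ops are merged
    read_pos = ref_pos = 0

    def push(op, n=1):
        if out and out[-1][1] == op:
            out[-1][0] += n
        else:
            out.append([n, op])

    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "M":
            for i in range(n):
                same = read_seq[read_pos + i].upper() == ref_seq[ref_pos + i].upper()
                push("=" if same else "X")
            read_pos += n
            ref_pos += n
        else:
            push(op, n)
            if op in "IS=X":  # operations that consume read bases
                read_pos += n
            if op in "DN=X":  # operations that consume reference bases
                ref_pos += n
    return "".join(f"{n}{op}" for n, op in out)


def percent_identity(ext_cigar):
    """Matches over aligned columns (=, X, I, D); one possible definition."""
    counts = {}
    for length, op in CIGAR_OP.findall(ext_cigar):
        counts[op] = counts.get(op, 0) + int(length)
    matches = counts.get("=", 0)
    aligned = matches + sum(counts.get(op, 0) for op in "XID")
    return 100.0 * matches / aligned if aligned else 0.0
```

For example, `extend_cigar("4M1I3M", "ACGTTACG", "ACCTACG")` walks the first four aligned bases, finds one mismatch at the third position, keeps the insertion as is, and then emits a run of three matches. Soft clips, hard clips and padding are copied through unchanged.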

I'm closing this issue for now.