vtsyvina / CliqueSNV

MIT License

Padding operator not between real operators in CIGAR #14

Closed antoine4ucsd closed 2 years ago

antoine4ucsd commented 2 years ago

Hello. Such an amazing tool (and a great improvement over the first version). Congrats! It works great on HIV Illumina data. I am now trying to get variants from SARS-CoV-2 data, with pretty large input SAM files, and I got this error for one of my samples. Any chance you can help with this? Any suggestion for a fix, such as automatically filtering out some specific reads?

Thank you!

ERROR: Read name M03251:179:000000000-K2J5L:1:1106:10161:20380_1:N:0:CAAGGTAC, Padding operator not between real operators in CIGAR

```
@HD VN:1.0 SO:coordinate
@SQ SN:MN908947.3 LN:29903
@RG ID:c0906184-f6a9-4be3-9fb3-268bd391c8d0 PI:185 SM:MBN033222-4_S4_L001 PL:ILLUMINA
@PG ID:0 VN:21.0 PN:clcgenomicswb
M03251:179:000000000-K2J5L:1:1106:10161:20380_1:N:0:CAAGGTAC 99 MN908947.3 28185 60 10M2D5M53D178M = 28251 280 CTTGTGGATCTGTTCTCTAAACGAACAAACTAAAATGTCTGATAATGGACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGAATGGAGAACGCAGGGGGGCGCGATCAAAACAACGTCGGCCCCAAGGTTTACCCAATAATACTGCGTCTTGGTTC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGFGGGGGGGGGGGGGGGGEGGGGGGGFGGFEGEC=DGCGBAGDEFGFGF5F5<>;56=CE>5>:?FFF48BF0<<:?746<>7),486A<<<2 MD:Z:5A1TG1^GT5^GTTCTATGAAGACTTTTTAGAGTATCATGACGTTCGTGTTGTTTTAGATTTCA117T60 RG:Z:c0906184-f6a9-4be3-9fb3-268bd391c8d0 NH:i:1 NM:i:59
M03251:179:000000000-K2J5L:1:1106:10161:20380_1:N:0:CAAGGTAC 147 MN908947.3 28251 60 1S8P6I3M1D188M22S = 28185 -280 GGATCTGTTCTCTAAACGAACAAACTAAAATGTCTGATAATGGACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGAATGGAGAACGCAGTGGGGCGCGATCAAAACAACGTCGGCCCCAAGGTTTACCCAATAATACTGCGTCTTGGTTCACCGCTCTCACTCAACATGGCAAGGAAGACCT F@;3;3)167))89B950BCCEF515CF@5)CA9129FAA?4FEEFCF=<5C57;@?GG?:9*CGDGF@=CGFGGGDGGE8EGEGGGFD8GFFGFCGGEGGFGGFGGGGGGGGGGFGGDGGFGGGGGGGGFGGGGGGGGGGGFGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC MD:Z:3^A188 RG:Z:c0906184-f6a9-4be3-9fb3-268bd391c8d0 NH:i:1 NM:i:7
```

vtsyvina commented 2 years ago

Hello, @antoine4ucsd.

Can you show the full stack trace? I think this error comes from the samtools library (htsjdk) that we use to read the input. We were using a pretty old version of it, 2.11.0, so I have updated it to the latest, 2.24.1. Please download the fresh jar file and try again.

If it fails again, I'm afraid you'll have to figure it out with the samtools team at https://github.com/samtools/htsjdk. The error comes from this line in their validation method: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/Cigar.java#L226

antoine4ucsd commented 2 years ago

Excellent! I will give it a try and let you know as soon as possible whether it works. Thank you.


antoine4ucsd commented 2 years ago

Same error with your recently updated jar file. All ideas are welcome.

    Start read sam
    2000000
    Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR::INVALID_CIGAR:Read name M03251:179:000000000-K2J5L:1:1106:10161:20380_1:N:0:CAAGGTAC, Padding operator not between real operators in CIGAR
        at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:458)
        at htsjdk.samtools.SAMRecord.getCigar(SAMRecord.java:826)
        at htsjdk.samtools.SAMRecord.getCigarLength(SAMRecord.java:837)
        at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:2132)
        at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:2013)
        at htsjdk.samtools.SAMLineParser.parseLine(SAMLineParser.java:352)
        at htsjdk.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:268)
        at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:255)
        at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:228)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:591)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:570)
        at edu.gsu.util.DataReader.getIlluminaPairedReads(DataReader.java:108)
        at edu.gsu.start.Start.illumina2SNV(Start.java:132)
        at edu.gsu.start.Start.main(Start.java:75)

vtsyvina commented 2 years ago

This is the CIGAR: 1S8P6I3M1D188M22S. The padding operator is P. According to the htsjdk code, it has to be between real operators, which are M, EQ, X, I, D and N.

To the left of the P is an S (soft clip), and that is why it fails. I don't really know why padding can't sit next to a soft clip; you would need to read the SAM specification for that.
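The rule described above can be sketched in a few lines of Python. This is only an illustration of the check as explained in this thread, not htsjdk's actual code: the names `REAL_OPS`, `parse_cigar` and `bad_padding_positions` are made up for the example, and `=` stands for the operator that htsjdk calls EQ.

```python
import re

# Operators treated as "real" in this thread: M, EQ (written "="), X, I, D, N.
REAL_OPS = {"M", "=", "X", "I", "D", "N"}

# A CIGAR string is a run of <length><operator> pairs.
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
CIGAR_ELEM = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) tuples."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a well-formed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_ELEM.findall(cigar)]


def bad_padding_positions(cigar):
    """Return indexes of P elements that lack a real operator on either side."""
    ops = parse_cigar(cigar)
    bad = []
    for i, (_, op) in enumerate(ops):
        if op != "P":
            continue
        # A P at the very start or end has no neighbour on one side.
        left = ops[i - 1][1] if i > 0 else None
        right = ops[i + 1][1] if i + 1 < len(ops) else None
        if left not in REAL_OPS or right not in REAL_OPS:
            bad.append(i)
    return bad


print(bad_padding_positions("1S8P6I3M1D188M22S"))
print(bad_padding_positions("10M2D5M53D178M"))
```

For the CIGAR in this issue the function reports the P at index 1, because its left neighbour is S. The mate's CIGAR, 10M2D5M53D178M, contains no padding and yields an empty list.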

antoine4ucsd commented 2 years ago

Got it. Thank you again.


vtsyvina commented 2 years ago

I think some aligner settings need to be adjusted to conform to these requirements.

antoine4ucsd commented 2 years ago

This fixed it:

picard SetNmMdAndUqTags -R reference.fasta -I read_mapping.sam -O read_mapping.fixed.sam

Closing this issue. Thank you for being responsive!
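To see which records in a SAM file carry a padding operator at all, for example to compare the file before and after a fix, one could scan the CIGAR column directly. The sketch below assumes a plain-text, tab-separated SAM file; the function name `reads_with_padding` and the demo records are hypothetical and not part of CliqueSNV, htsjdk or Picard.

```python
import io


def reads_with_padding(sam_lines):
    """Yield (read_name, cigar) for alignment records whose CIGAR contains P."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line (@HD, @SQ, @RG, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, cigar = fields[0], fields[5]  # QNAME is column 1, CIGAR column 6
        if "P" in cigar:
            yield name, cigar


# Made-up records for demonstration only.
demo = (
    "@HD\tVN:1.0\n"
    "r1\t99\tref\t1\t60\t10M\t=\t1\t0\tACGT\tFFFF\n"
    "r2\t147\tref\t1\t60\t1S8P6I3M\t=\t1\t0\tACGT\tFFFF\n"
)
for name, cigar in reads_with_padding(io.StringIO(demo)):
    print(name, cigar)
```

With a real file, the same function can be fed an open file handle, e.g. `open("read_mapping.sam")`, in place of the `io.StringIO` demo.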