Closed ArtRand closed 1 year ago
Hi @ArtRand,
That error means the CIGAR is invalid for the given sequence. Are you able to provide an example input or the (SAM) record that causes this?
Hello @zaeleus thanks for the quick response.
Here's an example of an offending record:
900b1b9d-9282-43a3-abe8-2d80bbd0530a 256 chr1 10001 0 726S10M5I23M1I28M2I2M2I45M1I38M1D12M1I17M4I6M2I65M1D11M1I32M1I38M1D6M1I16M1I31M1I6M1D6M1D6M1I65M2276S * 0 0 * * NM:i:41 ms:i:731 AS:i:720 nn:i:0 tp:A:S cm:i:16 s1:i:118 de:f:0.0643 rl:i:781 qs:i:9 du:f:13.799 ns:i:55196 ts:i:10 mx:i:3 ch:i:448 st:Z:2022-04-11T19:12:20.240+00:00 rn:i:4913 f5:Z:chr20-chr20.pod5 sm:f:84.638 sd:f:22.55 sv:Z:quantile MM:Z:C+h?,4,76,52,672,157,7,183,34,0,114,3,14,1,10,5,0,3,1,6,7,0,1,0,0,2,9,8,3,14,14,2,2,3,3,2,2,0,0,2,2,1,0,0,0,1,0,1,1,0,1,0,1,1,0,0,1,0,1,1,0,0,1,0,1,1,0,1,0,0,0,0,1,0,1,1,0,0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,2,0,1,1,1,1,0,1,1,0,1,0,1,1,0,0,0,1,0,1;C+m?,4,76,52,672,157,7,183,34,0,114,3,14,1,10,5,0,3,1,6,7,0,1,0,0,2,9,8,3,14,14,2,2,3,3,2,2,0,0,2,2,1,0,0,0,1,0,1,1,0,1,0,1,1,0,0,1,0,1,1,0,0,1,0,1,1,0,1,0,0,0,0,1,0,1,1,0,0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,2,0,1,1,1,1,0,1,1,0,1,0,1,1,0,0,0,1,0,1; ML:B:C,13,10,253,22,18,4,6,102,224,8,3,253,8,3,43,49,20,112,232,3,31,136,72,27,80,3,81,0,4,42,12,4,40,185,38,188,145,56,0,14,31,6,113,0,0,3,82,37,157,1,1,53,124,2,2,0,45,51,197,45,245,0,21,38,157,138,232,104,30,25,195,2,2,12,21,14,240,255,46,80,16,28,162,4,1,52,241,45,10,10,103,30,71,1,29,204,185,115,0,72,30,52,0,0,12,31,30,55,175,6,12,112,179,9,0,6,4,143,0,0,1,173,0,0,240,153,4,0,0,2,0,252,210,206,234,143,21,18,223,116,183,227,170,252,173,255,0,213,243,0,214,70,216,37,110,197,255,240,224,249,142,255,255,252,173,218,98,254,253,202,130,253,253,255,210,204,58,210,10,255,234,217,96,100,22,147,225,228,60,253,253,243,233,241,15,0,203,175,239,226,93,251,0,203,14,210,245,245,152,224,184,253,226,46,70,140,255,183,225,203,255,254,242,224,225,200,79,249,243,143,76,246,255,249,251,112
I think the problem originates from comes from put_sequence where the logic may need to be adjusted to
// § 1.4.10 "`SEQ`" (2022-08-22): "If not a '*', the length of the sequence must equal the sum
// of lengths of `M`/`I`/`S`/`=`/`X` operations in `CIGAR`."
- if read_length > 0 && sequence.len() != read_length {
+ if sequence.len() > 0 && sequence.len() != read_length {
return Err(io::Error::new(
io::ErrorKind::InvalidInput,
"read length-sequence length mismatch",
since secondary alignments will have *
for the SEQ field but non-zero CIGAR lengths.
The above diff causes some of the tests to fail. I can chase those down and submit a PR if you think it's the correct approach.
Thanks for the example and investigation! This is indeed a bug.
Yes, that's the function that's failing. In this case, the correct approach would be to simply return if the sequence is empty. See the SAM sequence writer for a example (though note you don't have to write any output for BAM). Feel free to submit a patch!
Great. PR is open. Happy to accept any comments and suggestions.
Hello,
Firstly, I'm really excited about using this library. My question is probably of the "holding it wrong" variety. I am investigating using the bam::AsyncReader and bam::AsyncWriter, when I encounter a secondary/reverse-complement secondary alignment I get this error from write_record:
read length-sequence length mismatch
. I couldn't tell from the documentation if there is something I need to do in order to write these records. Thanks for the help.simplified code example