Closed wjwei-handsome closed 1 year ago
BAM has a limitation of only being able to encode up to 65535 CIGAR operations in a CIGAR field. The workaround is to use a soft clip and a skip operation as a flag to tell a decoder that the CIGAR operations are stored in a auxiliary data field. See § 4.2.2 "N_CIGAR_OP
field" (2022-08-22) for more details.
Unfortunately, the case when the alignment record has > 65535 CIGAR operations and the reference sequence length is > 228-1 is incompatible with the BAM format. I can only recommend using SAM or CRAM instead, both of which do not have this limitation.
I actually learned this point for the first time (after using bam/sam for so long)!!!
Thank you very much for your answer! I learned useful new knowledge!
In response to this problem, I will try other methods, thank you!
As for why my cigar operation length is so long, it is because I have stored the results of the whole genome comparison, and the starting position of query genome is represented by HardClip.
Best wishes, Wei
Hi!
when I using
noodles-bam
to write some bam records, a Err encountered:invalid CIGAR op length
.I noticed that: the function
overflowing_put_cigar_op_count
will writeSoftclip
andSkip
whenrecord.cigar().len()
can't convert to u16: https://github.com/zaeleus/noodles/blob/c5aadc5f659a13da1ba826a98463b748472fd7b7/noodles-bam/src/record/codec/encoder.rs#L187-L204But when processing function
put_cigar
to writeSoftClip
andSkip
, ifreference_sequence.length()
>268435455
, it will return a Err that i encountered:https://github.com/zaeleus/noodles/blob/c5aadc5f659a13da1ba826a98463b748472fd7b7/noodles-bam/src/record/codec/encoder/cigar.rs#L21-L36
Actually, some specie's chromosome length is more bigger than
268435455
, so I want to hear yours advices.Thanks!