schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Incorrect CIGAR string found. #180

Closed ylevmirom closed 1 year ago

ylevmirom commented 1 year ago

Hi, I'm trying to use syri to look for structural variations between two genomes. For alignment I'm using minimap2 (I tried both the -ax and -cx options), and when running syri I get this error: Reading BAM/SAM file - ERROR - Incorrect CIGAR string found. CIGAR string can only have I/D/H/S/X/=. CIGAR STRING: 1424M25D75M

I tried running minimap2 again but still get the same error. Please help.

Thanks,

Yael

JesseBNL commented 1 year ago

Hi Yael,

You have to add the --eqx option to your minimap2 command so that M is replaced by =/X in the CIGAR strings.
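To see whether an alignment file will trigger this error before running syri, the CIGAR strings can be scanned for operations outside the set named in the error message. The following is only a minimal sketch, not part of syri; the function name `unsupported_ops` and the constant names are hypothetical, and the allowed set is copied from the error text above.

```python
import re

# Operations that the error message lists as accepted (assumption: taken
# from the error text in this thread, not from syri's source code).
ALLOWED_OPS = set("IDHSX=")

# One CIGAR token: a length followed by an operation defined in the SAM spec.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def unsupported_ops(cigar):
    """Return the set of CIGAR operations that are not in ALLOWED_OPS."""
    return {op for _, op in CIGAR_TOKEN.findall(cigar)} - ALLOWED_OPS


print(unsupported_ops("1424M25D75M"))      # {'M'}
print(unsupported_ops("1424=25D70=1X4="))  # set()
```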

ylevmirom commented 1 year ago

Thanks!

marade commented 1 year ago

It's worth pointing out that not all SAM files come from minimap2, and 'M' is correct CIGAR under the SAM specification:

https://samtools.github.io/hts-specs/SAMv1.pdf

Therefore in my opinion this is a bug that isn't fixed.

mnshgl0110 commented 1 year ago

SyRI uses the CIGAR string to call SNPs and short indels. In CIGAR, M is used for both matches and mismatches, so calling SNPs from it would require comparing the bases of both genomes at every position covered by an M. This could result in significant computational overhead. Using =/X solves this issue, as only the positions marked with X need to be fetched.
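The point about =/X can be illustrated with a short sketch: when mismatches are marked explicitly, their reference coordinates follow from the CIGAR string alone, without reading either sequence. This is an illustration under stated assumptions, not SyRI's implementation; `mismatch_ref_positions` is a hypothetical name and positions are 0-based.

```python
import re

# Tokens restricted to the operations accepted in an =/X-style CIGAR.
CIGAR_TOKEN = re.compile(r"(\d+)([IDHSX=])")


def mismatch_ref_positions(cigar, ref_start):
    """Return 0-based reference positions of all mismatches (X) in the CIGAR."""
    pos = ref_start
    positions = []
    for length, op in CIGAR_TOKEN.findall(cigar):
        n = int(length)
        if op == "X":
            positions.extend(range(pos, pos + n))
        if op in "=XD":  # only these operations consume reference bases
            pos += n
    return positions


print(mismatch_ref_positions("5=1X3=2D2X", 100))  # [105, 111, 112]
```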

I am also not aware of other popular whole-genome alignment methods that do not produce SAM files with =/X.

marade commented 1 year ago

nucmer with --sam-long produces such output. There are probably more. If you're not going to support the full CIGAR specification, it would be nice to document that under the limitations, because others might use the tool as I did, thinking that any SAM file that conforms to the specification would work.
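For SAM files that only contain M, one possible workaround is to rewrite each M run as =/X runs before passing the file on. The sketch below is hypothetical (the name `expand_m` and its arguments are assumptions, and it is not part of SyRI or nucmer). It needs the reference bases starting at the alignment's first reference position and the record's query sequence, which is exactly the base-by-base comparison described earlier in the thread.

```python
import re
from itertools import groupby

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def expand_m(cigar, ref, query):
    """Rewrite M operations as runs of = (match) and X (mismatch).

    ref   -- reference bases starting at the alignment's first reference base
    query -- query bases as stored in SEQ (soft-clipped bases included)
    """
    r = q = 0  # current offsets into ref and query
    out = []
    for length, op in CIGAR_TOKEN.findall(cigar):
        n = int(length)
        if op == "M":
            # Compare the two sequences base by base, case-insensitively.
            flags = ["=" if ref[r + i].upper() == query[q + i].upper() else "X"
                     for i in range(n)]
            # Collapse consecutive identical flags into run-length tokens.
            for symbol, run in groupby(flags):
                out.append(f"{len(list(run))}{symbol}")
        else:
            out.append(f"{n}{op}")
        if op in "MDN=X":  # operations that consume reference bases
            r += n
        if op in "MIS=X":  # operations that consume query bases
            q += n
    return "".join(out)


print(expand_m("4M1D3M", "ACGTAACG", "ACTTACG"))  # 2=1X1=1D3=
```

Every M position requires a lookup in both sequences, so the cost grows with the aligned length rather than with the number of differences.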

mnshgl0110 commented 1 year ago

The documentation already shows how to generate and use alignments from minimap2 and nucmer: https://schneebergerlab.github.io/syri/pipeline.html

Nevertheless, please feel free to update the documentation/code and start a pull request.