Open frederic-mahe opened 7 years ago
This can be confirmed.
Here is the SAM format specification:
https://samtools.github.io/hts-specs/SAMv1.pdf
Other sources of information:
https://doi.org/10.1093/bioinformatics/btp352
http://www.drive5.com/usearch/manual/cigar.html
The spec indicates that the CIGAR string in the SAM format is in the direction from the reference (target) to the query. Deleted symbols are only found in the reference (target), while inserted symbols are only found in the query. The point of view may be different in other formats. I have made the output similar to USEARCH.
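To make the SAM convention concrete, here is a minimal Python sketch that counts how many query and reference (target) positions a CIGAR string covers. The function name `cigar_lengths` is hypothetical; the sets of operations follow the "consumes query" / "consumes reference" table in the SAM spec.

```python
import re

# Operations that advance along the query and/or the reference (target),
# following the table in the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")


def cigar_lengths(cigar):
    """Return (query_length, reference_length) covered by a SAM CIGAR string."""
    query_len = 0
    reference_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in CONSUMES_QUERY:
            query_len += n
        if op in CONSUMES_REFERENCE:
            reference_len += n
    return query_len, reference_len
```

With this reading, an insertion ("7M1I2M") makes the query one symbol longer than the aligned part of the reference, and a deletion ("7M1D2M") makes it one symbol shorter.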
The spec does not say, but in SAM files there always seems to be a '1' in front of operations that happen once, whereas in other contexts this '1' is sometimes omitted.
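As an illustration only, a compact CIGAR string could be rewritten with explicit counts along these lines. The helper name `explicit_counts` is hypothetical, and the sketch assumes that a missing count always means 1.

```python
import re


def explicit_counts(cigar):
    """Insert a '1' in front of every operation that has no count."""
    pairs = re.findall(r"(\d*)([MIDNSHP=X])", cigar)
    # An empty count string means the operation happens once.
    return "".join(f"{count or 1}{op}" for count, op in pairs)
```

For example, `explicit_counts("7MD")` gives `"7M1D"`, and a string that already carries every count is returned unchanged.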
SAM's CIGAR strings require a number before each letter ("7M1I" instead of "7MI"), but the main difference is in the "point of view".
SAM's CIGAR strings encode the target modifications needed to equal the query, whereas CIGAR strings in other output formats encode the query modifications needed to equal the target.
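If the two conventions differ only in which sequence is taken as the point of view, converting between them amounts to exchanging insertions and deletions. Here is a rough Python sketch of that idea; the name `swap_point_of_view` is hypothetical, and it assumes that only the I and D operations change meaning.

```python
# Translation table that exchanges insertions (I) and deletions (D).
_SWAP_INDELS = str.maketrans("ID", "DI")


def swap_point_of_view(cigar):
    """Exchange I and D so the CIGAR is read from the other sequence's side."""
    return cigar.translate(_SWAP_INDELS)
```

For example, `swap_point_of_view("7M1I2M")` gives `"7M1D2M"`, and applying the function twice returns the original string.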
If this is confirmed, that should be indicated in the documentation.