schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Huge chromosome problem #244

Closed (socialhang closed this issue 5 months ago)

socialhang commented 5 months ago

Hi @mparker2 @mnshgl0110 @mbhall88 @HeQSun @lrauschning

Due to the huge chromosome, we got this error:

$ syri -c chrZ.chrW.sam -r chrZ.fa -q chrW.fa -F B --prefix chrZ.chrW
[E::parse_cigar] CIGAR length too long at position 1 (350095110S)
[W::sam_read1_sam] Parse error at line 3
Reading BAM/SAM file - ERROR - Error in reading BAM/SAM file. truncated file

What should I do?

mnshgl0110 commented 5 months ago

When using a SAM file, set -F S.

socialhang commented 5 months ago

Same problem! @mnshgl0110

syri -c chrZ.chrW.sam -F S -r chrZ.fa -q chrW.fa -F B --prefix chrZ.chrW
[E::parse_cigar] CIGAR length too long at position 1 (350095110S)
[W::sam_read1_sam] Parse error at line 3
Reading BAM/SAM file - ERROR - Error in reading BAM/SAM file. truncated file

socialhang commented 5 months ago

My chromosome length is > 450 Mb.

mnshgl0110 commented 5 months ago

Please recheck your command

mparker2 commented 5 months ago

Hope you don't mind me chiming in @mnshgl0110 ...

[E::parse_cigar] CIGAR length too long at position 1 (350095110S)

^ this is not an issue with syri, but with samtools/htslib, which has an upper limit on the size of individual CIGAR operations:

https://github.com/samtools/samtools/issues/1667

Your SAM file contains an alignment with a 350 Mb softclip. So possibly your alignment is not very good?
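To make the limit concrete, here is a minimal sketch for finding the offending records before running anything else. It assumes the limit is the one implied by the BAM encoding in the SAM specification, where each CIGAR operation is packed into 32 bits with 28 bits for the length, so the largest representable length is 2^28 - 1 = 268,435,455. The names `oversized_ops` and `scan_sam` are hypothetical helpers written for this illustration and are not part of syri or htslib.

```python
import re

# BAM packs each CIGAR operation into a 32-bit integer: the low 4 bits hold
# the operation code and the high 28 bits hold the operation length.
BAM_MAX_CIGAR_OP_LEN = (1 << 28) - 1  # 268,435,455

# One CIGAR operation: a run of digits followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def oversized_ops(cigar, limit=BAM_MAX_CIGAR_OP_LEN):
    """Return (length, op) pairs from a CIGAR string whose length exceeds limit."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if int(n) > limit]


def scan_sam(path, limit=BAM_MAX_CIGAR_OP_LEN):
    """Yield (line_number, query_name, oversized_ops) for each offending record."""
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("@"):
                continue  # header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6 or fields[5] == "*":
                continue  # malformed record or no CIGAR available
            hits = oversized_ops(fields[5], limit)
            if hits:
                yield line_number, fields[0], hits
```

For example, `list(scan_sam("chrZ.chrW.sam"))` would list every record with an operation too long for BAM. The 350095110S soft clip from the error message is above the 268,435,455 ceiling, which is consistent with the parse error reported at line 3.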

mnshgl0110 commented 5 months ago

In this particular case, the input is a SAM file, but the command, in addition to -F S, also has -F B, which results in the input file being processed as a BAM file.
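Why the second flag wins can be sketched with Python's argparse, where a repeated option with the default store action keeps the last value given. This is an assumption about how syri's command line is parsed, not something stated in this thread; the parser below is a hypothetical stand-in that models only the -c and -F options and only the S and B values.

```python
import argparse


def parse_format(argv):
    """Return the input format a stand-in parser would select for argv."""
    # Hypothetical stand-in parser, not syri's real one.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-c", dest="infile")
    parser.add_argument("-F", dest="ftype", choices=["S", "B"])
    return parser.parse_args(argv).ftype


# The later -F B silently overrides the earlier -F S.
print(parse_format(["-c", "chrZ.chrW.sam", "-F", "S", "-F", "B"]))
```

Under that assumption, removing the trailing -F B from the command (so that only -F S remains) is what makes the file go through the SAM reader.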

The CIGAR length limit affects .bam files (and therefore htslib/pysam). SAM files do not have this restriction, so syri uses a custom reader for SAM files, which allows analysis of large genomes.