schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Sequence length calculation error #226

Closed jiadong324 closed 5 months ago

jiadong324 commented 7 months ago

Hi,

Thanks for developing this nice tool!

Here is my error message for single chromosome comparison:

Running SyRI - ERROR - Length of query sequence of chr16_hap2 is less than the maximum coordinate of its aligned regions.

However, I checked the maximum alignment coordinate, which is smaller than the length of my query sequence.
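The coordinate check described above can be sketched as follows. This is a minimal example that assumes the alignments are in PAF format (as written by minimap2 or wfmash), where column 2 is the query length and column 4 is the query end. The function name `max_query_end` and the two demo records are hypothetical, not taken from the actual data.

```python
import io

def max_query_end(paf_lines):
    """Return {query_name: (query_length, max_query_end)} from PAF records."""
    stats = {}
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # PAF columns: 1 = query name, 2 = query length, 4 = query end
        qname, qlen, qend = fields[0], int(fields[1]), int(fields[3])
        _, prev_end = stats.get(qname, (qlen, 0))
        stats[qname] = (qlen, max(prev_end, qend))
    return stats

# Hypothetical two-record PAF excerpt (values invented for illustration)
demo = io.StringIO(
    "chr16_hap2\t1000\t10\t400\t+\tchr16_hap1\t1200\t20\t410\t380\t390\t60\n"
    "chr16_hap2\t1000\t500\t990\t-\tchr16_hap1\t1200\t600\t1090\t470\t490\t60\n"
)
result = max_query_end(demo)
for name, (length, end) in result.items():
    status = "OK" if end <= length else "EXCEEDS LENGTH"
    print(name, length, end, status)
```

Note that the length column in a PAF file reflects how the aligner read the FASTA, so it is worth comparing against an independently computed sequence length as well.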

Then, for debugging, I added the following two lines:

record_dict = SeqIO.to_dict(SeqIO.parse(qry_fa, "fasta"))
print(len(record_dict['chr16_hap2']))

The query length calculated by my code is the same as the one reported by samtools faidx, and it is 13 bp longer than the query length calculated by your readFasta() function.

Looking forward to your reply!

mnshgl0110 commented 7 months ago

Are there white spaces or special characters in the fasta sequence? Other than that, I cannot think of any other reason for this discrepancy. Please check and remove any such characters. If this does not work, please share the sequence so that I can test it.
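One way to run such a check is sketched below. It is a minimal example, not part of SyRI: the function name `scan_fasta`, the allowed character set (IUPAC nucleotide codes, upper and lower case), and the demo records are assumptions made for illustration. It reports every sequence line containing a character outside that set, which also catches a header that ended up on a sequence line because of a missing line break.

```python
import io

# Assumption: only IUPAC nucleotide codes are valid in sequence lines
ALLOWED = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv")

def scan_fasta(handle, allowed=ALLOWED):
    """Return [(line_number, [unexpected characters])] for sequence lines."""
    problems = []
    for lineno, line in enumerate(handle, start=1):
        text = line.rstrip("\r\n")
        # Skip header lines and empty lines
        if text.startswith(">") or not text:
            continue
        bad = sorted(set(text) - allowed)
        if bad:
            problems.append((lineno, bad))
    return problems

# Hypothetical input: line 2 contains a space, line 3 has a header
# glued to the end of a sequence line
demo = io.StringIO(">chr16_hap2\nTAGGGT TAGG\nGGTTAG>chr17_\nACGT\n")
problems = scan_fasta(demo)
for lineno, bad in problems:
    print(f"line {lineno}: unexpected characters {bad}")
```

For a real file, pass an open file handle instead of the `io.StringIO` demo, for example `scan_fasta(open(path))` with `path` pointing at the query FASTA.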

jiadong324 commented 7 months ago

Please download the sequence at: https://drive.google.com/file/d/1Cqs1MF8x_WEuy9ZYZqiR2L0UQQbyh93c/view?usp=drive_link

jiadong324 commented 7 months ago

I also have the warning and error below:

Reading Coords - WARNING - Reference chromosome haplotype1-0000038 has high fraction of inverted alignments with its homologous chromosome in the query genome (haplotype1-0000031). Ensure that same chromosome-strands are being compared in the two genomes, as different strand can result in unexpected errors.

...

IndexError: list index out of range 

Following the existing issues, I first made a dotplot from the minimap2 alignment. It has several short inverted repeats highlighted in red.

[image: minimap2 dotplot]

Once I changed to wfmash, most of the short inverted segments are gone. But I still got the same warning and error.

[image: wfmash dotplot]

Is the index error caused by these inverted segments?

mnshgl0110 commented 7 months ago

Are you sure that the red alignments are inverted and the blue ones are directed? I would guess otherwise. Please recheck. Also, I cannot access that fasta file using this link.

jiadong324 commented 7 months ago

You are correct, the blue lines are inverted. I found that people reverse-complement the sequence in such cases. What would you suggest here?

Please check the new link https://drive.google.com/file/d/1Cqs1MF8x_WEuy9ZYZqiR2L0UQQbyh93c/view?usp=sharing.
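The reverse-complement step mentioned above could look like the sketch below. This is only an illustration under stated assumptions: the helper name `reverse_complement` is hypothetical, and the translation table covers only A, C, G, T and N (other IUPAC codes would pass through unchanged). After reverse-complementing a chromosome, the alignment has to be recomputed, because all coordinates on that sequence change.

```python
# Translation table for unambiguous bases plus N, in both cases
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

# Toy sequence, not taken from the shared FASTA file
example = reverse_complement("AACGTN")
print(example)
```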

mnshgl0110 commented 7 months ago

The fasta files are wrong. Please fix them.

10:47 goel@pc-t7-130 netscratch:issue226$ tail hap2_sorange2borange.chr16.fa 
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTGGTTAGGGTTAGGGTTAGGGTTAG
GGTTAG>chr17_