schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

ERROR syri:201 - Length of query sequence of Chr1 is less than the maximum coordinate of its aligned regions #251

Open Sabrili opened 1 month ago

Sabrili commented 1 month ago

Hi @mnshgl0110 and everyone :)

I'm sorry if this is a really obvious question, I'm new to programming and haven't used Syri before.

I am getting this error code when I try to run Syri: "Running SyRI - ERROR - syri:201 - Length of query sequence of Chr1 is less than the maximum coordinate of its aligned regions. Exiting."

I don't really understand what has triggered this error or how to fix it.

Here is the code I'm using which gives the error:

ln -sf Genome1.fa refgenome
ln -sf Genome2.fa qrygenome
nucmer --maxmatch -c 100 -b 500 -l 50 refgenome qrygenome
delta-filter -m -i 90 -l 100 out.delta > out.filtered.delta
show-coords -THrd out.filtered.delta > out.filtered.coords
syri -c out.filtered.coords -d out.filtered.delta -r refgenome -q qrygenome

Genome1 and Genome2 have the same number of chromosomes, and the chromosome names are the same in both genomes. However, the chromosomes in Genome2 are much longer than those in Genome1 because of a high number of repeat elements.

As far as I can tell, I have formatted the FASTA files correctly.

Additionally, I've noticed that the input TSV file shows alignments between non-corresponding chromosomes, e.g. Chr1 of the reference and Chr4 of the query. Could this be contributing to the error? For example:

[Screenshot of the alignments]
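One way to see how common these cross-chromosome alignments are is to tally the alignments per (reference chromosome, query chromosome) pair. The sketch below assumes the show-coords -THrd layout that SyRI reads, with the reference name in column 10 and the query name in column 11 (the same column the awk check later in this thread uses for the query). The function name `count_chr_pairs` is hypothetical.

```shell
# Hypothetical helper: count alignments per (reference chromosome, query
# chromosome) pair in a tab-delimited show-coords -THrd table and list the
# most frequent pairs first.
# Assumption: column 10 = reference name, column 11 = query name.
count_chr_pairs() {
    awk -F'\t' '{ n[$10 "\t" $11]++ }
        END { for (p in n) print p "\t" n[p] }' "$1" |
        sort -t "$(printf '\t')" -k3,3nr
}

# Usage: count_chr_pairs out.filtered.coords
```

A few off-diagonal pairs (such as Chr1 and Chr4) are expected when repeats or real translocations are present. SyRI is designed to use whole-genome alignments like these, so the pairs alone are not necessarily a problem.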

Any advice on understanding this error and how to fix it would be really appreciated :)

Thank you so much!

mnshgl0110 commented 1 month ago

Hi @Sabrili. The commands look correct, so this error should not happen. Can you please try re-running the pipeline in a new, empty folder? If that does not solve the problem, please check whether the chromosomes named Chr1 in the reference and the query are indeed homologous. To check this, you can use fixchr to generate a dotplot of the alignments.

Sabrili commented 1 month ago

Hi @mnshgl0110, thank you so much for your quick answer! I've tried what you suggested above, but unfortunately had no success :( I wasn't able to figure out how to use fixchr, but I did use another tool to create a dotplot. The two genomes are homologous, but they probably don't show the level of synteny you'd usually expect, because one genome is much larger than the other. Could this be what is causing the problem?

[Dotplot image: map_final_renamed_chromAssemEDITED_Syri_to_Osativa_323_v7]

When I split the genomes into individual chromosomes, SyRI works perfectly, but I was hoping to compare the whole genomes to see whether there are any translocations between chromosomes.

[Dotplot image: Chr1]

mnshgl0110 commented 1 month ago

The error message implies that the alignment coordinates in out.filtered.coords are inconsistent with the query genome FASTA file. For example, if Chr1 is 10000 bp long in the query genome FASTA but the .coords file has an alignment extending to Chr1:10010, then SyRI would report this error. Can you check the output of the following commands:

# Max alignment coordinate for query Chr1
awk '{if($11=="Chr1"){if($3>max){max=$3}; if($4>max){max=$4}}} END {print max}' out.filtered.coords

# Length of each sequence in the query genome fasta (column 2 of qrygenome.fai)
samtools faidx qrygenome
cat qrygenome.fai

The max alignment coordinate should be less than or equal to the chromosome length in the FASTA. If it is not, then something went wrong when the alignments were generated.
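The same comparison can be run for every query chromosome in one pass. This is a minimal sketch, assuming the show-coords -THrd column layout used above (query start/end in columns 3 and 4, query name in column 11) and the standard samtools faidx index (name and length in columns 1 and 2). The function name `check_query_coords` is hypothetical.

```shell
# Hypothetical helper: report query chromosomes whose largest aligned
# coordinate exceeds the length recorded in the FASTA index (.fai), or that
# are missing from the index. Exits 1 if any problem is found, else 0.
check_query_coords() {
    coords=$1
    fai=$2
    awk -F'\t' '
        NR == FNR { len[$1] = $2; next }        # first file: .fai name/length
        {
            m = ($3 + 0 > $4 + 0) ? $3 : $4     # larger of query start/end
            if (m + 0 > max[$11] + 0) max[$11] = m
        }
        END {
            bad = 0
            for (c in max) {
                if (!(c in len)) {
                    print c "\tmissing from FASTA index"
                    bad = 1
                } else if (max[c] + 0 > len[c] + 0) {
                    print c "\tmax coordinate " max[c] " > length " len[c]
                    bad = 1
                }
            }
            exit bad
        }' "$fai" "$coords"
}

# Usage:
#   samtools faidx qrygenome
#   check_query_coords out.filtered.coords qrygenome.fai
```

It reads both files in a single awk run using the `NR == FNR` idiom, so it assumes the .fai file is not empty. Coordinates on reverse-strand alignments (start greater than end) are handled by taking the larger of the two values.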