schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

IndexError: list index out of range(direction) #223

Closed Pengzw0909 closed 7 months ago

Pengzw0909 commented 7 months ago

Dear developers, I am writing about an error in SyRI. My query genome has been adjusted for chromosome ID and direction based on the reference genome, but there is still an error.

cmd: syri -r /share/nas1/pengzw/project/xxx/08.pan_genome/Results/01.variant/00.split_fafile/Ref/LsL46.fasta -q /share/nas1/pengzw/project/xxx/08.pan_genome/Results/01.variant/00.split_fafile/Query/3.Lviro/Lviro.fasta -c /share/nas1/pengzw/project/xxx/08.pan_genome/Results/01.variant/01.mummer_alignment/3.LsL46_vs_Lviro/LsL46_vs_Lviro.coords -d /share/nas1/pengzw/project/xxx/08.pan_genome/Results/01.variant/01.mummer_alignment/3.LsL46_vs_Lviro/LsL46_vs_Lviro.all.filter.delta --prefix LsL46_vs_Lviro. --nc 9 -s /share/nas2/genome/biosoft/Anaconda3/2019.03/envs/mummer4/bin//show-snps

################### cat syri.log
2023-11-03 09:11:17,568 - Reading Coords - INFO - syri:130 - Reading input from .tsv file
2023-11-03 09:11:41,375 - syri - INFO - syri:209 - starting
2023-11-03 09:11:41,381 - syri - INFO - syri:209 - Analysing chromosomes: ['Chr01', 'Chr02', 'Chr03', 'Chr04', 'Chr05', 'Chr06', 'Chr07', 'Chr08', 'Chr09']
2023-11-03 09:11:41,656 - syri.Chr01 - INFO - mapstar:48 - Chr01 (262, 11)
2023-11-03 09:11:41,656 - syri.Chr01 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr01
2023-11-03 09:11:41,659 - syri.Chr02 - INFO - mapstar:48 - Chr02 (326, 11)
2023-11-03 09:11:41,659 - syri.Chr02 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr02
2023-11-03 09:11:41,662 - syri.Chr03 - INFO - mapstar:48 - Chr03 (242, 11)
2023-11-03 09:11:41,662 - syri.Chr03 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr03
2023-11-03 09:11:41,665 - syri.Chr04 - INFO - mapstar:48 - Chr04 (200, 11)
2023-11-03 09:11:41,666 - syri.Chr04 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr04
2023-11-03 09:11:41,668 - syri.Chr05 - INFO - mapstar:48 - Chr05 (304, 11)
2023-11-03 09:11:41,668 - syri.Chr05 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr05
2023-11-03 09:11:41,672 - syri.Chr06 - INFO - mapstar:48 - Chr06 (241, 11)
2023-11-03 09:11:41,672 - syri.Chr06 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr06
2023-11-03 09:11:41,673 - syri.Chr07 - INFO - mapstar:48 - Chr07 (181, 11)
2023-11-03 09:11:41,673 - syri.Chr07 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr07
2023-11-03 09:11:41,674 - syri.Chr08 - INFO - mapstar:48 - Chr08 (196, 11)
2023-11-03 09:11:41,675 - syri.Chr08 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr08
2023-11-03 09:11:41,676 - syri.Chr09 - INFO - mapstar:48 - Chr09 (204, 11)
2023-11-03 09:11:41,676 - syri.Chr09 - INFO - mapstar:48 - Identifying Synteny for chromosome Chr09
2023-11-03 09:11:41,745 - syri.Chr07 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr07
2023-11-03 09:11:41,747 - syri.Chr09 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr09
2023-11-03 09:11:41,753 - syri.Chr04 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr04
2023-11-03 09:11:41,753 - syri.Chr08 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr08
2023-11-03 09:11:41,754 - syri.Chr03 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr03
2023-11-03 09:11:41,762 - syri.Chr06 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr06
2023-11-03 09:11:41,766 - syri.Chr02 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr02
2023-11-03 09:11:41,768 - syri.Chr01 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr01
2023-11-03 09:11:41,786 - syri.Chr05 - INFO - mapstar:48 - Identifying Inversions for chromosome Chr05
2023-11-03 09:11:43,642 - syri.Chr06 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr06
2023-11-03 09:11:43,642 - syri.Chr05 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr05
2023-11-03 09:11:43,643 - syri.Chr02 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr02
2023-11-03 09:11:43,644 - syri.Chr01 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr01
2023-11-03 09:11:43,649 - syri.Chr08 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr08
2023-11-03 09:11:43,652 - syri.Chr03 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr03
2023-11-03 09:11:43,663 - syri.Chr07 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr07
2023-11-03 09:11:43,673 - syri.Chr04 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr04
2023-11-03 09:11:43,703 - syri.Chr09 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome Chr09
##############################

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/share/nas2/genome/biosoft/Anaconda3/2019.03/envs/syri_env/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/share/nas2/genome/biosoft/Anaconda3/2019.03/envs/syri_env/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "syri/pyxFiles/synsearchFunctions.pyx", line 803, in syri.synsearchFunctions.syri
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/share/nas2/genome/biosoft/Anaconda3/2019.03/envs/syri_env/bin/syri", line 6, in <module>
    main(sys.argv[1:])
  File "/share/nas2/genome/biosoft/Anaconda3/2019.03/envs/syri_env/lib/python3.9/site-packages/syri/scripts/syri.py", line 319, in main
    syri(args)
  File "/share/nas2/genome/biosoft/Anaconda3/2019.03/envs/syri_env/lib/python3.9/site-packages/syri/scripts/syri.py", line 209, in syri
    startSyri(args, coords[["aStart", "aEnd", "bStart", "bEnd", "aLen", "bLen", "iden", "aDir", "bDir", "aChr", "bChr"]])
  File "syri/pyxFiles/synsearchFunctions.pyx", line 505, in syri.synsearchFunctions.startSyri
  File "syri/pyxFiles/synsearchFunctions.pyx", line 506, in syri.synsearchFunctions.startSyri
  File "/share/nas2/genome/biosoft/Anaconda3/2019.03/envs/syri_env/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/share/nas2/genome/biosoft/Anaconda3/2019.03/envs/syri_env/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

Could you please suggest what the issue might be which is resulting in this error?

Thank you for your help. Best regards.

mnshgl0110 commented 7 months ago

Hi @Pengzw0909. The issue is mostly caused by absence of synteny in one (or more) chromosomes. Mostly, this is caused by incorrect strand usage. But as you mentioned that you have corrected the strand, I am not quite sure what else could have caused this.

Could you please share the alignment dotplot for the two genomes? You can use fixchr to generate that.

Also, the file sizes of all the *Out.txt files.

It will also be helpful to have the log file generated by running syri with --log DEBUG parameter.
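A rough first check of strand orientation can be made from the coords file alone, before any dotplot is drawn. The sketch below is not part of syri. It assumes a tab-separated coords file whose columns follow the order that appears in the traceback above (aStart, aEnd, bStart, bEnd, aLen, bLen, iden, aDir, bDir, aChr, bChr) and that bDir is -1 for reverse-strand alignments; the function names `strand_summary` and `likely_inverted` are hypothetical.

```python
import csv
from collections import defaultdict


def strand_summary(coords_lines):
    """Sum aligned reference bases per chromosome, split by query strand.

    Assumed row layout (tab-separated):
    aStart aEnd bStart bEnd aLen bLen iden aDir bDir aChr bChr
    """
    totals = defaultdict(lambda: {"forward": 0, "reverse": 0})
    for row in csv.reader(coords_lines, delimiter="\t"):
        if len(row) < 11:
            continue  # skip blank or malformed lines
        a_len, b_dir = int(row[4]), int(row[8])
        a_chr, b_chr = row[9], row[10]
        if a_chr != b_chr:
            continue  # only compare chromosomes that share an ID
        key = "forward" if b_dir == 1 else "reverse"
        totals[a_chr][key] += a_len
    return dict(totals)


def likely_inverted(summary):
    """Return chromosomes where reverse-strand alignments outweigh forward ones."""
    return sorted(c for c, t in summary.items() if t["reverse"] > t["forward"])
```

This is only a heuristic. A chromosome can carry large forward-aligned blocks that are not collinear with each other, so a dotplot remains the more reliable check.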

Pengzw0909 commented 7 months ago

Hello,
1. The query genome is Lactuca_sativa, about 2.6G; the other is Lactuca_virosa. fixchr is too slow on genomes this large, so I used mummer to get the dotplot PNG instead. cmd:
nucmer LsL46.fa Lviro.fa --maxmatch -c 500 -b 500 -l 100 -t 6 -p LsL46_vs_Lviro && delta-filter -1 -i 90 -l 500 LsL46_vs_Lviro.delta> LsL46_vs_Lviro.filtered.delta
mummerplot -p LsL46_vs_Lviro.all.filter.delta Lviro.all.filter.delta --png

LsL46_vs_Lviro all filter delta

2.the file sizes of all the *Out.txt files: 1700729763407

3.--log DEBUG
syri.log

Thank you.

mnshgl0110 commented 7 months ago

Syri seems to be exiting because it is not able to find synteny for chromosome 9, which again points to mismatching strands. I cannot recall whether mummerplot internally reverse complements the alignments or not, but if it does then this visualisation would not be helpful. Maybe you could try using fixchr/dotplot on only the alignments from Chr9, as that should be faster (you might need to subset the fasta files as well)?
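Subsetting a FASTA file to a single chromosome, as suggested above, can be done in a few lines. This is a minimal sketch under the assumption that the sequence ID is the header text up to the first whitespace; `extract_record` and `subset_fasta` are hypothetical helper names, and the paths passed to `subset_fasta` are whatever file names you use locally.

```python
def extract_record(fasta_lines, seq_id):
    """Yield the lines of the FASTA record whose ID equals seq_id."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The ID is taken as the header text up to the first whitespace.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == seq_id
        if keep:
            yield line


def subset_fasta(in_path, out_path, seq_id):
    """Write only the record named seq_id from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(extract_record(src, seq_id))
```

The file is streamed line by line, so a multi-gigabase genome does not need to fit in memory.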

Pengzw0909 commented 7 months ago

Hello, I got the dotplot of Chr9. There is some structural variation.

1. cmd:
ref=LsL46.Chr09.fasta
qry=Lviro.Chr09.fasta
minimap2 -cx asm20 -t 50 --eqx $ref $qry >minimap2.paf
/share/nas1/pengzw/software/fixchr/bin/dotplot -c minimap2.paf -r LsL46.Chr09.fasta -q Lviro.Chr09.fasta -F P

2. fa: The Chr09 file is too big (about 60Mb), so I can't upload it.

3.dotplot Chr09: image

Thank you for your help~

mnshgl0110 commented 7 months ago

Hi. Indeed it seems that Chr9 needs to be reverse complemented. The two large forward aligned chunks are not linear to each other, rather the currently inverted blue chunks are linear. I think reverse complementing Chr9 (using the other strand) would solve the issue.
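Reverse complementing a single chromosome in a FASTA file could look like the sketch below. It is an illustration, not a syri utility: `reverse_complement`, `read_fasta`, and `flip_records` are hypothetical names, the sequence ID is assumed to be the header text up to the first whitespace, and the 60-column line width is an arbitrary choice.

```python
_IUPAC = "ACGTRYSWKMBDHVN"
_COMP = "TGCAYRSWMKVHDBN"
# Translation table covering upper- and lower-case IUPAC nucleotide codes.
COMPLEMENT = str.maketrans(_IUPAC + _IUPAC.lower(), _COMP + _COMP.lower())


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def flip_records(lines, ids_to_flip, width=60):
    """Yield FASTA lines, reverse complementing records whose ID is in ids_to_flip."""
    for header, seq in read_fasta(lines):
        fields = header.split()
        if fields and fields[0] in ids_to_flip:
            seq = reverse_complement(seq)
        yield ">" + header + "\n"
        for i in range(0, len(seq), width):
            yield seq[i:i + width] + "\n"
```

Each record is held in memory as a whole, which is usually acceptable for one chromosome at a time. After flipping a chromosome, the alignments have to be regenerated, because the query coordinates in the old alignment files no longer match the new sequence.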

Pengzw0909 commented 7 months ago

Hi, I did as you said, and the problem is solved. Thank you very much~ Next time, I think changing the direction will require a bit more consideration of location.