schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

IndexError: list index out of range #197

Closed Bearmax90 closed 7 months ago

Bearmax90 commented 1 year ago

Hi, I executed the following commands:

nucmer -c 1000 -l 40 -t 16 -p newTGY ref.fa TGY.qry.fasta
delta-filter -m -i 90 -l 100 newTGY.delta > newTGY.mdelta
show-coords -THrd newTGY.mdelta > newTGY.mcoords
syri --nc 16 --nosnp --prefix TGY. -c newTGY.mcoords -d newTGY.mdelta -r ref.fa -q TGY.qry.fasta

but it stopped with these messages:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/niut02/anaconda3/envs/bioinfo/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/niut02/anaconda3/envs/bioinfo/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "syri/pyxFiles/synsearchFunctions.pyx", line 803, in syri.synsearchFunctions.syri
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/niut02/software/syri/syri-1.6.3/bin/syri", line 6, in <module>
    main(sys.argv[1:])
  File "/home/niut02/anaconda3/envs/bioinfo/lib/python3.9/site-packages/syri-1.6.3-py3.9-linux-x86_64.egg/syri/scripts/syri.py", line 326, in main
    syri(args)
  File "/home/niut02/anaconda3/envs/bioinfo/lib/python3.9/site-packages/syri-1.6.3-py3.9-linux-x86_64.egg/syri/scripts/syri.py", line 214, in syri
    startSyri(args, coords[["aStart", "aEnd", "bStart", "bEnd", "aLen", "bLen", "iden", "aDir", "bDir", "aChr", "bChr"]])
  File "syri/pyxFiles/synsearchFunctions.pyx", line 505, in syri.synsearchFunctions.startSyri
  File "syri/pyxFiles/synsearchFunctions.pyx", line 506, in syri.synsearchFunctions.startSyri
  File "/home/niut02/anaconda3/envs/bioinfo/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/niut02/anaconda3/envs/bioinfo/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

I checked the related issues in the forum (Duplicate: #48, #176), but they don't seem to match my problem. My log file is attached: TGY.syri.log

Is there any way to solve this issue? Many thanks, Biao

mnshgl0110 commented 1 year ago

Hi Biao, I agree that the current log files do not suggest that the issue is caused by different strands. Nevertheless, since different strands are the only known cause of this error, could you please run fixchr and check whether that solves the issue? Also, please share the dotplots generated by fixchr.
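A quick way to look for a strand problem before rerunning is to measure, per chromosome, how much of the aligned length is inverted. The sketch below is illustrative only and is not part of syri or fixchr. It assumes a tab-separated `show-coords -THrd` table whose 11 columns follow the order of the column names shown in the traceback above (`aStart` ... `bChr`); the function name `inverted_fraction` is hypothetical, and the thread does not state what fraction syri treats as "high".

```python
import csv
from collections import defaultdict

# Assumed column order of a `show-coords -THrd` table (names taken from the traceback).
COLS = ["aStart", "aEnd", "bStart", "bEnd", "aLen", "bLen",
        "iden", "aDir", "bDir", "aChr", "bChr"]


def inverted_fraction(handle):
    """Return {chromosome: inverted share of aligned reference bases}.

    Only alignments between chromosomes with the same ID are counted.
    An alignment is treated as inverted when aDir and bDir differ.
    """
    total = defaultdict(int)
    inverted = defaultdict(int)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(COLS):
            continue  # skip lines that do not match the assumed layout
        rec = dict(zip(COLS, row))
        if rec["aChr"] != rec["bChr"]:
            continue
        length = int(rec["aLen"])
        total[rec["aChr"]] += length
        if rec["aDir"] != rec["bDir"]:
            inverted[rec["aChr"]] += length
    return {chrom: inverted[chrom] / total[chrom] for chrom in total}
```

Calling it as `inverted_fraction(open("newTGY.mcoords"))` would give one value per chromosome; a value close to 1 suggests that the query chromosome is on the opposite strand.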

Bearmax90 commented 1 year ago

Thanks for replying. Instead of using fixchr, I used chroder to adjust the chromosome names and strand orientation of the original sequence. The commands were as follows:

nucmer -c 1000 -l 40 -t 16 -p TGY ref.fa TGY.fa
delta-filter -1 -i 90 -l 100 TGY.delta > TGY.1delta
show-coords -THrd TGY.1delta > TGY.1coords
chroder -o TGY TGY.1coords ref.fa TGY.fa

Then I ran the commands above. In total I analyzed six sequence files, of which four succeeded and two failed. I double-checked the format of these files but did not find the problem. After reading the issues in your forum, I guess the problem might be that the chromosomes are from different strands.

I will run it again with fixchr.

Bearmax90 commented 1 year ago

I ran:

fixchr --prefix TGY -c TGY.1coords -r ref.fa -q TGY.qry.fasta

It printed this warning:

2023-06-01 02:35:01,134 - fixchr - WARNING - Inverting query chromosomes: ['Chr07'] (fixchr.py:77)

output files: TGYinput_alignments.txt TGYinput.pdf TGYhomologous_alignments.txt TGYhomologous.pdf TGYhomologous_strand_corrected_alignments.txt TGYhomologous_strand_corrected.pdf ref.filtered.fa TGY.qry.filtered.fa

TGYinput.pdf TGYhomologous_strand_corrected.pdf

Should I use the resulting file TGY.qry.filtered.fa for subsequent analysis?

mnshgl0110 commented 1 year ago

Yes, try using that for alignment and then running syri.

Bearmax90 commented 1 year ago

Hi Manish, I processed the two files that had failed before (TGY and DASZ) in the way described above. TGY ran successfully, but strangely DASZ still reported the same error.

output files: DASZinput.pdf DASZhomologous_alignments.txt DASZhomologous.pdf DASZhomologous_strand_corrected_alignments.txt DASZhomologous_strand_corrected.pdf

And the log file: DASZ.syri.log DASZinput.pdf DASZhomologous_strand_corrected.pdf DASZhomologous.pdf

Could you please look at DASZ.syri.log and help me analyze the cause again?

Many thanks

mnshgl0110 commented 1 year ago

It seems that fixchr incorrectly reverse-complemented chromosome 5 (compare DASZinput.pdf vs DASZhomologous_strand_corrected.pdf). Using the original strand should solve the issue. You can use the dotplot tool (installed with fixchr) to visualise the alignments and ensure that the chromosomes are collinear.

Bearmax90 commented 1 year ago

I used the original strand of Chr05, but the analysis still failed. The generated intermediate files are as follows: (two images) DASZ.syri.log

So I tried inverting all the chromosomes that produced no output (Chr05, 06, 13, 14, 15) and ran it again, but it still failed. I also used the dotplot tool to check the collinearity of these chromosomes, but that run failed with the following message: "ValueError: Length mismatch: Expected axis has 11 elements, new values have 12 elements"

Then I redid the processing and analysis of the DASZ sample from scratch. However, it still failed (even though I reversed Chr05 again). input.pdf homologous_strand_corrected.pdf

(two images) DASZ.syri.log

I don't have a clue anymore. Could you please help me analyze the cause again?

mnshgl0110 commented 1 year ago

The log files show that some chromosomes have a high fraction of inverted alignments, so yes, there still seem to be issues with the strands. Also, I noticed that the chromosome IDs are not consistent (e.g., Chr11 in the two genomes is not homologous, and Chr4 and Chr3 are named interchangeably). This can also cause crashes, because syri then cannot find syntenic regions between the chromosomes it treats as homologous (chromosomes with the same ID). Could you please rename the chromosomes so that homologous chromosomes have the same ID? After that, try to ensure that you do not get warnings like this:

2023-06-09 11:34:25,376 - Reading Coords - WARNING - syri:135 - Reference chromosome Chr11 has high fraction of inverted alignments with its homologous chromosome in the query genome (Chr11). Ensure that same chromosome-strands are being compared in the two genomes, as different strand can result in unexpected errors.

If it still crashes, you can try comparing homologous chromosomes individually with syri (Chr1vsChr1, Chr2vsChr2, etc.). That might help pinpoint the cause of the issue.
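One way to spot inconsistent chromosome IDs is to find, for every reference chromosome, the query chromosome that carries most of its aligned bases, and to flag cases where the two names differ. The following is a rough sketch under stated assumptions, not how syri itself decides homology: it assumes an 11-column tab-separated `show-coords -THrd` table with the reference alignment length in column 5 and the reference and query chromosome names in columns 10 and 11, and the function names are hypothetical.

```python
import csv
from collections import defaultdict


def best_query_match(handle):
    """Map each reference chromosome to the query chromosome with the most aligned bases."""
    aligned = defaultdict(lambda: defaultdict(int))
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 11:
            continue  # skip lines that do not match the assumed layout
        ref_len, ref_chr, qry_chr = int(row[4]), row[9], row[10]
        aligned[ref_chr][qry_chr] += ref_len
    return {ref: max(hits, key=hits.get) for ref, hits in aligned.items()}


def mismatched_ids(handle):
    """Return only the reference chromosomes whose best match has a different ID."""
    return {ref: qry for ref, qry in best_query_match(handle).items() if ref != qry}
```

An empty result from `mismatched_ids` is consistent with homologous chromosomes sharing IDs; a non-empty one lists candidate renames, which are best confirmed on a dotplot before changing any FASTA headers.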

Bearmax90 commented 1 year ago

Thanks Manish. I successfully ran the analysis of this file after splitting this genome into separate runs for each stain file to check for problems! All tasks have been successfully completed. Many thanks~
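For readers who want to try the per-chromosome comparison suggested earlier in the thread (Chr1vsChr1, Chr2vsChr2, etc.), a first step is to split the alignment table by chromosome. This is a minimal sketch, assuming an 11-column tab-separated `show-coords -THrd` table with chromosome names in the last two columns; the function names and the output file naming are hypothetical. The thread does not say exactly how the individual runs were set up, so whether syri also needs per-chromosome FASTA and delta files is left open here.

```python
import csv
from collections import defaultdict


def split_same_id_pairs(handle):
    """Group coords rows by chromosome, keeping only rows where both IDs match."""
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) == 11 and row[9] == row[10]:
            groups[row[9]].append(row)
    return dict(groups)


def write_groups(groups, prefix):
    """Write one tab-separated file per chromosome and return the file names."""
    names = []
    for chrom, rows in groups.items():
        name = f"{prefix}.{chrom}.coords"  # hypothetical naming scheme
        with open(name, "w", newline="") as out:
            csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
        names.append(name)
    return names
```

Running each chromosome separately narrows a crash down to the chromosome pair that triggers it, which is the purpose of the suggestion in the thread.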