schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

IndexError: string index out of range #139

Closed: jaudall closed this issue 2 years ago.

jaudall commented 2 years ago

I've run 90 or so syri jobs to create output for plotsr using 10 different genomes. Of those jobs, 5 consistently fail. I re-ran the alignments, thinking the BAM files may have been corrupted, but I still get the same error from syri.

I'm using this command:

    syri -c ${i}${j}.aln.bam --dir ${i}${j} -r ${i}.genome.fasta -q ${j}.genome.fasta -F B --prefix ${i}_${j} --all --log DEBUG

and the output isn't much help:

    Begin Time: Thu Apr 28 10:19:31 CDT 2022
    Traceback (most recent call last):
      File "/project/cotton_genomics/miniconda3_syri/envs/syriX/bin/syri", line 4, in <module>
        __import__('pkg_resources').run_script('syri==1.5.5', 'syri')
      File "/project/cotton_genomics/miniconda3_syri/envs/syriX/lib/python3.9/site-packages/pkg_resources/__init__.py", line 672, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/project/cotton_genomics/miniconda3_syri/envs/syriX/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1472, in run_script
        exec(code, namespace, namespace)
      File "/project/cotton_genomics/miniconda3_syri/envs/syriX/lib/python3.9/site-packages/syri-1.5.5-py3.9-linux-x86_64.egg/EGG-INFO/scripts/syri", line 6, in <module>
        main(sys.argv[1:])
      File "/project/cotton_genomics/miniconda3_syri/envs/syriX/lib/python3.9/site-packages/syri-1.5.5-py3.9-linux-x86_64.egg/syri/scripts/syri.py", line 319, in main
        syri(args)
      File "/project/cotton_genomics/miniconda3_syri/envs/syriX/lib/python3.9/site-packages/syri-1.5.5-py3.9-linux-x86_64.egg/syri/scripts/syri.py", line 246, in syri
        getshv(args, coords, chrlink)
      File "syri/pyxFiles/findshv.pyx", line 257, in syri.findshv.getshv
      File "syri/pyxFiles/findshv.pyx", line 296, in syri.findshv.getshv
    IndexError: string index out of range
    End Time: Thu Apr 28 10:22:34 CDT 2022
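For context, assuming each job pairs two distinct genomes in a fixed (reference, query) order, 10 genomes give 10 × 9 = 90 jobs, which fits the "90 or so" above. The Python sketch below enumerates that layout using the same flags and `${i}${j}` naming as the command; the genome names `G01` to `G10` are hypothetical placeholders, not the reporter's genomes.

```python
from itertools import permutations


def syri_commands(genomes):
    """Return one syri command per ordered (reference, query) pair of genomes."""
    cmds = []
    for i, j in permutations(genomes, 2):  # distinct pairs, order matters
        # Same flags and ${i}${j} naming scheme as the command in the issue.
        cmds.append(
            f"syri -c {i}{j}.aln.bam --dir {i}{j} "
            f"-r {i}.genome.fasta -q {j}.genome.fasta "
            f"-F B --prefix {i}_{j} --all --log DEBUG"
        )
    return cmds


if __name__ == "__main__":
    genomes = [f"G{n:02d}" for n in range(1, 11)]  # hypothetical genome names
    cmds = syri_commands(genomes)
    print(len(cmds))  # 90
    print(cmds[0])
```

With this layout each genome is the reference in 9 jobs and the query in 9 jobs, so a role-dependent failure shows up as a pair that crashes in one orientation but not the other.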

mnshgl0110 commented 2 years ago

Are these 5 crashes happening for the same genome, or are multiple genomes involved?

I assume the crash happens while generating the snps.txt file. If so, you can check its last line to see which chromosomes are potentially involved and then try analysing them separately.
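A minimal Python sketch of that check: it streams the file and prints its last non-empty line. It assumes only that snps.txt is plain text and does not assume any column layout, so the chromosome names have to be read off the printed line by eye. The script name `last_line.py` is hypothetical, and the actual output file name depends on the `--dir` and `--prefix` options.

```python
import sys


def last_nonempty_line(path):
    """Stream the file and return its last non-empty line (None if there is none)."""
    last = None
    with open(path, errors="replace") as fh:
        for line in fh:
            if line.strip():
                last = line.rstrip("\n")
    return last


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python last_line.py path/to/snps.txt
    print(last_nonempty_line(sys.argv[1]))
```

From the shell, `tail -n 1` does much the same thing.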

As it stands, I do not know what is causing this, so I would need a minimal example to reproduce it and then check.

jaudall commented 2 years ago

Most of them involve the same genome. However, syri worked OK with that 'buggy genome' when it was used as the query instead of the target (reference).

How do I limit syri to a chromosome? I don't see that as an option.

I turned off SNP identification (--nosnps), and that seemed to 'fix' it for my immediate needs.

mnshgl0110 commented 2 years ago

How do I limit syri to a chromosome? I don't see that as an option.

You would need to extract those chromosomes into separate files, then realign and analyse them. Since it is happening with only one genome, I would suggest checking the genome FASTA itself. There could be something unusual in the FASTA that carries over into the alignments, specifically the CIGAR strings, resulting in this crash.
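One way to do the extraction step, sketched in Python without external dependencies. It assumes a record's ID is the first whitespace-separated token after `>`; the script name `extract.py` and the chromosome names in the usage comment are hypothetical.

```python
import sys


def extract_records(in_path, out_path, wanted):
    """Copy FASTA records whose ID is in `wanted`; return the IDs actually found."""
    wanted = set(wanted)
    found = set()
    keep = False
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                rec_id = fields[0] if fields else ""
                keep = rec_id in wanted  # header decides for all following lines
                if keep:
                    found.add(rec_id)
            if keep:
                dst.write(line)
    return found


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage: python extract.py genome.fasta subset.fasta Chr01 [Chr02 ...]
    names = sys.argv[3:]
    missing = set(names) - extract_records(sys.argv[1], sys.argv[2], names)
    if missing:
        print("not found:", ", ".join(sorted(missing)), file=sys.stderr)
```

If samtools is installed, `samtools faidx genome.fasta Chr01 > Chr01.fasta` performs the same extraction using a FASTA index.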

jaudall commented 2 years ago

Hmm, any suggestions on what character to look for? The sequences were all passed through BioPerl SeqIO and should be OK. It seems that if a character were causing the problem when the sequence is used as the target, the same character would cause a problem when it is used as the query ... but it doesn't. I'm happy to share files ... :-)
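No specific culprit character is identified in this thread, so the following is only a generic sanity check, sketched in Python. It assumes valid sequence lines contain only IUPAC nucleotide codes in either case (lowercase for soft-masked bases) and reports anything else per record, including spaces, tabs, and the carriage returns left behind by Windows line endings. The function name `unusual_chars` is hypothetical.

```python
import sys
from collections import Counter

# IUPAC nucleotide codes, upper and lower case (soft-masked bases are lowercase).
ALLOWED = set("ACGTURYSWKMBDHVN" + "acgturyswkmbdhvn")


def unusual_chars(path):
    """Return {record_id: Counter of unexpected characters} for a FASTA file."""
    report = {}
    rec_id = None
    # newline="" disables newline translation, so a stray carriage return
    # from Windows line endings stays in the line and gets reported.
    with open(path, newline="") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                fields = line[1:].split()
                rec_id = fields[0] if fields else ""
                continue
            bad = Counter(c for c in line if c not in ALLOWED)
            if bad:
                report.setdefault(rec_id, Counter()).update(bad)
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for rec, counts in unusual_chars(sys.argv[1]).items():
        print(rec, dict(counts))
```

Running it on both the problematic genome and a well-behaved one makes it easier to tell whether anything reported is actually specific to the genome that crashes.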

mnshgl0110 commented 2 years ago

any suggestions on what character to look for?

Not really.

Are you using syri v1.5.5? This version also accepts a PAF file as input. Could you please try that and see whether you get the same error with it? If the issue persists, could you please share the genome FASTA files (the problematic genome and one non-problematic genome)?
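Since the CIGAR strings were suspected, one check worth running on a PAF file is whether each record's CIGAR agrees with its coordinates. The Python sketch below assumes minimap2-style PAF with the CIGAR in a `cg:Z:` tag (minimap2 writes it when run with `-c`). It counts bases consumed on each side (`M`, `I`, `=`, `X` for the query; `M`, `D`, `N`, `=`, `X` for the target) and compares them with `qend - qstart` and `tend - tstart`. A mismatch would not prove it causes this crash, but it is the kind of inconsistency that could make string indexing run past the end of a sequence. The function name `check_paf` is hypothetical.

```python
import re
import sys

CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MI=X")   # ops that advance along the aligned query span
TARGET_OPS = set("MDN=X")  # ops that advance along the aligned target span


def check_paf(lines):
    """Yield (line_number, message) for PAF records whose cg:Z CIGAR looks wrong."""
    for n, line in enumerate(lines, 1):
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            yield n, "fewer than 12 columns"
            continue
        try:
            qspan = int(f[3]) - int(f[2])
            tspan = int(f[8]) - int(f[7])
        except ValueError:
            yield n, "non-numeric coordinates"
            continue
        cg = next((tag[5:] for tag in f[12:] if tag.startswith("cg:Z:")), None)
        if cg is None:
            yield n, "no cg:Z: CIGAR tag"
            continue
        if not CIGAR_FULL.fullmatch(cg):
            yield n, "malformed CIGAR"
            continue
        q_used = t_used = 0
        for length, op in CIGAR_OP.findall(cg):
            if op in QUERY_OPS:
                q_used += int(length)
            if op in TARGET_OPS:
                t_used += int(length)
        if q_used != qspan or t_used != tspan:
            yield n, (f"CIGAR spans query {q_used}/target {t_used}, "
                      f"coordinates give {qspan}/{tspan}")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for n, msg in check_paf(fh):
            print(f"line {n}: {msg}")
```

This covers PAF input only; the BAM files from the original command would need the equivalent check on each record's CIGAR.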