mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

IndexError: list index out of range when using unitig-caller output #143

Closed: jolindadekorne closed this issue 3 years ago

jolindadekorne commented 3 years ago

Dear John,

I am using unitig-caller to create a unitigs.fasta file as input for pyseer. However, when using the unitig file as input for pyseer with the basic command:

pyseer --phenotypes GGI.pheno --no-distances --kmers unitig_caller_out/unitigs.fasta --uncompressed --cpu 2

I keep getting this error:

Read 765 phenotypes
Detected binary phenotype
Traceback (most recent call last):
  File "/home/jdkorne/.conda/envs/mc3/bin/pyseer", line 10, in <module>
    sys.exit(main())
  File "/home/jdkorne/.conda/envs/mc3/lib/python3.7/site-packages/pyseer/__main__.py", line 739, in main
    options.cpu*options.block_size))
  File "/home/jdkorne/.conda/envs/mc3/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/jdkorne/.conda/envs/mc3/lib/python3.7/multiprocessing/pool.py", line 383, in _map_async
    iterable = list(iterable)
  File "/home/jdkorne/.conda/envs/mc3/lib/python3.7/site-packages/pyseer/input.py", line 577, in iter_variants
    sample_order)
  File "/home/jdkorne/.conda/envs/mc3/lib/python3.7/site-packages/pyseer/input.py", line 364, in read_variant
    '|')[1].lstrip().split())
IndexError: list index out of range

These are the first lines of the unitigs.fasta file:

>1
AATTTCGACTTAACTTCGGCACACCGTCCCGGCAG
>2
CGACTTAACTTCGGCACACCGTCCCGGCAGCTAA
>3
TTAACTTCGGCACACCGTCCCGGCAGCTAAAAATCCT
>4
CGGCACACCGTCCCGGCAGCTAAAAATCCTGCG
>5
CACACCGTCCCGGCAGCTAAAAATCCTGCGGG
>6
CCGTCCCGGCAGCTAAAAATCCTGCGGGATCGG

Pyseer runs with exactly the same command when using the unitigs.txt output file from unitig-counter, so I wonder if it is related to the file. I am using version 1.3.6, installed with conda.

Do you have any idea what causes this error? Thank you in advance!

Kind regards, Jolinda

johnlees commented 3 years ago

Hi Jolinda,

This looks like the wrong file. You need the one that lists the sequences and the samples they are present in. I think unitig-caller should output this with 'pyseer' in the name? Can you list the files you got as output?
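To illustrate why a FASTA file triggers this error: the traceback shows read_variant splitting each line on '|' and taking the second field, which a FASTA header such as ">1" does not have. Below is a minimal Python sketch of that behaviour, not pyseer's actual code. It assumes input lines of the form `SEQUENCE | sample1:1 sample2:1` (the `sample:count` layout is an assumption here), and `parse_unitig_line` is a hypothetical helper name.

```python
def parse_unitig_line(line):
    """Split a 'SEQUENCE | sample1:1 sample2:1' line into (sequence, samples).

    Hypothetical helper for illustration only; the sample:count layout
    after the '|' is an assumption about the expected input format.
    """
    fields = line.rstrip().split('|')
    if len(fields) < 2:
        # A FASTA header (">1") or a bare sequence line has no '|',
        # so indexing fields[1] directly would raise IndexError.
        raise ValueError("no '|' separator found: %r" % line.rstrip())
    sequence = fields[0].strip()
    # Keep only the sample name from each 'name:count' token.
    samples = [token.split(':')[0] for token in fields[1].lstrip().split()]
    return sequence, samples


if __name__ == "__main__":
    for example in ("ACGTACGT | sampleA:1 sampleB:1", ">1"):
        try:
            print(parse_unitig_line(example))
        except ValueError as err:
            print("rejected:", err)
```

A check like this on the first few lines of the input would flag a FASTA file before pyseer is started.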

jolindadekorne commented 3 years ago

Hi John,

Thanks for the quick response!

This is the list of output files (I used the --build option and unitigs_caller_out as prefix):

unitigs_caller_out_unitigs.fasta
unitigs_caller_out.gfa 
unitigs_caller_out.bfg_colors

I realize now that I might need to run unitig-caller with the --query option to query the unitigs in my input assemblies and get the file that is needed for pyseer. Is that correct?

Thank you!

johnlees commented 3 years ago

Ah yes, that's right. We will combine those steps in the near future, but for now you need to run the query step after the build step.