marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be draft assemblies, finished genomes, or both, and output includes variant (SNP) calls, a core genome phylogeny, and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

in <module> if hdr[0] != ">": IndexError: string index out of range error in 1.7.2 release #114

Closed mf116 closed 8 months ago

mf116 commented 2 years ago

Hi,

I am trying to create a phylogeny using around 2700 sequences, kindly find below the run.

11:31:54 - INFO - |--Parsnp 1.7.2--|

Ref *.fasta
12:25:58 - INFO -

SETTINGS:
|-refgenome: .fasta
|-genomes:
.fasta
.fasta
...2674 more file(s)...
.fasta
.fasta
|-aligner: muscle
|-outdir: /P_2022_06_17_113154357377
|-OS: Linux
|-threads: 6

12:25:58 - INFO - <>
12:25:58 - INFO - No genbank file provided for reference annotations, skipping..
Traceback (most recent call last):
File "*/anaconda3/envs/parsnp/bin/parsnp", line 819, in <module>
if hdr[0] != ">":
IndexError: string index out of range

This error keeps occurring. Please help me solve it. Thank you.

bkille commented 2 years ago

Hi @mf116

Thanks for opening an issue! I think the problem here is atypical formatting somewhere in your fasta files, although I agree that Parsnp should be able to handle this, and I'll fix it in the next release (should be out this week). It looks like one of your fasta files is completely empty, which causes parsnp to fail when trying to look up the sequence header.
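For anyone hitting the same traceback, a pre-flight check along these lines may help locate the offending file before starting a long run. This is only a sketch, not Parsnp's actual code or the eventual fix: the function names `first_line` and `find_bad_fastas` are hypothetical, and it assumes the failure comes from indexing the first character of an empty first line, as the traceback suggests.

```python
import sys


def first_line(path):
    """Return the first line of a file without its line ending ('' if the file is empty)."""
    with open(path) as handle:
        return handle.readline().rstrip("\r\n")


def find_bad_fastas(paths):
    """Return (path, reason) pairs for files whose first line is not a FASTA header."""
    bad = []
    for path in paths:
        hdr = first_line(path)
        # hdr[0] on an empty string raises IndexError, so test for emptiness first.
        if not hdr:
            bad.append((str(path), "empty file or blank first line"))
        elif not hdr.startswith(">"):
            bad.append((str(path), "first line does not start with '>'"))
    return bad


if __name__ == "__main__":
    # Hypothetical usage: python check_fastas.py genomes/*.fasta
    for path, reason in find_bad_fastas(sys.argv[1:]):
        print(f"{path}: {reason}")
```

Checking emptiness before indexing (or using `str.startswith`, which is safe on an empty string) avoids the `IndexError` and lets the bad file be reported by name.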

mf116 commented 2 years ago

Hi @bkille,

Thank you for the reply. Yes, we figured this out afterwards. But now we are having another issue. Our PC has 32 cores, 200 GB of RAM, and 4 TB of storage, of which 1.3 TB is available. With the same number of strains we are getting a different error; kindly find the log below:

07:27:15 - INFO - |--Parsnp 1.7.2--|

Ref *.fasta
09:04:02 - INFO -

SETTINGS:
|-refgenome: .fasta
|-genomes:
.fasta
.fasta
...2674 more file(s)...
.fasta
.fasta
|-aligner: muscle
|-outdir: /P_2022_06_22_072715418693
|-OS: Linux
|-threads: 12

09:04:02 - INFO - <>
09:04:02 - INFO - No genbank file provided for reference annotations, skipping..
09:04:21 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
Traceback (most recent call last):
File "/miniconda3/envs/parsnp/bin/parsnp", line 1230, in <module>
if header[0] != ">":
IndexError: string index out of range

We tried building the phylogeny on 700 isolates chosen at random from the same set used in the log above, and it worked. The issue only shows up with the larger number of isolates.
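The subset experiment described above can be made reproducible with a small helper like the one below. It is a sketch under assumptions: `sample_genomes` and `split_batches` are hypothetical names, the file names in the usage lines are made up, and running fixed batches separately only helps if the failure is tied to particular inputs rather than to the total number of genomes.

```python
import random


def sample_genomes(paths, k, seed=0):
    """Return a reproducible random subset of k genome paths (sorted for stable output)."""
    rng = random.Random(seed)  # fixed seed so the same subset can be rerun later
    return sorted(rng.sample(list(paths), k))


def split_batches(paths, size):
    """Split the genome paths into consecutive batches of at most `size` files."""
    paths = list(paths)
    return [paths[i:i + size] for i in range(0, len(paths), size)]


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    genomes = [f"isolate_{i:04d}.fasta" for i in range(2700)]
    subset = sample_genomes(genomes, 700)
    batches = split_batches(genomes, 700)
    print(len(subset), [len(b) for b in batches])
```

If every batch succeeds on its own while the full set still fails, that would point toward a problem related to scale rather than to a single malformed file.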

bkille commented 8 months ago

Hi @mf116,

This should be fixed now. Thanks for opening an issue and please let me know if it persists!