klebgenomics / Kaptive

GNU General Public License v3.0

Kaptive failed to run #13

Closed: mingjuhao closed this issue 3 years ago

mingjuhao commented 3 years ago

Hi, many thanks for the great Kaptive. Sometimes I encounter the problem below, but when I run it again without any modification, it runs through. I wonder how to avoid this problem so that runs go more smoothly. I noticed that LeahRoberts resolved this in #9 by modifying the database, but I don't know how to do that. Where is the database? Could someone give me some tips about the process? Thanks!

Error:

    Kaptive failed to run with the following error:
    /home/hao/anaconda3/lib/python3.7/site-packages/Bio/GenBank/__init__.py:1300: BiopythonParserWarning: The NCBI states double-quote characters like " should be escaped as "" (two double - quotes), but here it was not: 'undecaprenyl-phosphate galactose phosphotransferase" glucose-1-phosphate transferase'
      BiopythonParserWarning,
    Error: tblastn encountered an error: free(): invalid pointer

mingjuhao commented 3 years ago

I encountered the problem while running Kleborate with --all.

kelwyres commented 3 years ago

Hi,

Yes, it is possible to edit the Kaptive databases to remove this parser warning (the databases are in /kaptive/reference_database inside your Kleborate directory). However, this warning won't cause Kaptive to fail.
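If you do want to edit the database, it helps to find the offending entry first. The helper below is a rough heuristic sketch, not part of Kaptive or Kleborate (`find_stray_quotes` is a hypothetical name). It assumes the reference database is a standard GenBank flat file and scans only the feature table (between `FEATURES` and `ORIGIN`) for double quotes that are neither the opening quote after `/key=`, a closing quote at the end of a line, nor an already escaped pair (`""`). It can misreport an escaped pair that happens to fall at the end of a wrapped line.

```python
import re

# Opening of a quoted qualifier value, e.g. '                     /product="...'
QUALIFIER_OPEN = re.compile(r'^\s+/\w+="')


def find_stray_quotes(lines):
    """Return 1-based numbers of feature-table lines holding an unescaped double quote.

    Heuristic: ignores the opening quote after /key=, a trailing quote at the end
    of the line (assumed to close the value) and escaped pairs (""). Any other
    double quote inside the feature table is reported.
    """
    hits = []
    in_features = False
    for number, line in enumerate(lines, start=1):
        text = line.rstrip()
        if text.startswith("FEATURES"):
            in_features = True
            continue
        if text.startswith("ORIGIN") or text.startswith("//"):
            in_features = False
            continue
        if not in_features:
            continue  # header lines and sequence lines are not checked
        match = QUALIFIER_OPEN.match(text)
        body = text[match.end():] if match else text
        if body.endswith('"'):
            body = body[:-1]  # treat a trailing quote as the closing quote
        if '"' in body.replace('""', ""):
            hits.append(number)
    return hits
```

For example, passing an open file handle (`find_stray_quotes(open(path))`) returns the line numbers to inspect. Following the NCBI rule quoted in the warning, each flagged quote would then be doubled (`""`), or removed if it is simply a typo in the product name.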

Unfortunately, there is a known bug in tBLASTn that occurs sporadically and causes Kaptive to fail, but it usually doesn't reproduce: when you rerun Kaptive (or Kleborate), it works fine. This could be the problem here. It's a problem with BLAST+ rather than with Kaptive. If you run Kaptive outside Kleborate, you can get around the problem by using an older version of BLAST+ (see discussion here), but unfortunately Kleborate requires a newer version because it relies on multiple different functions. We have tried to mitigate the problem by having Kleborate rerun Kaptive if it fails, but occasionally Kaptive fails twice on the same genome (we usually only encounter this when running Kleborate on hundreds or thousands of genomes).
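For large batch runs, the same "just rerun it" idea can be applied around each invocation. The wrapper below is a minimal sketch and not how Kleborate implements its own retry; `run_with_retries` and its parameters are hypothetical names. It assumes the wrapped command signals failure with a non-zero exit status. If a tool only prints an error and still exits with 0, this wrapper will not notice.

```python
import subprocess
import time


def run_with_retries(cmd, max_attempts=3, delay_seconds=5.0):
    """Run cmd (a list of arguments), retrying if it exits with a non-zero status.

    Returns the CompletedProcess of the first successful attempt and raises
    RuntimeError once max_attempts attempts have all failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before trying again
    raise RuntimeError(
        f"{cmd[0]} failed {max_attempts} times; last stderr: {result.stderr.strip()}"
    )
```

For example, `run_with_retries(["kleborate", "--all", "-a", "genome.fasta", "-o", "genome_results.txt"])` would retry a single-genome run up to three times. The `-a` and `-o` flags are written as in Kleborate 2.x, so check `kleborate --help` for your installed version.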

I'm sorry, I realise this is not a very satisfying answer!

mingjuhao commented 3 years ago

Thanks for your prompt response! Yes, I am running Kleborate on more than 400 genomes. The good thing is that I don't need to rerun it often. Next time, maybe I can split the genomes into smaller batches and concatenate the results.
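The batching idea can be sketched as two small helpers: one splits the list of assemblies into fixed-size batches, and one merges the per-batch result tables while writing the header row only once. This is a minimal sketch assuming each batch produces a text table whose first line is a header shared by all batches; `make_batches` and `concatenate_tables` are hypothetical names, not Kleborate functions.

```python
def make_batches(paths, batch_size):
    """Split a list of assembly paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def concatenate_tables(table_paths, merged_path):
    """Merge result tables that share a header line, writing the header only once."""
    header = None
    with open(merged_path, "w") as out:
        for table_path in table_paths:
            with open(table_path) as handle:
                lines = [line.rstrip("\r\n") for line in handle]
            if not lines:
                continue  # skip empty files, e.g. from a batch that produced no output
            if header is None:
                header = lines[0]
                out.write(header + "\n")
            elif lines[0] != header:
                raise ValueError(f"{table_path} has a different header from the first table")
            for row in lines[1:]:
                if row:  # drop blank lines
                    out.write(row + "\n")
```

Checking the header of every table guards against accidentally merging outputs from runs with different options, which would otherwise produce misaligned columns. A batch that fails can then be rerun on its own without repeating the other batches.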