Open handibles opened 4 years ago
Hi, sorry to be late to this. I don't think I had issue notifications on. The latest version of the script is best, as the earlier ones will crash if you send too many taxids via shell.
This looks like a Python install issue to me, so I'm not sure if I can help. As you say, googling seems to point to conda environment issues with enum. I haven't tested any of this repo past Python 3.7, so could you try with an earlier version of Python? I'm also happy to test on my end if you are able to send the kraken file.
Hi Matt,
I'm running into an issue when running the translateKraken2.py script on outputs generated from a Kraken2 database based on the GTDB taxonomy. I have converted the taxonomy to nodes.dmp and names.dmp files using the script provided here, but I am getting the following error when trying to translate the output to mpa format (below are the head and tail of the error; I've omitted the long list of tax ids).
[calum.walsh@compute06 Rerun]$ head err_kraken2gtdb
13:05:28.746 [ERRO] xopen: no content
13:05:33.961 [ERRO] xopen: no content
Traceback (most recent call last):
  File "/home/calum.walsh/translateKraken2.py", line 38, in <module>
    taxonkit = subprocess.check_output("echo '{}' | taxonkit lineage --data-dir {} | taxonkit reformat --data-dir {}".format("\n".join(uniqueids),args.tddir,args.tddir), shell=True)
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'echo '26656
28344
28346
[calum.walsh@compute06 Rerun]$ tail err_kraken2gtdb
2498
34912
34109
34970
29593
22680
18691
16820
35675
33408' | taxonkit lineage --data-dir /data/databases/Kraken2_GTDB_r89_54k/TaxonKit/ | taxonkit reformat --data-dir /data/databases/Kraken2_GTDB_r89_54k/TaxonKit/' returned non-zero exit status 255
Do you have any insights into what might be going wrong? If so, it would be much appreciated.
All the best, Calum
Hi Calum,
This looks like an older version of the script. That error could occur when the number of taxids was too large for the echo call. I updated the version in this repo a while back so that it now writes the taxids to a file and reads them from there to get around this. If you pull the repo and try again, that might help.
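For illustration, the file-based approach could look roughly like the sketch below. This is not the code from the repo: `write_taxids` and `lineage_from_file` are hypothetical names, and it assumes Python 3.7+ with `taxonkit` available on the PATH. The idea is simply that the taxids are never put on the command line, so the length of the list no longer matters to the shell.

```python
import subprocess
from pathlib import Path


def write_taxids(taxids, path):
    """Write the unique taxids to `path`, one per line; return how many were written.

    Hypothetical helper for illustration, not the function used in the repo.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order
    unique = list(dict.fromkeys(str(t) for t in taxids))
    Path(path).write_text("\n".join(unique) + "\n")
    return len(unique)


def lineage_from_file(taxid_file, data_dir):
    """Run `taxonkit lineage` on a taxid file, then pipe the result to `taxonkit reformat`.

    Assumes the taxonkit binary is on the PATH and that `data_dir` holds
    names.dmp and nodes.dmp.
    """
    # The taxids are read from a file, so no long argument list is built
    lineage = subprocess.run(
        ["taxonkit", "lineage", "--data-dir", str(data_dir), str(taxid_file)],
        check=True, capture_output=True, text=True,
    )
    # taxonkit reformat reads the lineage table from stdin
    reformatted = subprocess.run(
        ["taxonkit", "reformat", "--data-dir", str(data_dir)],
        input=lineage.stdout, check=True, capture_output=True, text=True,
    )
    return reformatted.stdout
```

Usage would be something like `write_taxids(uniqueids, "taxids.txt")` followed by `lineage_from_file("taxids.txt", args.tddir)`, where `uniqueids` and `args.tddir` are the names that appear in the traceback above.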
I have also just moved this whole pipeline to Snakemake as a new repo, so you may want to grab the script from that one instead as I will be working from there going forward.
Hope that works, Matt
first issue! :D
thanks for providing the script as per #71 in kraken2 - I've tried using this but can't seem to get it to roll out.
For Kraken version 2.0.8-beta, I'm calling:
This runs through a string of IDs and then gets:
, indicating that this should be a py3 thing. So try:
but this gets an immediate
enum34 is apparently no longer needed. I've tried removing enum34 but it wasn't installed. For giggles I then installed enum34 and it made no diff, so removed it again. I considered using a commit from around #71, but saw later commits were for larger datasets.
This is probably user error, but assistance appreciated!