microbialman / CGATMetaSequencing

A metagenomics pipeline using the CGAT framework.

translateKraken2.py failing #1

Open handibles opened 4 years ago

handibles commented 4 years ago

first issue! :D

Thanks for providing the script as per #71 in kraken2. I've tried using it but can't seem to get it to run.

For Kraken version 2.0.8-beta, I'm calling:

(base) $ python2.7 ./translateKraken2.py \
   --krakenout ./5937_kraken_report \
   --translatedout ./5937_kraken_mpa \
   --taxdatadir $DB/kraken2_stnd/taxonomy

This runs through a string of IDs and then fails with:

[ ... ]
20:31:44.654 [WARN] taxid 925 was merged into 76588
20:31:44.654 [WARN] taxid 3167 was deleted
20:31:44.654 [WARN] taxid 918 was deleted
20:31:44.654 [WARN] taxid 412 was merged into 410
Traceback (most recent call last):
  File "/home/user/translateKraken2.py", line 54, in <module>
    subprocess.run(["rm",tempfilename])
AttributeError: 'module' object has no attribute 'run'

This suggests the script needs Python 3 (subprocess.run only exists from Python 3.5 onwards), so I tried:

(base) $ python3.8 ./translateKraken2.py \
   --krakenout ./5937_kraken_report \
   --translatedout ./5937_kraken_mpa \
   --taxdatadir $DB/kraken2_stnd/taxonomy

but this fails immediately with:

Traceback (most recent call last):
  File "/home/user/bin/translateKraken2.py", line 2, in <module>
    from argparse import ArgumentParser
  File "/home/user/miniconda3/envs/motus2/lib/python3.8/argparse.py", line 88, in <module>
    import re as _re
  File "/home/user/miniconda3/envs/motus2/lib/python3.8/re.py", line 143, in <module>
    class RegexFlag(enum.IntFlag):
AttributeError: module 'enum' has no attribute 'IntFlag'

enum34 is apparently no longer needed on Python 3. I tried removing enum34, but it wasn't installed. For giggles I then installed enum34; it made no difference, so I removed it again. I considered using a commit from around #71, but saw that later commits were made to handle larger datasets.

This is probably user error, but assistance appreciated!
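For reference, the line that fails under Python 2.7 (line 54) only uses subprocess.run to delete a temporary file with rm. A minimal sketch of a version-agnostic alternative, assuming tempfilename is simply the path of a regular file (the rest of the script isn't shown here, and remove_temp_file is a hypothetical helper name). os.remove exists in both Python 2.7 and Python 3, so at least this line would no longer depend on subprocess.run. Other parts of the script may still need Python 3.

```python
import os
import tempfile


def remove_temp_file(path):
    """Delete a temporary file without shelling out to `rm`.

    os.remove is available in Python 2.7 and Python 3 alike, whereas
    subprocess.run only exists from Python 3.5 onwards.
    """
    # Skip silently if the file has already been removed.
    if os.path.exists(path):
        os.remove(path)


# Usage sketch: create a throwaway file, then clean it up.
fd, tempfilename = tempfile.mkstemp(suffix=".txt")
os.close(fd)
remove_temp_file(tempfilename)
print(os.path.exists(tempfilename))  # False
```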

microbialman commented 4 years ago

Hi, sorry to be late to this; I don't think I had issue notifications on. The latest version of the script is best, as earlier ones will crash if you pass too many taxids through the shell.

This looks like a Python install issue to me, so I'm not sure I can help. As you say, googling seems to point to conda environment issues with enum. I haven't tested any of this repo past Python 3.7, so could you try with an earlier version of Python? I'm also happy to test on my end if you are able to send the kraken file.
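A quick way to check which enum module Python 3.8 is actually picking up. One commonly reported cause of this error is an older enum (such as the enum34 backport, or a stray enum.py) shadowing the standard-library module, for example via a PYTHONPATH that points at Python 2 site-packages. Python also puts the running script's own directory first on sys.path, so an enum.py sitting next to translateKraken2.py would have the same effect. This is a hedged diagnostic sketch, not a confirmed diagnosis of this setup; diagnose_enum is a hypothetical name, and the snippet deliberately avoids importing argparse or re, which are what fail here.

```python
import os
import sys

import enum


def diagnose_enum():
    """Report which `enum` module this interpreter resolves to."""
    return {
        "python": sys.version.split()[0],
        # The stdlib copy lives under .../lib/python3.x/enum.py; a path in
        # site-packages or in the script's own directory is suspect.
        "enum_file": getattr(enum, "__file__", "<built-in>"),
        # The standard-library enum has defined IntFlag since Python 3.6.
        "has_intflag": hasattr(enum, "IntFlag"),
        "pythonpath": os.environ.get("PYTHONPATH", "<unset>"),
        # The first entries are searched before the standard library.
        "sys_path_head": sys.path[:3],
    }


for key, value in sorted(diagnose_enum().items()):
    print("{}: {}".format(key, value))
```

Saved next to translateKraken2.py (e.g. as a hypothetical check_enum.py) and run with the same python3.8 from the same conda env, a has_intflag of False or an enum_file outside the env's lib/python3.8 would point to shadowing; unsetting PYTHONPATH or removing the stray file would then be worth a try.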

cazzlewazzle89 commented 4 years ago

Hi Matt,

I'm running into an issue when running the translateKraken2.py script on output generated from a Kraken2 database built on the GTDB taxonomy. I converted the taxonomy to nodes.dmp and names.dmp files using the script provided here, but I get the following error when trying to translate the output to mpa format (below are the head and tail of the error; I've omitted the long list of taxids).

[calum.walsh@compute06 Rerun]$ head err_kraken2gtdb
13:05:28.746 [ERRO] xopen: no content
13:05:33.961 [ERRO] xopen: no content
Traceback (most recent call last):
  File "/home/calum.walsh/translateKraken2.py", line 38, in <module>
    taxonkit = subprocess.check_output("echo '{}' | taxonkit lineage --data-dir {} | taxonkit reformat --data-dir {}".format("\n".join(uniqueids),args.tddir,args.tddir), shell=True)
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'echo '26656
28344
28346

[calum.walsh@compute06 Rerun]$ tail err_kraken2gtdb
2498
34912
34109
34970
29593
22680
18691
16820
35675
33408' | taxonkit lineage --data-dir /data/databases/Kraken2_GTDB_r89_54k/TaxonKit/ | taxonkit reformat --data-dir /data/databases/Kraken2_GTDB_r89_54k/TaxonKit/' returned non-zero exit status 255

Do you have any insights into what might be going wrong? If so, it would be much appreciated.

All the best, Calum

microbialman commented 4 years ago

Hi Calum,

Looks like an older version of the script; that error can occur when the number of taxids is too large for the echo call. I updated the version in this repo a while back so that it now writes the taxids to a file and reads them from there to get around this. If you pull the repo and try again, that might help.
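A minimal sketch of the file-based approach described above, not the repo's actual code. It assumes that taxonkit lineage accepts a file of taxids as a positional argument (TaxonKit commands read from files or stdin) and that --data-dir points at the directory holding nodes.dmp and names.dmp, as in the command from the traceback. The function names are hypothetical. The two taxonkit processes are chained with subprocess.Popen instead of a shell pipe, so no taxids pass through a command line.

```python
import os
import subprocess
import tempfile


def write_taxids(taxids, path):
    """Write unique taxids to `path`, one per line."""
    with open(path, "w") as handle:
        for taxid in sorted(set(str(t) for t in taxids)):
            handle.write("{}\n".format(taxid))


def taxonkit_lineages(taxid_file, data_dir):
    """Run `taxonkit lineage FILE | taxonkit reformat` without echo."""
    lineage = subprocess.Popen(
        ["taxonkit", "lineage", "--data-dir", data_dir, taxid_file],
        stdout=subprocess.PIPE,
    )
    reformat = subprocess.Popen(
        ["taxonkit", "reformat", "--data-dir", data_dir],
        stdin=lineage.stdout,
        stdout=subprocess.PIPE,
    )
    # Close our copy so `lineage` gets SIGPIPE if `reformat` exits early.
    lineage.stdout.close()
    output, _ = reformat.communicate()
    lineage.wait()
    if lineage.returncode != 0 or reformat.returncode != 0:
        raise RuntimeError("taxonkit exited with a non-zero status")
    return output.decode()


def lineages_for_taxids(taxids, data_dir):
    """Write taxids to a temporary file, query taxonkit, then clean up."""
    fd, taxid_file = tempfile.mkstemp(suffix=".taxids.txt")
    os.close(fd)
    try:
        write_taxids(taxids, taxid_file)
        return taxonkit_lineages(taxid_file, data_dir)
    finally:
        os.remove(taxid_file)
```

With the names from the traceback, the call would look something like lineages_for_taxids(uniqueids, args.tddir), which needs taxonkit on the PATH.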

I have also just moved this whole pipeline to Snakemake as a new repo, so you may want to grab the script from that one instead as I will be working from there going forward.

Hope that works, Matt