nick-youngblut / gtdb_to_taxdump

Convert GTDB taxonomy to NCBI taxdump format
MIT License

AttributeError: module 'gtdb_to_taxdump' has no attribute 'Dmp' #16

Closed Sidduppal closed 2 years ago

Sidduppal commented 2 years ago

Hey, I'm trying to run the gtdb_to_diamond.py script but am getting the following error.

Traceback (most recent call last):
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/gtdb_to_diamond.py", line 142, in <module>
    main(args)
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/gtdb_to_diamond.py", line 125, in main
    gtdb2td.Dmp.copy_nodes(args.nodes_dmp, args.outdir)
AttributeError: module 'gtdb_to_taxdump' has no attribute 'Dmp'

I get the same error whether I install using pip or from GitHub. I can run the main gtdb_to_taxdump.py script without any errors. This is similar to issue #14, reported by another user (link).

Maybe the setup.py is messing up the python path? Any help is appreciated 😄
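One way to narrow down this kind of error is to check which file Python actually loads for the module name and whether it exposes the expected attribute. Below is a minimal sketch using only the standard library; `describe_module` is a hypothetical helper, not part of gtdb_to_taxdump. Because Python puts the running script's directory first on `sys.path`, a same-named script sitting next to gtdb_to_diamond.py could be imported instead of the installed package. Whether that is what happens here is an assumption.

```python
import importlib


def describe_module(name, attr):
    """Return (file the module was loaded from, whether it exposes `attr`)."""
    module = importlib.import_module(name)
    origin = getattr(module, "__file__", None)  # None for built-in modules
    return origin, hasattr(module, attr)


if __name__ == "__main__":
    # Standard-library stand-in; in this thread the pair of interest would be
    # ("gtdb_to_taxdump", "Dmp").
    print(describe_module("json", "JSONDecoder"))
```

If the reported path points into the environment's bin/ directory rather than site-packages, the import is resolving to a script and not to the package.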

nick-youngblut commented 2 years ago

@Sidduppal the CI tests are passing, suggesting that the package is set up correctly. Maybe something weird is going on with your PYTHONPATH? Could you provide more info about your compute environment (e.g., OS, pip version, Python version, conda env, etc.)?

Sidduppal commented 2 years ago

Hey @nick-youngblut, I'm using Ubuntu 20.04.4 LTS, Python 3.9, and pip 22.1.2. I have attached my conda env info below. gtdb_to_diamond_env

gtdb_to_diamond.py --version gives me 0.1.8dev. As mentioned above, I'm able to run gtdb_to_taxdump.py without any errors, but I get stuck with gtdb_to_diamond.py.

Thanks for your help 😄

nick-youngblut commented 2 years ago

I just added py3.9 to the CI, and the tests are still passing. Maybe there is something going on with your PYTHONPATH? What about installs of other python packages; are you able to use them without issues?

Sidduppal commented 2 years ago

Hey @nick-youngblut, yes, I'm able to run all python packages without any issue at all. I could be wrong but these could be the two reasons gtdb_to_diamond.py is failing while the CI build is passing:

  1. gtdb_to_diamond.py is being tested only for the help text, i.e. by running gtdb_to_diamond.py -h, but not with actual input, i.e. gtdb_to_diamond.py -o $OUTDIR gtdb_proteins_aa_reps_r202.tar.gz taxdump/names.dmp taxdump/nodes.dmp
  2. It seems that when the CI runs gtdb_to_diamond.py there is NO stderr for it, potentially indicating that it's not being run (link). In contrast, most of the other scripts have stderr output.
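The first point can be illustrated with a small smoke test. This is a sketch with a toy script standing in for the real CLI (the toy script, `run_cli`, and `smoke_test` are all hypothetical names, not part of the repository or its CI): the `-h` run exits cleanly because argparse stops before the main code path, while a run with a positional argument hits the AttributeError.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Toy stand-in for a CLI script: `-h` exits cleanly, but the real code path
# fails with an AttributeError, similar in spirit to the failure reported here.
TOY_SCRIPT = textwrap.dedent(
    """
    import argparse
    import types

    parser = argparse.ArgumentParser()
    parser.add_argument("infile")
    args = parser.parse_args()

    helper = types.SimpleNamespace()   # has no attribute 'Dmp'
    helper.Dmp.copy_nodes(args.infile)
    """
)


def run_cli(script, *cli_args):
    """Run `script` with the current interpreter; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, str(script), *cli_args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def smoke_test():
    """Run the toy script once with -h and once with a real argument."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "toy_cli.py"
        script.write_text(TOY_SCRIPT)
        help_code, _ = run_cli(script, "-h")
        real_code, real_err = run_cli(script, "input.txt")
    return help_code, real_code, real_err
```

A CI job that only checks the `-h` exit code would pass on this toy script; asserting on the exit code and stderr of a run with small real inputs would catch the failure.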

Sidduppal commented 2 years ago

Hey @nick-youngblut, while debugging the above error I realised that I did not use a prot2acc table while building the diamond database. I was running acc2gtdb_tax.py but after some processing, I'm getting a different error.

/release207_v2/fastani/database/GCF/016/827/605/GCF_016827605.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/008/875/GCF_016008875.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/862/815/GCF_016862815.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/862/955/GCF_016862955.1_genomic.fna.gz
/fastani/database/GCF/016/862/095/GCF_016862095.1_genomic.fna.gz
/fastani/database/GCF/016/862/635/GCF_016862635.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/725/325/GCF_016725325.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/464/385/GCF_016464385.1_genomic.fna.gz
Traceback (most recent call last):
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/acc2gtdb_tax.py", line 140, in <module>
    main(args)
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/acc2gtdb_tax.py", line 131, in main
    thread_map(acc2tax_partial, gtdb_genomes, chunksize=1, max_workers=args.threads)
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 94, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/acc2gtdb_tax.py", line 94, in seq_acc2tax
    acc_prefix = acc_code[splitpath[-4]]
KeyError: '012'

I'm using the latest GTDB database (release207_v2) with the following command: acc2gtdb_tax.py release207_v2/fastani/database gtdb_to_taxdump/names.dmp --threads 20 --outfile gtdb_to_taxdump/gtdb.acc2tax
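The `KeyError: '012'` indicates that `splitpath[-4]` resolved to a numeric directory component rather than a key of `acc_code` (whose contents are not shown in the traceback). For the paths printed above, the fourth-from-last component is the first three-digit directory, so a fixed-position lookup depends on the directory layout. A sketch of a layout-independent alternative follows, assuming file names start with GCA_ or GCF_ as in the output above; `accession_prefix` is a hypothetical helper, not the script's actual code.

```python
import re
from pathlib import PurePosixPath

# Assumption: genome files are named like 'GCF_016827605.1_genomic.fna.gz'.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d+\.\d+)")


def accession_prefix(path):
    """Return 'GCA' or 'GCF' parsed from the file name, or None if no match."""
    match = ACCESSION_RE.match(PurePosixPath(path).name)
    return match.group(1) if match else None


path = "/release207_v2/fastani/database/GCF/016/827/605/GCF_016827605.1_genomic.fna.gz"
print(path.split("/")[-4])     # fixed-position lookup yields a numeric directory
print(accession_prefix(path))  # name-based lookup yields the accession prefix
```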

nick-youngblut commented 2 years ago

"I was running acc2gtdb_tax.py but after some processing, I'm getting a different error."

Can you please move this to a new issue?

nick-youngblut commented 2 years ago

The gtdb_to_diamond.py import error should now be fixed. Thanks for catching this bug!

Please re-open this issue if you still have problems.

Sidduppal commented 2 years ago

Hey @nick-youngblut, it seems the bug is fixed when I install the repo from source. However, I'm still encountering the bug whenever I install the package using pip. It looks like the pip package is not updated yet.
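A quick way to confirm which release a pip install actually provides is to query the installed distribution's metadata. This is a minimal sketch using the standard library; `installed_version` is a hypothetical helper, and the distribution name passed to it is an assumption.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution is published under this name.
print(installed_version("gtdb_to_taxdump"))
```

Comparing this value between the pip install and the source install shows whether the two environments are running the same release.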

P.S. I'm unable to reopen the issue since you closed it.

nick-youngblut commented 2 years ago

Thanks @Sidduppal for the reminder to update the pypi release! That should now be complete: https://github.com/nick-youngblut/gtdb_to_taxdump/actions/runs/2730387686