dportik closed this issue 2 years ago
Did you actually install the package via pip, or are you just running the script from the ./bin/ directory? The error suggests that you didn't install the package ("gtdb2td"), and so gtdb2td cannot be found.
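A quick way to check whether a package is visible to the interpreter that runs the script is sketched below. The helper name `is_importable` is made up for illustration; `importlib.util.find_spec` is standard library. It only tells you whether a top-level module can be located, not whether every name inside it resolves at runtime.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on sys.path.

    Hypothetical helper for illustration; pass a top-level name only
    (for example 'gtdb2td'), not a dotted submodule path.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'gtdb2td' is the package name mentioned in this thread.
    print("gtdb2td importable:", is_importable("gtdb2td"))
```

Run it with the same python3 that pip3 installs into, otherwise the answer may refer to a different environment.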
Yes, it was installed with pip.
$ pip3 show gtdb_to_taxdump
Name: gtdb-to-taxdump
Version: 0.1.7
Summary: GTDB database utility scripts
Home-page: https://github.com/nick-youngblut/gtdb_to_taxdump
Author: Nick Youngblut
Author-email: nyoungb2@gmail.com
License: MIT license
Location: /usr/local/lib/python3.7/site-packages
Requires: networkx
Required-by:
I also just tried removing the pip install and replacing it with a local install via setup.py; same error:
2022-01-12 10:42:02,052 - Loading file: /Users/dportik/Documents/Projects/Proj-Zymo-TruMatrix/3-MAGs/GTDB-to-NCBI/taxdump/names.dmp
Traceback (most recent call last):
File "/usr/local/bin/ncbi-gtdb_map.py", line 4, in <module>
__import__('pkg_resources').run_script('gtdb-to-taxdump==0.1.7', 'ncbi-gtdb_map.py')
File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 667, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1471, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python3.7/site-packages/gtdb_to_taxdump-0.1.7-py3.7.egg/EGG-INFO/scripts/ncbi-gtdb_map.py", line 628, in <module>
File "/usr/local/lib/python3.7/site-packages/gtdb_to_taxdump-0.1.7-py3.7.egg/EGG-INFO/scripts/ncbi-gtdb_map.py", line 599, in main
File "/usr/local/lib/python3.7/site-packages/gtdb_to_taxdump-0.1.7-py3.7.egg/gtdb2td/Dmp.py", line 70, in load_dmp
NameError: name 'gtdb2td' is not defined
Looks like the issue is in Dmp.py.
Do any of the other scripts work?
This script works as long as I do not use the --names-dmp and --nodes-dmp flags.
Adding a simple import statement in Dmp.py and re-installing locally fixed that issue, but I hit another error soon after:
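For anyone hitting the same NameError: a minimal sketch of the general pattern, not the actual Dmp.py code. A name that is never imported only fails when the line using it runs, which would be consistent with the script working until the --names-dmp and --nodes-dmp code path is reached. The functions `broken` and `fixed` and the use of the standard `fractions` module are stand-ins chosen for illustration.

```python
def broken():
    # 'fractions' is never imported at module level, so the name lookup
    # fails with NameError only when this line is executed.
    return fractions.Fraction(1, 2)


def fixed():
    # Adding the missing import makes the name resolvable.
    import fractions
    return fractions.Fraction(1, 2)


if __name__ == "__main__":
    try:
        broken()
    except NameError as err:
        print("NameError:", err)
    print("fixed() returned", fixed())
```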
2022-01-12 10:50:20,479 - Loading file: /Users/dportik/Documents/Projects/Proj-Zymo-TruMatrix/3-MAGs/GTDB-to-NCBI/taxdump/names.dmp
Traceback (most recent call last):
File "/usr/local/bin/ncbi-gtdb_map.py", line 4, in <module>
__import__('pkg_resources').run_script('gtdb-to-taxdump==0.1.7', 'ncbi-gtdb_map.py')
File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 667, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1471, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python3.7/site-packages/gtdb_to_taxdump-0.1.7-py3.7.egg/EGG-INFO/scripts/ncbi-gtdb_map.py", line 628, in <module>
File "/usr/local/lib/python3.7/site-packages/gtdb_to_taxdump-0.1.7-py3.7.egg/EGG-INFO/scripts/ncbi-gtdb_map.py", line 599, in main
File "/usr/local/lib/python3.7/site-packages/gtdb_to_taxdump-0.1.7-py3.7.egg/gtdb2td/Dmp.py", line 76, in load_dmp
TypeError: cannot use a string pattern on a bytes-like object
After a bit of searching, I found I had to add .decode('utf-8') to every line in Dmp.py that splits the input line with a regex. After that, the script finished successfully. I'll open a pull request so you can see the relevant changes.
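Roughly, the fix looks like the sketch below. This is not the code from Dmp.py: the function name `split_dmp_line` and the separator regex are assumptions for illustration, based on the tab-pipe-tab layout of NCBI taxdump files. The point is that a str pattern cannot be applied to bytes, so a line read as bytes has to be decoded first.

```python
import re

# Assumed field separator for taxdump-style lines: tab, pipe, optional tab.
FIELD_SEP = re.compile(r'\t\|\t?')


def split_dmp_line(raw_line):
    """Split one taxdump-style line into fields; accepts bytes or str.

    Hypothetical helper, not part of gtdb_to_taxdump.
    """
    if isinstance(raw_line, bytes):
        # Without this step, re raises:
        # TypeError: cannot use a string pattern on a bytes-like object
        raw_line = raw_line.decode('utf-8')
    return FIELD_SEP.split(raw_line.rstrip('\n'))


if __name__ == "__main__":
    line = b"1\t|\troot\t|\t\t|\tscientific name\t|\n"
    print(split_dmp_line(line))
```

An alternative with the same effect is to open the file in text mode with an explicit encoding, so that every line is already a str by the time it reaches the regex.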
Thanks for making these useful tools! I have been looking for a quick way to compare NCBI names to GTDB names, and ncbi-gtdb_map.py is great for this use case.
I first tried converting species names from NCBI to GTDB, and it ran successfully. I noticed quite a few NCBI species were not assigned a GTDB name.
I am now trying to use the NCBI taxids to see if there is any difference. However, it looks like I've hit a bug when invoking the --names-dmp and --nodes-dmp flags. I've run:
The error is pasted below:
Any idea what might be happening here?
Thanks!