Closed Lcornet closed 2 years ago
Hi @Lcornet,
Thanks. I'll check in the complete fix in the coming hours.
Should all work. I changed the download method, i.e., there is an additional python library needed in the conda env. Just re-do the conda env creation and download as stated in the README.
Thanks, the install is complete and the error is solved. However, I am not able to download the taxonomy. I get this error:
[BASTA ERROR] MD5 sum mismatch.
[BASTA STATUS] (Re-)Downloading file taxdump.tar.gz
[BASTA STATUS] (Re-)Downloading file taxdump.tar.gz.md5
It never gets any further (grep for "mismatch" in the log):
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
Is this something related to NCBI?
Unfortunately, I can't reproduce your error, so I don't think it's a BASTA problem. One thing to try is to delete any existing taxonomy folders that you might have already downloaded. That shouldn't affect it, as I put some checks in, but who knows? If you don't specify a particular directory with the -d option, then the folder is in your home directory: $HOME/.basta. Delete that one and try again.
If it still doesn't work I assume it's an internet connection problem. The error means that the downloaded file is corrupt or not complete and so it's trying again. As said, it works for me with a fresh install.
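To rule out a corrupt download independently of BASTA, the checksum can also be verified by hand. The following is a minimal sketch, assuming the .md5 file has the usual "<hash> <filename>" layout and sits next to the archive; md5_matches is a hypothetical helper written for illustration, not part of BASTA:

```python
import hashlib
import os


def md5_matches(directory, md5_name):
    """Return True if the file listed in the .md5 file hashes to the listed digest.

    Hypothetical helper for manual checking; not BASTA code.
    """
    # The checksum file is plain text: "<hex digest> <file name>"
    with open(os.path.join(directory, md5_name), "rt") as fh:
        expected, filename = fh.readline().split()[:2]

    digest = hashlib.md5()
    # Read the archive as bytes, in 1 MiB chunks, so large files fit in memory
    with open(os.path.join(directory, filename), "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    return digest.hexdigest() == expected.lower()
```

Calling md5_matches with whichever directory holds the two downloaded files and "taxdump.tar.gz.md5" as the second argument should return False for a corrupt or incomplete download. A True result would suggest that the download is fine and the problem lies elsewhere.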
I deleted the $HOME/.basta directory and passed a directory option to basta, but I get the same error.
I will try on another cluster.
Sorry to hear that it still doesn't work. I hope it will work out with another server. Good luck
I'm getting the same error. There is no issue with the download.
# [BASTA STATUS] Checking MD5 sum of file
Traceback (most recent call last):
  File "/local/cluster/BASTA-1.4.1/bin/basta", line 4, in <module>
    __import__('pkg_resources').run_script('BASTA==1.4', 'basta')
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/pkg_resources/__init__.py", line 672, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1472, in run_script
    exec(code, namespace, namespace)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/EGG-INFO/scripts/basta", line 118, in <module>
    main.run_basta(args)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 89, in run_basta
    self._basta_taxonomy(args)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 186, in _basta_taxonomy
    dutils.down_and_check("ftp://ftp.ncbi.nih.gov/pub/taxonomy/","taxdump.tar.gz",args.directory)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 60, in down_and_check
    while(check_md5(md5,out_dir)):
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 46, in check_md5
    filehash.update(open(os.path.join(path,l[1])).read())
TypeError: 'filter' object is not subscriptable
Just to confirm there is no issue with the download:
$ ls
taxdump.tar.gz taxdump.tar.gz.md5
$ cat taxdump.tar.gz.md5
1c84cb5d87fddf5007b8f2a6cc186825 taxdump.tar.gz
I took a look at the code, and I can't figure out why you are running filter(None, fl.split()) in DownloadUtils.py:
40 # Check MD5 sum of givenfile
41 def check_md5(f,path):
42     with open(os.path.join(path,f)) as f:
43         fl = f.readline()
44         l = filter(None,fl.split())
45         filehash = hashlib.md5()
46         filehash.update(open(os.path.join(path,l[1])).read())
47         if str(filehash.hexdigest()) != str(l[0]):
48             return 1
49         else:
50             return 0
I made a little test script that does the same thing and I can recreate the error:
$ cat test.py
import os
import hashlib

def check_md5(path, fn):
    with open(fn, "rt") as fh:
        fl = fh.readline()
        l = filter(None,fl.split())
        print(l)
        filehash = hashlib.md5()
        filehash.update(open(os.path.join(path,l[1])).read())
        if str(filehash.hexdigest()) != str(l[0]):
            return 1
        else:
            return 0

if __name__ == "__main__":
    check_md5(".", "taxdump.tar.gz.md5")
I edited the code a bit and this seems to work properly:
import os
import hashlib

def check_md5(path, fn):
    with open(fn, "rt") as fh:
        fl = fh.readline()
        l = fl.split()
        filehash = hashlib.md5()
        filehash.update(open(os.path.join(path,l[1]), "rb").read())
        if str(filehash.hexdigest()) != str(l[0]):
            return 1
        else:
            return 0

if __name__ == "__main__":
    print(check_md5(".", "taxdump.tar.gz.md5"))
I thought I had taken out all the "filter" ones. I have no clue why I used it in the first place (it's pretty old after all) and removed it when porting it to python3 ... I thought. I'll remove that one now and check the rest of the code again.
Thanks for pointing it out.
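For anyone landing here with the same traceback: the TypeError comes from a Python 2 to Python 3 change. In Python 2, filter() returned a list; in Python 3 it returns a lazy iterator, which cannot be indexed. A short sketch using the checksum line posted above (the variable names are illustrative only, not BASTA code):

```python
line = "1c84cb5d87fddf5007b8f2a6cc186825 taxdump.tar.gz\n"

# Python 3: filter() gives back an iterator, so indexing it fails
fields = filter(None, line.split())
message = None
try:
    fields[1]
except TypeError as err:
    message = str(err)
    print(message)  # 'filter' object is not subscriptable

# str.split() without arguments already discards empty strings,
# so the filter(None, ...) step is not needed at all
parts = line.split()
print(parts[1])  # taxdump.tar.gz

# If the filter call were to be kept, wrapping it in list() restores indexing
as_list = list(filter(None, line.split()))
print(as_list[0] == parts[0])  # True
```

Dropping the filter call, as in the edited script above, is the simpler of the two options because split() with no separator never produces empty fields.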
Sure thing. You may have noticed but I had to open the file in 'rb' mode as well.
I hit some other errors as well. I'll open new issues for those when I have time.
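On the 'rb' point: in Python 3, hashlib only accepts bytes, and a gzip archive generally cannot be decoded as text anyway, so the file being hashed has to be opened in binary mode. A small sketch of both failure modes, using a throwaway file with made-up contents rather than the real taxdump archive:

```python
import hashlib
import os
import tempfile

# Throwaway file starting with the gzip magic bytes (contents are made up)
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as fh:
    fh.write(b"\x1f\x8b\x08\x00binary payload")

# Text mode tries to decode the bytes and fails on non-text data
text_mode_failed = False
try:
    with open(path, "rt", encoding="utf-8") as fh:
        fh.read()
except UnicodeDecodeError:
    text_mode_failed = True

# Even when decoding succeeds, hashlib refuses str objects
str_rejected = False
try:
    hashlib.md5().update("some text")
except TypeError:
    str_rejected = True

# Binary mode yields bytes, which is what hashlib expects
with open(path, "rb") as fh:
    digest = hashlib.md5(fh.read()).hexdigest()

print(text_mode_failed, str_rejected, len(digest))
```

The small .md5 file itself is plain text, so reading it in text mode, as the edited script does, is fine.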
I installed BASTA from the Conda package (Python 3), but I am not able to set up the taxonomy.
taxdump.tar.gz.md5 100%[===============================================================>] 49 --.-KB/s in 0s
2022-02-15 15:38:52 (5.04 MB/s) - ‘/root/.basta/taxonomy/taxdump.tar.gz.md5’ saved [49]
[BASTA STATUS] Checking MD5 sum of file
Traceback (most recent call last):
  File "/opt/miniconda/envs/basta_py3/bin/basta", line 4, in <module>
    __import__('pkg_resources').run_script('BASTA==1.4', 'basta')
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 662, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1459, in run_script
    exec(code, namespace, namespace)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/EGG-INFO/scripts/basta", line 118, in <module>
    main.run_basta(args)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 89, in run_basta
    self._basta_taxonomy(args)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 186, in _basta_taxonomy
    dutils.down_and_check("ftp://ftp.ncbi.nih.gov/pub/taxonomy/","taxdump.tar.gz",args.directory)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 60, in down_and_check
    while(check_md5(md5,out_dir)):
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 46, in check_md5
    filehash.update(open(os.path.join(path,l[1])).read())
TypeError: 'filter' object is not subscriptable
Can you help me understand the problem?