timkahlke / BASTA

Basic Sequence Taxonomy Annotator
GNU General Public License v3.0

'filter' object is not subscriptable #31

Closed: Lcornet closed this issue 2 years ago

Lcornet commented 2 years ago

I installed BASTA from the Conda package (Python 3), but I am not able to set up the taxonomy.

taxdump.tar.gz.md5 100%[===============================================================>] 49 --.-KB/s in 0s

2022-02-15 15:38:52 (5.04 MB/s) - ‘/root/.basta/taxonomy/taxdump.tar.gz.md5’ saved [49]

[BASTA STATUS] Checking MD5 sum of file

Traceback (most recent call last):
  File "/opt/miniconda/envs/basta_py3/bin/basta", line 4, in <module>
    __import__('pkg_resources').run_script('BASTA==1.4', 'basta')
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 662, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1459, in run_script
    exec(code, namespace, namespace)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/EGG-INFO/scripts/basta", line 118, in <module>
    main.run_basta(args)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 89, in run_basta
    self._basta_taxonomy(args)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 186, in _basta_taxonomy
    dutils.down_and_check("ftp://ftp.ncbi.nih.gov/pub/taxonomy/","taxdump.tar.gz",args.directory)
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 60, in down_and_check
    while(check_md5(md5,out_dir)):
  File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 46, in check_md5
    filehash.update(open(os.path.join(path,l[1])).read())
TypeError: 'filter' object is not subscriptable

Can you help me understand the problem?

timkahlke commented 2 years ago

Hi @Lcornet,

Thanks. I'll check in the complete fix in the coming hours.

timkahlke commented 2 years ago

It should all work now. I changed the download method, so an additional Python library is needed in the conda env. Just redo the conda env creation and the download as described in the README.

Lcornet commented 2 years ago

Thanks, the install is complete and the error is solved. Nevertheless, I am still not able to download the taxonomy. I get this error:

[BASTA ERROR] MD5 sum mismatch.
[BASTA STATUS] (Re-)Downloading file taxdump.tar.gz
[BASTA STATUS] (Re-)Downloading file taxdump.tar.gz.md5

It never gets any further (grepping for "mismatch" in the log):

[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.
[BASTA ERROR] MD5 sum mismatch.

Is this something related to NCBI?

timkahlke commented 2 years ago

Unfortunately, I can't reproduce your error, so I don't think it's a BASTA problem. One thing to try is to delete any taxonomy folders you may have already downloaded. That shouldn't matter, as I put some checks in, but who knows? If you don't specify a particular directory with the -d option, the folder is $HOME/.basta. Delete that one and try again.

If it still doesn't work, I assume it's an internet connection problem. The error means that the downloaded file is corrupt or incomplete, so BASTA downloads it again. As I said, it works for me with a fresh install.
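The download-verify-redownload behaviour described above runs inside an unbounded `while(check_md5(md5,out_dir)):` loop (see the traceback), so a mismatch that never resolves produces the endless stream of "MD5 sum mismatch" messages in the log. A minimal sketch of a bounded variant follows, assuming the download and the MD5 check can be passed in as callables; `download_with_retries`, `fetch`, `verify` and `max_attempts` are hypothetical names for illustration, not part of BASTA's `DownloadUtils.py`.

```python
from typing import Callable


def download_with_retries(
    fetch: Callable[[], None],
    verify: Callable[[], bool],
    max_attempts: int = 5,
) -> int:
    """Call fetch() until verify() returns True, giving up after max_attempts.

    Hypothetical helper: returns the number of attempts used, and raises
    RuntimeError instead of retrying forever when every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        fetch()        # (re-)download the archive and its .md5 file
        if verify():   # True when the MD5 sum matches
            return attempt
        print(f"[STATUS] MD5 sum mismatch on attempt {attempt}/{max_attempts}")
    raise RuntimeError(f"MD5 sum still mismatched after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulate two corrupt downloads followed by a good one.
    outcomes = iter([False, False, True])
    print(download_with_retries(lambda: None, lambda: next(outcomes)))  # 3
```

Failing loudly after a fixed number of attempts turns a silent hang into an error message that points at a persistent problem (network, mirror, or checksum code) rather than a transient one.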

Lcornet commented 2 years ago

I deleted the $HOME/.basta directory and passed a directory to basta with the -d option, but I get the same error.

I will try on another cluster.

timkahlke commented 2 years ago

Sorry to hear that it still doesn't work. I hope it works out on another server. Good luck!

davised commented 2 years ago

I'm getting the same error. There is no issue with the download.

[BASTA STATUS] Checking MD5 sum of file

Traceback (most recent call last):
  File "/local/cluster/BASTA-1.4.1/bin/basta", line 4, in <module>
    __import__('pkg_resources').run_script('BASTA==1.4', 'basta')
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/pkg_resources/__init__.py", line 672, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1472, in run_script
    exec(code, namespace, namespace)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/EGG-INFO/scripts/basta", line 118, in <module>
    main.run_basta(args)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 89, in run_basta
    self._basta_taxonomy(args)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 186, in _basta_taxonomy
    dutils.down_and_check("ftp://ftp.ncbi.nih.gov/pub/taxonomy/","taxdump.tar.gz",args.directory)
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 60, in down_and_check
    while(check_md5(md5,out_dir)):
  File "/local/cluster/BASTA-1.4.1/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 46, in check_md5
    filehash.update(open(os.path.join(path,l[1])).read())
TypeError: 'filter' object is not subscriptable

Just to confirm there is no issue with the download:

$ ls
taxdump.tar.gz  taxdump.tar.gz.md5
$ cat taxdump.tar.gz.md5
1c84cb5d87fddf5007b8f2a6cc186825  taxdump.tar.gz

I took a look at the code, and I can't figure out why you are running filter(None, fl.split()) in DownloadUtils.py:

# Check MD5 sum of givenfile
def check_md5(f,path):
    with open(os.path.join(path,f)) as f:
        fl = f.readline()
        l = filter(None,fl.split())
        filehash = hashlib.md5()
        filehash.update(open(os.path.join(path,l[1])).read())
        if str(filehash.hexdigest()) != str(l[0]):
            return 1
        else:
            return 0
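The root cause is a Python 2 to 3 difference: in Python 3, `filter()` returns a lazy iterator instead of a list, so indexing it with `l[1]` raises exactly the `TypeError` in the traceback. In addition, `str.split()` with no separator already drops empty strings, so the `filter(None, ...)` call is redundant. A small demonstration, using a line in the same `<hash>  <filename>` format as the `taxdump.tar.gz.md5` file shown above:

```python
line = "1c84cb5d87fddf5007b8f2a6cc186825  taxdump.tar.gz\n"

# In Python 3, filter() returns an iterator, which does not support indexing.
fields_iter = filter(None, line.split())
try:
    fields_iter[1]
except TypeError as exc:
    print(exc)  # 'filter' object is not subscriptable

# Wrapping the iterator in list() would make it indexable...
fields_list = list(filter(None, line.split()))

# ...but split() with no arguments already discards empty strings and
# surrounding whitespace (including the trailing newline), so filter() adds nothing.
fields = line.split()
assert fields == fields_list

expected_md5, filename = fields
print(expected_md5, filename)
```

So replacing `filter(None, fl.split())` with plain `fl.split()`, as in the edited test script, is enough to fix the indexing.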

I made a little test script that does the same thing and I can recreate the error:

$ cat test.py
import os
import hashlib

def check_md5(path, fn):
    with open(fn, "rt") as fh:
        fl = fh.readline()
        l = filter(None,fl.split())
        print(l)
        filehash = hashlib.md5()
        filehash.update(open(os.path.join(path,l[1])).read())
        if str(filehash.hexdigest()) != str(l[0]):
            return 1
        else:
            return 0

if __name__ == "__main__":
    check_md5(".", "taxdump.tar.gz.md5")

I edited the code a bit and this seems to work properly:

import os
import hashlib

def check_md5(path, fn):
    with open(fn, "rt") as fh:
        fl = fh.readline()
        l = fl.split()
        filehash = hashlib.md5()
        filehash.update(open(os.path.join(path,l[1]), "rb").read())
        if str(filehash.hexdigest()) != str(l[0]):
            return 1
        else:
            return 0

if __name__ == "__main__":
    print(check_md5(".", "taxdump.tar.gz.md5"))

tkahlke commented 2 years ago

I thought I had taken out all the "filter" calls. I have no clue why I used it in the first place (the code is pretty old, after all), and I thought I had removed all of them when porting to Python 3. I'll remove that one now and check the rest of the code again.

Thanks for pointing it out.

davised commented 2 years ago

Sure thing. You may have noticed, but I also had to open the file in 'rb' mode.
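Binary mode matters because `hashlib` hash objects only accept bytes in Python 3: passing the `str` returned by a text-mode `read()` to `update()` raises a `TypeError`, and decoding a gzip archive as text would fail or alter the data anyway. Below is a minimal sketch of a Python 3 MD5 check that reads the archive in binary chunks so the whole file is never held in memory at once. The function names `md5_of_file` and `md5_matches` are hypothetical, not part of BASTA, and unlike the original `check_md5` (which returns 1 on mismatch) `md5_matches` returns `True` when the sums agree.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in binary chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:  # "rb": hashlib needs bytes, not str
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5_matches(md5_file: str, directory: str) -> bool:
    """Check a '<hash>  <filename>' line against the named file in directory."""
    with open(os.path.join(directory, md5_file), "rt") as fh:
        expected, filename = fh.readline().split()[:2]
    # GNU md5sum marks binary-mode entries with a leading '*' on the filename.
    filename = filename.lstrip("*")
    return md5_of_file(os.path.join(directory, filename)) == expected.lower()
```

With `taxdump.tar.gz` and `taxdump.tar.gz.md5` in the current directory, `md5_matches("taxdump.tar.gz.md5", ".")` should return `True` for an intact download.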

I hit some other errors as well. I'll open new issues for those when I have time.