zyxue / ncbitax2lin

🐞 Convert NCBI taxonomy dump into lineages
MIT License

Can't seem to get this running on my mac #3

Closed hepcat72 closed 6 years ago

hepcat72 commented 6 years ago

Everything seems to be fine through conda activate venv (though I had to switch to bash to get this to work, as it doesn't work in tcsh; perhaps this should be mentioned in the README). Anyway, when I run the next command, make, I get this error:

usage: md5sum [-bv] [-c [file]] | [file...]
Generates or checks MD5 Message Digests
    -c  check message digests (default is generate)
    -v  verbose, print file names when checking
    -b  read files in binary mode
The input for -c should be the list of message digests and file names
that is printed on stdout by this program when it generates digests.
make[1]: *** [taxdump.tar.gz] Error 2
make: *** [taxdump] Error 2

So it looks like the call to md5sum is for some other version?

And if I try to run the script, I get:

./ncbitax2lin.py 
./ncbitax2lin.py: line 1: import: command not found
./ncbitax2lin.py: line 2: import: command not found
./ncbitax2lin.py: line 3: import: command not found
./ncbitax2lin.py: line 4: import: command not found
./ncbitax2lin.py: line 5: import: command not found
./ncbitax2lin.py: line 6: import: command not found
./ncbitax2lin.py: line 8: import: command not found
 from: can't read /var/mail/utils
./ncbitax2lin.py: line 13: syntax error near unexpected token `newline'
./ncbitax2lin.py: line 13: `logging.basicConfig('

I'm not very experienced with python. Is this a problem on my end or is there an incompatibility issue or what?
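
A note on the second error: `import: command not found` is what a shell prints when it runs a Python file as if it were a shell script. That happens when the file is executed directly (./ncbitax2lin.py) and its first line is not a `#!` interpreter line; here line 1 is an import statement. Running it as python ncbitax2lin.py, the way the make -n output does, avoids this. A minimal sketch of a check for the interpreter line follows; `has_shebang` is a hypothetical helper, not part of ncbitax2lin, and the two files are throwaway examples.

```python
import os
import tempfile


def has_shebang(path):
    """Return True if the file at `path` starts with a '#!' interpreter line."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"#!"


# Demonstration on two throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.py")
    with open(plain, "w") as handle:
        handle.write("import sys\n")

    marked = os.path.join(tmp, "marked.py")
    with open(marked, "w") as handle:
        handle.write("#!/usr/bin/env python\nimport sys\n")

    plain_result = has_shebang(plain)    # no interpreter line
    marked_result = has_shebang(marked)  # starts with '#!'
```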

zyxue commented 6 years ago

This seems to be an md5sum problem. Could you run the following and post the output, please?

  1. md5sum --version
  2. make -n (this shows all the commands that would be run)

hepcat72 commented 6 years ago

I can't seem to get a version from md5sum:

md5sum --version
md5sum: illegal option -- -
usage: md5sum [-bv] [-c [file]] | [file...]
Generates or checks MD5 Message Digests
    -c  check message digests (default is generate)
    -v  verbose, print file names when checking
    -b  read files in binary mode
The input for -c should be the list of message digests and file names
that is printed on stdout by this program when it generates digests.

I tried other options and also checked the man page: nothing. I know it's pretty old, though, since it's in /sw/bin. I removed that from my path, but apparently that's my only copy. I'm surprised Xcode didn't install one.

make -n output:

/Applications/Xcode.app/Contents/Developer/usr/bin/make -C taxdump all
wget -N \
        ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz \
        ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5 \
    && md5sum -c taxdump.tar.gz.md5
wget -N \
     ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_readme.txt
rm -rfv taxdump && mkdir -p taxdump && tar zxf taxdump.tar.gz -C ./taxdump
python ncbitax2lin.py \
        --nodes-file taxdump/taxdump/nodes.dmp \
        --names-file taxdump/taxdump/names.dmp \
        -o "lineages-2018-03-12"
md5sum -b ""lineages-2018-03-12".csv.gz" > ""lineages-2018-03-12".csv.gz".md5

zyxue commented 6 years ago

This is my md5sum version

md5sum --version
md5sum (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Ulrich Drepper, Scott Miller, and David Madore.

Could you run type -a md5sum to see if there is another version?

If not, you'll need to update your md5sum (e.g. with Homebrew, run brew install coreutils).

If you have a problem updating md5sum, another option is to execute all the commands from make -n manually, one by one.
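
For readers without a shell that supports type -a, the PATH lookup can be approximated in Python. This is only a rough sketch: `find_all_on_path` is a hypothetical helper, and unlike type -a it does not report aliases or shell builtins.

```python
import os


def find_all_on_path(name, path=None):
    """List every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


# The first entry, if any, is the one a plain `md5sum` call would run.
md5sum_candidates = find_all_on_path("md5sum")
```

An empty list means no md5sum is reachable through PATH; more than one entry means an older copy may be shadowing a newer one.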

hepcat72 commented 6 years ago

Well, I updated Homebrew and ran brew install coreutils, but it did not appear to install an md5sum utility. I googled it, and the md5 command seems to be how it's done on macOS. md5 -r file is supposed to produce the same result as md5sum.

zyxue commented 6 years ago

See e.g.

type -a md5sum
md5sum is /usr/local/Cellar/coreutils/8.23_1/libexec/gnubin/md5sum

Make sure gnubin is in your PATH:

export PATH=/usr/local/Cellar/coreutils/<your_version>/libexec/gnubin:${PATH}

Then try again

zyxue commented 6 years ago

md5 seems to be the macOS counterpart of md5sum. Both compute MD5 hashes, so they should produce the same hash for the same file, but their command-line interfaces differ. The MD5 hash is only used to validate that the downloaded files haven't been tampered with; it doesn't affect the generation of lineages.
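
Because the check only compares MD5 digests, it can also be done without either command-line tool. The sketch below mimics md5sum -c with Python's hashlib, assuming the .md5 file uses the GNU layout of one `<hex digest>  <file name>` pair per line. The helper names are hypothetical, and the demonstration uses a throwaway file rather than the real taxdump.tar.gz.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5_file(md5_path):
    """Check each '<hex digest>  <file name>' line; return {name: matches}.

    File names are resolved relative to the directory of the .md5 file
    (md5sum -c itself resolves them relative to the working directory).
    """
    base = os.path.dirname(md5_path)
    results = {}
    with open(md5_path) as handle:
        for line in handle:
            if not line.strip():
                continue
            expected, name = line.split(None, 1)
            name = name.strip().lstrip("*")  # '*' marks binary mode in GNU output
            actual = md5_of_file(os.path.join(base, name))
            results[name] = actual == expected.lower()
    return results


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "example.bin")
    with open(data_path, "wb") as handle:
        handle.write(b"hello")
    md5_path = data_path + ".md5"
    with open(md5_path, "w") as handle:
        handle.write(md5_of_file(data_path) + "  example.bin\n")
    demo_result = check_md5_file(md5_path)
```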

hepcat72 commented 6 years ago

OK. I added that to my path. I got past that issue, but now make dies with this error:

python ncbitax2lin.py \
        --nodes-file taxdump/taxdump/nodes.dmp \
        --names-file taxdump/taxdump/names.dmp \
        -o "lineages-2018-03-12"
Traceback (most recent call last):
  File "/Users/rleach/Temporary/ncbitax2lin-master/venv/lib/python2.7/site.py", line 703, in <module>
    main()
  File "/Users/rleach/Temporary/ncbitax2lin-master/venv/lib/python2.7/site.py", line 670, in main
    virtual_install_main_packages()
  File "/Users/rleach/Temporary/ncbitax2lin-master/venv/lib/python2.7/site.py", line 553, in virtual_install_main_packages
    f = open(os.path.join(os.path.dirname(__file__), 'orig-prefix.txt'))
IOError: [Errno 2] No such file or directory: '/Users/rleach/Temporary/ncbitax2lin-master/venv/lib/python2.7/orig-prefix.txt'
make: *** [""lineages-2018-03-12".csv.gz"] Error 1

zyxue commented 6 years ago

Did you create a virtual environment as instructed on the README page?

BTW, I generated the latest version for you: https://gitlab.com/zyxue/ncbitax2lin-lineages/blob/master/lineages-2018-03-12.csv.gz

hepcat72 commented 6 years ago

Cool. Thanks. And yeah. Here's what I did:

conda create -n pandasenv python
source activate pandasenv
conda install pandas
cd ncbitax2lin-master/
conda create -y -p venv/ --file env-conda.txt
conda activate /Users/rleach/Temporary/ncbitax2lin-master/venv
make

Just re-tried it. I still get the same error. Is any of that incorrect?

zyxue commented 6 years ago

No, that doesn't seem to be correct. You created two nested virtual environments... And you weren't activating venv correctly.

Try a new terminal session and just follow the instructions to see if that works.

hepcat72 commented 6 years ago

Right, well. I didn't know the steps for installing pandas. The link wasn't explicit about it. And the command to activate venv hadn't worked. The one I executed was the one that was suggested at the end of the conda create command. I'll try a fresh terminal and let you know.

zyxue commented 6 years ago

pandas is one of the dependencies. It's listed in env-conda.txt together with all the other dependencies, including the dependencies of pandas itself. So

conda create -y -p venv/ --file env-conda.txt

installs everything you need when creating a new virtual environment. It might be worth reading a bit about conda (https://conda.io/docs/) for the long term.
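
One way to confirm that the activated environment is the one doing the work is to ask the interpreter where it lives and whether pandas imports. This is a minimal sketch meant to run under either Python 2.7 or Python 3; the helper names are hypothetical and are not part of ncbitax2lin or conda.

```python
import sys


def describe_interpreter():
    """Report which Python is running and where its environment is rooted."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
    }


def can_import(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


interpreter_info = describe_interpreter()
pandas_available = can_import("pandas")
```

If the environment was created and activated as in the README, `interpreter_info["prefix"]` would be expected to end in venv, and `pandas_available` would be expected to be True.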

hepcat72 commented 6 years ago

I still get the same result. Here's the full session. Did I miss something?

Last login: Mon Mar 12 15:52:09 on ttys011
gen-rlimac[Mar 12 17:11:59]:~>mkdir testdir
gen-rlimac[Mar 12 17:12:28]:~>cd testdir
gen-rlimac[Mar 12 17:12:32]:~/testdir>bash
bash-3.2$ git clone git@github.com:zyxue/ncbitax2lin.git
Cloning into 'ncbitax2lin'...
Warning: Permanently added the RSA host key for IP address '192.30.253.112' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
bash-3.2$ git clone https://github.com/zyxue/ncbitax2lin.git
Cloning into 'ncbitax2lin'...
remote: Counting objects: 177, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 177 (delta 0), reused 1 (delta 0), pack-reused 174
Receiving objects: 100% (177/177), 262.03 KiB | 5.24 MiB/s, done.
Resolving deltas: 100% (83/83), done.
bash-3.2$ ls
ncbitax2lin
bash-3.2$ cd ncbitax2lin/
bash-3.2$ conda create -y -p venv/ --file env-conda.txt
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.4.10
  latest version: 4.4.11

Please update conda by running

    $ conda update -n base conda

## Package Plan ##

  environment location: /Users/rleach/testdir/ncbitax2lin/venv

  added / updated specs: 
    - mkl==2017.0.1=0
    - numpy==1.12.0=py27_0
    - openssl==1.0.2k=0
    - pandas==0.19.2=np112py27_1
    - pip==9.0.1=py27_1
    - python-dateutil==2.6.0=py27_0
    - python==2.7.13=0
    - pytz==2016.10=py27_0
    - readline==6.2=2
    - setuptools==27.2.0=py27_0
    - six==1.10.0=py27_0
    - sqlite==3.13.0=0
    - tk==8.5.18=0
    - wheel==0.29.0=py27_0
    - zlib==1.2.8=3

The following NEW packages will be INSTALLED:

    ca-certificates: 2018.1.18-0        conda-forge
    certifi:         2018.1.18-py27_0   conda-forge
    mkl:             2017.0.1-0                    
    numpy:           1.12.0-py27_0                 
    openssl:         1.0.2k-0           conda-forge
    pandas:          0.19.2-np112py27_1 conda-forge
    pip:             9.0.1-py27_1       conda-forge
    python:          2.7.13-0           conda-forge
    python-dateutil: 2.6.0-py27_0       conda-forge
    pytz:            2016.10-py27_0     conda-forge
    readline:        6.2-2                         
    setuptools:      27.2.0-py27_0      conda-forge
    six:             1.10.0-py27_0      conda-forge
    sqlite:          3.13.0-0           conda-forge
    tk:              8.5.18-0                      
    wheel:           0.29.0-py27_0      conda-forge
    zlib:            1.2.8-3            conda-forge

Preparing transaction: done
Verifying transaction: - 
SafetyError: The package for python located at /usr/local/miniconda3/pkgs/python-2.7.13-0
appears to be corrupted. The path 'lib/python2.7/site.py'
has a sha256 mismatch.
  reported sha256: c5b9583637068853681954f06cf9ccd29a0442bdfd04e8f0319ec3d4343138b6
  actual sha256: ea6142c091a5f2b747c12df5c4b8c6df382f49e3041e2f58a0b22134b371047b

done
Executing transaction: done
#
# To activate this environment, use:
# > source activate /Users/rleach/testdir/ncbitax2lin/venv
#
# To deactivate an active environment, use:
# > source deactivate
#

bash-3.2$ source activate venv/
(/Users/rleach/testdir/ncbitax2lin/venv) bash-3.2$ make
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C taxdump all
wget -N \
        ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz \
        ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5 \
    && md5sum -c taxdump.tar.gz.md5
--2018-03-12 17:15:12--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
           => ‘.listing’
Resolving ftp.ncbi.nlm.nih.gov... 130.14.250.13, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov|130.14.250.13|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy ... done.
==> PASV ... done.    ==> LIST ... done.

    [ <=>                                                                                                                                      ] 3,020       --.-K/s   in 0.08s   

2018-03-12 17:15:12 (35.5 KB/s) - ‘.listing’ saved [3020]

Removed ‘.listing’.
--2018-03-12 17:15:12--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
           => ‘taxdump.tar.gz’
==> CWD not required.
==> PASV ... done.    ==> RETR taxdump.tar.gz ... done.
Length: 42075619 (40M)

100%[=========================================================================================================================================>] 42,075,619  23.7MB/s   in 1.7s   

2018-03-12 17:15:14 (23.7 MB/s) - ‘taxdump.tar.gz’ saved [42075619]

--2018-03-12 17:15:14--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
           => ‘.listing’
Connecting to ftp.ncbi.nlm.nih.gov|130.14.250.13|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy ... done.
==> PASV ... done.    ==> LIST ... done.

    [ <=>                                                                                                                                      ] 3,020       --.-K/s   in 0.002s  

2018-03-12 17:15:14 (1.24 MB/s) - ‘.listing’ saved [3020]

Removed ‘.listing’.
--2018-03-12 17:15:14--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
           => ‘taxdump.tar.gz.md5’
==> CWD not required.
==> PASV ... done.    ==> RETR taxdump.tar.gz.md5 ... done.
Length: 49

100%[=========================================================================================================================================>] 49          --.-K/s   in 0s      

2018-03-12 17:15:14 (840 KB/s) - ‘taxdump.tar.gz.md5’ saved [49]

FINISHED --2018-03-12 17:15:14--
Total wall clock time: 2.3s
Downloaded: 2 files, 40M in 1.8s (22.6 MB/s)
taxdump.tar.gz: OK
wget -N \
     ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_readme.txt
--2018-03-12 17:15:14--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_readme.txt
           => ‘.listing’
Resolving ftp.ncbi.nlm.nih.gov... 130.14.250.13, 2607:f220:41e:250::10
Connecting to ftp.ncbi.nlm.nih.gov|130.14.250.13|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy ... done.
==> PASV ... done.    ==> LIST ... done.

    [ <=>                                                                                                                                      ] 3,020       --.-K/s   in 0.005s  

2018-03-12 17:15:14 (551 KB/s) - ‘.listing’ saved [3020]

Removed ‘.listing’.
--2018-03-12 17:15:14--  ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_readme.txt
           => ‘taxdump_readme.txt’
==> CWD not required.
==> PASV ... done.    ==> RETR taxdump_readme.txt ... done.
Length: 4958 (4.8K)

100%[=========================================================================================================================================>] 4,958       --.-K/s   in 0.02s   

2018-03-12 17:15:14 (316 KB/s) - ‘taxdump_readme.txt’ saved [4958]

rm -rfv taxdump && mkdir -p taxdump && tar zxf taxdump.tar.gz -C ./taxdump
python ncbitax2lin.py \
        --nodes-file taxdump/taxdump/nodes.dmp \
        --names-file taxdump/taxdump/names.dmp \
        -o "lineages-2018-03-12"
Traceback (most recent call last):
  File "/Users/rleach/testdir/ncbitax2lin/venv/lib/python2.7/site.py", line 703, in <module>
    main()
  File "/Users/rleach/testdir/ncbitax2lin/venv/lib/python2.7/site.py", line 670, in main
    virtual_install_main_packages()
  File "/Users/rleach/testdir/ncbitax2lin/venv/lib/python2.7/site.py", line 553, in virtual_install_main_packages
    f = open(os.path.join(os.path.dirname(__file__), 'orig-prefix.txt'))
IOError: [Errno 2] No such file or directory: '/Users/rleach/testdir/ncbitax2lin/venv/lib/python2.7/orig-prefix.txt'
make: *** [""lineages-2018-03-12".csv.gz"] Error 1
(/Users/rleach/testdir/ncbitax2lin/venv) bash-3.2$ 

zyxue commented 6 years ago

Your python seems to be corrupted...

Verifying transaction: - 
SafetyError: The package for python located at /usr/local/miniconda3/pkgs/python-2.7.13-0
appears to be corrupted. The path 'lib/python2.7/site.py'
has a sha256 mismatch.
  reported sha256: c5b9583637068853681954f06cf9ccd29a0442bdfd04e8f0319ec3d4343138b6
  actual sha256: ea6142c091a5f2b747c12df5c4b8c6df382f49e3041e2f58a0b22134b371047b

How did you install miniconda?

It's a python problem, not an ncbitax2lin problem.

hepcat72 commented 6 years ago

Oh geez. I don't know. It's been too long ago. OK, well never mind. This rabbit hole is getting too deep. Thanks for the help.

zyxue commented 6 years ago

Could be a conda problem: https://github.com/conda/conda/issues/5811. Here are a few options if you still want to make it work.

  1. Update conda with conda update conda, as the latest version is 4.4.11.
  2. Try ignoring the safety check as suggested in the above link: conda config --set skip_safety_checks true
  3. Downgrade conda, e.g. I am using conda 4.3.30.

hepcat72 commented 6 years ago

Well, I tried the first 2 and got errors. I'm just going to drop this I guess. Not your fault. Just way more effort than I intended to put in.

(/Users/rleach/testdir/ncbitax2lin/venv) bash-3.2$ conda update -n base

CondaEnvironmentNotFoundError: Could not find environment: base .
You can list all discoverable environments with `conda info --envs`.

usage: conda [-h] [-V] command ...
conda: error: argument command: invalid choice: '/usr/local/miniconda3/bin/conda' (choose from 'info', 'help', 'list', 'search', 'create', 'install', 'update', 'upgrade', 'remove', 'uninstall', 'config', 'clean', 'package')
(/Users/rleach/testdir/ncbitax2lin/venv) bash-3.2$ conda update conda

PackageNotInstalledError: Package is not installed in prefix.
  prefix: /Users/rleach/testdir/ncbitax2lin/venv
  package name: conda

usage: conda [-h] [-V] command ...
conda: error: argument command: invalid choice: '/usr/local/miniconda3/bin/conda' (choose from 'info', 'help', 'list', 'search', 'create', 'install', 'update', 'upgrade', 'remove', 'uninstall', 'config', 'clean', 'package')
(/Users/rleach/testdir/ncbitax2lin/venv) bash-3.2$ conda config --set skip_safety_checks true

CondaValueError: Key 'skip_safety_checks' is not a known primitive parameter.

zyxue commented 6 years ago

Conda is great, and it would be worthwhile to figure it out. But it's your call.

As a kind suggestion, I meant

# deactivate the virtual env you are currently in
source deactivate <whatever_virtual_environment_you_are_currently_in>
conda update conda

You'll see something like this:

conda update conda
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /path/to/miniconda3:

The following packages will be UPDATED:

    conda:   4.3.30-py35hd530ce9_0 --> 4.4.11-py35_0       
    pycosat: 0.6.1-py35_0          --> 0.6.3-py35h745f8c1_0

NOT conda update -n base; I don't know why you ran that.

zyxue commented 6 years ago

Another way is to go through a more comprehensive tutorial first, and then try this. I thought I had made the procedure straightforward.

hepcat72 commented 6 years ago

Oh yeah. Whoops. I didn't copy the history back far enough. I tried conda update conda and got an error. The conda update -n base was a suggestion from earlier conda output. I closed the window though, so the other error is gone. And yeah, I've heard conda is good. I just haven't picked it up yet. Thanks again.