steineggerlab / conterminator

Detection of incorrectly labeled sequences across kingdoms
GNU General Public License v3.0

Conterminator stopped after downloading taxdump.tar.gz #5

Closed: martin-steinegger closed this 3 years ago

martin-steinegger commented 4 years ago

The command "conterminator dna example/dna.fna example/dna.mapping ${RESULT_PREFIX} tmp" stops during the createtaxdb step:

Download taxdump.tar.gz
tar: Skipping to next header
2020-04-18 14:36:42 URL:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz [51859296/51859296] -> "-" [7]

gzip: stdin: invalid compressed data--crc error

gzip: stdin: invalid compressed data--length error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Error: createtaxdb step died

XiongGZ commented 4 years ago

Thank you for this. I think the taxdump.tar.gz download failed, so the createtaxdb step died. Maybe I can download it myself? But I don't know which directory it should be put in.

martin-steinegger commented 4 years ago

What happens if you download it and unzip it on that machine? Does it crash too? Conterminator just uses wget and gunzip.
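One way to answer that question on the affected machine is to test the archive before conterminator touches it. The sketch below assumes bash with GNU gzip and tar; `check_taxdump` is a hypothetical helper name, not part of conterminator or MMseqs2. `gzip -t` should surface the same kind of CRC or length errors shown in the log above, and `tar -tzf` additionally checks that the tar stream can be listed.

```shell
# check_taxdump is a hypothetical helper (not a conterminator/MMseqs2 command).
# It prints "ok: <file>" or "corrupt: <file>" and returns non-zero on corruption.
check_taxdump() {
  local archive="$1"
  # gzip -t decompresses without writing output and fails on CRC/length errors
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "corrupt: $archive"
    return 1
  fi
  # tar -tzf only lists the members; a failure means the tar layer is damaged
  if ! tar -tzf "$archive" >/dev/null 2>&1; then
    echo "corrupt: $archive"
    return 1
  fi
  echo "ok: $archive"
}
```

Usage: `check_taxdump taxdump.tar.gz`. If this reports a corrupt file, the download itself is the problem (for example a proxy or an interrupted transfer), and re-downloading is more useful than debugging conterminator.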

XiongGZ commented 4 years ago

gunzip? Shouldn't a *.tar.gz file be extracted with tar? When I use gunzip it just becomes taxdump.tar, but I can extract it with the command "tar zxvf taxdump.tar.gz". I will try again with the extracted files, but in which directory should they be saved?
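The observation above is expected: a .tar.gz has two layers, and gunzip only removes the gzip layer, leaving a plain .tar. The sketch below (bash, GNU tar and gzip assumed; both function names are hypothetical) shows the one-step and two-step ways of unpacking into a chosen directory with `-C`, which end up with the same files.

```shell
# extract_taxdump: one step, tar handles both layers.
# -x extracts, -z filters through gzip, -f names the archive, -C sets the target directory.
extract_taxdump() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
}

# extract_taxdump_two_step: gunzip strips only the .gz layer, then tar unpacks the .tar.
extract_taxdump_two_step() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  gunzip -c "$archive" > "$dest/taxdump.tar"   # same content `gunzip taxdump.tar.gz` leaves behind
  tar -xf "$dest/taxdump.tar" -C "$dest"
  rm "$dest/taxdump.tar"                        # keep only the extracted .dmp files
}
```

Usage: `extract_taxdump taxdump.tar.gz taxonomy/` leaves nodes.dmp, names.dmp and the other dump files directly inside `taxonomy/`.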

XiongGZ commented 4 years ago

I see the MMseqs2 usage for "mmseqs createtaxdb":

Create a seqTaxDB from an existing BLAST database

It is easy to create a seqTaxDB from a pre-existing local BLAST database, if BLAST+ is installed. The following example creates an MMseqs2 database from NCBI's nt database, but it also works with any of the other BLAST databases, including the nr protein database.

First, manually download the NCBI taxonomy database dump:

wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
mkdir taxonomy && tar -xzvf taxdump.tar.gz -C taxonomy

Maybe I should run these commands first.

martin-steinegger commented 4 years ago

I have added support for providing your own taxdump via --ncbi-tax-dump. I hope this solves your issue.

mkdir taxonomy/
cd taxonomy/
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar xvfz taxdump.tar.gz
cd ..
conterminator dna dna.fas dna.mapping conterm tmp --ncbi-tax-dump taxonomy/

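Before launching a long run, it can help to confirm that the directory passed to `--ncbi-tax-dump` actually contains the dump files. The wrapper below is a hypothetical sketch (bash assumed): it checks only nodes.dmp and names.dmp, the two files mentioned later in this thread, although MMseqs2 may read further .dmp files that this sketch does not check. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
# run_with_local_taxdump is a hypothetical wrapper, not part of conterminator.
# Arguments mirror the command above: FASTA MAPPING PREFIX TMPDIR TAXDIR.
run_with_local_taxdump() {
  local fasta="$1" mapping="$2" prefix="$3" tmpdir="$4" taxdir="$5"
  # Refuse to start if a required dump file is missing or empty
  for f in nodes.dmp names.dmp; do
    if [ ! -s "$taxdir/$f" ]; then
      echo "missing or empty: $taxdir/$f" >&2
      return 1
    fi
  done
  local cmd=(conterminator dna "$fasta" "$mapping" "$prefix" "$tmpdir" --ncbi-tax-dump "$taxdir")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"   # print the command line only
  else
    "${cmd[@]}"        # run conterminator with the local taxdump
  fi
}
```

Usage: `DRY_RUN=1 run_with_local_taxdump dna.fas dna.mapping conterm tmp taxonomy/` prints the full command, and dropping `DRY_RUN=1` runs it.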
XiongGZ commented 4 years ago

Thanks for your support. I successfully solved the problem.

chassenr commented 3 years ago

Hi @martin-steinegger, I am trying to run conterminator with a local taxdump that I created myself (containing nodes.dmp and names.dmp files). I use the option --ncbi-tax-dump for this. However, conterminator still downloads and uses the taxdump from NCBI (which, of course, does not match my taxids). Do you have a suggestion for how to avoid this behavior?

Thanks!

Cheers, Christiane
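With a self-made taxdump, a quick consistency check is whether every taxid in the mapping file exists in the custom nodes.dmp. The sketch below is hypothetical (`missing_taxids` is not a conterminator command) and assumes that the mapping file has the sequence identifier in column 1 and the taxid in column 2, separated by whitespace, and that nodes.dmp follows the NCBI layout with the taxid as the first field (`taxid<TAB>|<TAB>parent<TAB>|...`).

```shell
# missing_taxids MAPPING NODES: print each taxid from MAPPING (column 2) that has
# no entry in NODES (column 1). Hypothetical helper; the column layout is an assumption.
missing_taxids() {
  local mapping="$1" nodes="$2"
  # Default awk field splitting treats tabs and the "|" separators as distinct fields,
  # so $1 of a nodes.dmp line is the taxid. FILENAME == ARGV[1] stays correct even if
  # nodes.dmp is empty (unlike the NR == FNR idiom).
  awk 'FILENAME == ARGV[1] { known[$1] = 1; next }
       NF >= 2 && !($2 in known) { print $2 }' "$nodes" "$mapping" | sort -u
}
```

Usage: `missing_taxids dna.mapping taxonomy/nodes.dmp` prints nothing when the mapping and the custom taxonomy agree.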

martin-steinegger commented 3 years ago

@chassenr, which version are you using? This PR should have fixed the issue: https://github.com/martin-steinegger/conterminator/pull/7

chassenr commented 3 years ago

@martin-steinegger I had used mamba to install conterminator, which installed version 1.c74b5. I have now installed conterminator from source, and the local taxdump is recognized.

drelo commented 3 years ago

I edited my question since I was able to resolve this. Sorry, I thought I couldn't solve it, but I managed to compile it on my own machine and later on a cluster.