qunfengdong / BLCA


BLAST Database error: Not a valid version 4 database #35

Open nana-marinbio opened 2 years ago

nana-marinbio commented 2 years ago

Hi BLCA team, @YJulyXing, @yingeddi2008, @koopkaup, @qunfengdong

I tried to run BLCA with the standard NCBI 16S microbial database, but the taxonomy and taxID files it creates are empty. Below is the output of the script, including where the error message shows up, followed by a listing of the resulting db directory.

$ python 1.subset_db_acc.py

--2022-04-06 13:24:43-- ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip
=> ‘db/taxdmp.zip’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 130.14.250.13, 165.112.9.228, 2607:f220:41f:250::229, ...
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|130.14.250.13|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.
==> PWD ... done.
==> TYPE I ... done.
==> CWD (1) /pub/taxonomy ... done.
==> SIZE taxdmp.zip ... 57732633
==> PASV ... done.
==> RETR taxdmp.zip ... done.
Length: 57732633 (55M) (unauthoritative)

taxdmp.zip 100%[=============================================================================================================================================================>] 55.06M 5.20MB/s in 17s

2022-04-06 13:25:04 (3.33 MB/s) - ‘db/taxdmp.zip’ saved [57732633]

Archive: db/taxdmp.zip
inflating: db/citations.dmp
inflating: db/delnodes.dmp
inflating: db/division.dmp
inflating: db/gencode.dmp
inflating: db/merged.dmp
inflating: db/names.dmp
inflating: db/nodes.dmp
inflating: db/gc.prt
inflating: db/readme.txt

NCBI Taxonomy Database downloaded!
blastdbcmd is located in your PATH!
--2022-04-06 13:25:07-- https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 165.112.9.228, 130.14.250.13, 2607:f220:41e:250::7, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|165.112.9.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 38650164 (37M) [application/x-gzip]
Saving to: ‘db/16S_ribosomal_RNA.tar.gz’

16S_ribosomal_RNA.tar.gz 100%[=============================================================================================================================================================>] 36.86M 5.67MB/s in 8.3s

2022-04-06 13:25:16 (4.43 MB/s) - ‘db/16S_ribosomal_RNA.tar.gz’ saved [38650164/38650164]

16S_ribosomal_RNA.ndb
16S_ribosomal_RNA.nhr
16S_ribosomal_RNA.nin
16S_ribosomal_RNA.nnd
16S_ribosomal_RNA.nni
16S_ribosomal_RNA.nog
16S_ribosomal_RNA.nos
16S_ribosomal_RNA.not
16S_ribosomal_RNA.nsq
16S_ribosomal_RNA.ntf
16S_ribosomal_RNA.nto
taxdb.btd
taxdb.bti
BLAST Database error: Error: Not a valid version 4 database.
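The file list above includes .ndb, .nos, .not, .ntf and .nto files, which as far as I know belong to the version 5 BLAST database format, while the error complains about version 4. One possible cause (an assumption, not confirmed) is that the blastdbcmd the script finds on PATH is an older BLAST+ build rather than ncbi-blast-2.13.0+. A minimal Python sketch to check which binary is picked up, what version it reports, and whether it can open the database with `blastdbcmd -info`; the helper names and the default path `db/16S_ribosomal_RNA` are hypothetical:

```python
import shutil
import subprocess


def run(cmd):
    """Run a command and return (returncode, stdout+stderr as text).

    Uses PIPE/universal_newlines instead of capture_output so it also works on Python 3.6.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def blastdbcmd_info(db_path="db/16S_ribosomal_RNA"):
    """Report which blastdbcmd is first on PATH, its version, and whether it can read db_path."""
    exe = shutil.which("blastdbcmd")
    if exe is None:
        return {"path": None, "version": None, "db_ok": False,
                "db_message": "blastdbcmd not found on PATH"}
    _, version = run([exe, "-version"])
    # A non-zero exit code here would mean this particular binary cannot open the database.
    code, message = run([exe, "-db", db_path, "-info"])
    return {"path": exe, "version": version, "db_ok": code == 0, "db_message": message}


if __name__ == "__main__":
    for key, value in blastdbcmd_info().items():
        print(key + ":", value)
```

If `path` is not the ncbi-blast-2.13.0+ binary, or `db_ok` is False with the same version 4 message, putting the 2.13.0+ `bin` directory first on PATH would be a reasonable first thing to try.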

Accession and TaxIDs from 16S_ribosomal_RNA are extracted!!
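The log reports `Total 0 TaxID to fetch` yet the script continues and writes empty files. A sketch of a stricter extraction step, assuming the accession/TaxID pairs can be dumped with `blastdbcmd -entry all -outfmt "%a %T"` (`%a` is the accession, `%T` the TaxID); BLCA's own command and output format may differ, and `parse_acc_taxid`/`extract_acc_taxid` are hypothetical names:

```python
import subprocess


def parse_acc_taxid(text):
    """Parse 'accession taxid' lines into {accession: taxid}, skipping malformed lines."""
    pairs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            pairs[fields[0]] = int(fields[1])
    return pairs


def extract_acc_taxid(db_path="db/16S_ribosomal_RNA"):
    """Hypothetical helper: dump accession/TaxID pairs and stop on failure or empty output."""
    result = subprocess.run(
        ["blastdbcmd", "-db", db_path, "-entry", "all", "-outfmt", "%a %T"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
    )
    if result.returncode != 0:
        # e.g. the "Not a valid version 4 database" error would end up here
        raise RuntimeError("blastdbcmd failed: " + result.stderr.strip())
    pairs = parse_acc_taxid(result.stdout)
    if not pairs:
        raise RuntimeError("no accession/TaxID pairs extracted from " + db_path)
    return pairs
```

Raising on a non-zero exit code or an empty result would surface the BLAST database error at this step, instead of after the long nodes.dmp and names.dmp loading steps.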

Loading 16S_ribosomal_RNA TaxID list...
('>> Loading 16S_ribosomal_RNA TaxID list...DONE!\nTotal', 0, 'TaxID to fetch!')
Loading nodes.dmp...This will take 10-15 minutes, please wait!
1 >> Open nodes.dmp file!
('Scanning nodes.dmp line:', 100000)
...
('Scanning nodes.dmp line:', 2400000)
2 >> Close nodes.dmp file!
('>> Remaining # TaxID to look for:', 0)
Loading nodes.dmp...DONE!
Loading names.dmp... This may take 20-30 minutes. Please wait!
('>> ', 50000, 'names recorded!')
...
('>> ', 3550000, 'names recorded!')
Loading names.dmp...DONE!
Generating a subset of taxonomy file.
Taxonomy file generated!
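The taxonomy step itself appears to work: nodes.dmp and names.dmp are read completely, but there is nothing to look up. A rough sketch of such a lookup (not BLCA's code; hypothetical helper names; assumes the standard NCBI taxdump layout, with fields separated by `\t|\t`, nodes.dmp giving tax_id, parent tax_id and rank, names.dmp giving tax_id, name and name class) shows why an empty TaxID list yields an empty taxonomy file:

```python
def parse_dmp_line(line):
    """Split one NCBI .dmp line ('a\t|\tb\t|\t...\t|') into stripped fields."""
    return [f.strip() for f in line.rstrip("\n").rstrip("|").split("\t|\t")]


def load_lineages(nodes_lines, names_lines, wanted_taxids):
    """Return {taxid: [(rank, name), ...]} from root down to taxid, for wanted TaxIDs only."""
    parent, rank = {}, {}
    for line in nodes_lines:
        f = parse_dmp_line(line)
        parent[f[0]], rank[f[0]] = f[1], f[2]

    names = {}
    for line in names_lines:
        f = parse_dmp_line(line)
        if f[3] == "scientific name":  # ignore synonyms, common names, etc.
            names[f[0]] = f[1]

    lineages = {}
    for taxid in wanted_taxids:
        path, node = [], str(taxid)
        while node in parent:
            path.append((rank[node], names.get(node, "")))
            if parent[node] == node:  # the root node is its own parent
                break
            node = parent[node]
        lineages[taxid] = list(reversed(path))
    return lineages
```

File handles can be passed directly, e.g. `load_lineages(open("db/nodes.dmp"), open("db/names.dmp"), taxids)`. With `taxids` empty, as in the log above, the loop over wanted TaxIDs never runs and the result is `{}`.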

ubuntu@ubuntu18:~/BLCA/db$ ls -lh
total 660M
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr 6 13:25 16S_ribosomal_RNA.ACC.taxonomy
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr 6 13:25 16S_ribosomal_RNA.ACCtaxID
-rw-rw-r-- 1 ubuntu ubuntu 1.2M Mar 29 06:36 16S_ribosomal_RNA.ndb
-rw-rw-r-- 1 ubuntu ubuntu 3.4M Mar 29 06:36 16S_ribosomal_RNA.nhr
-rw-rw-r-- 1 ubuntu ubuntu 262K Mar 29 06:36 16S_ribosomal_RNA.nin
-rw-rw-r-- 1 ubuntu ubuntu 175K Mar 29 06:36 16S_ribosomal_RNA.nnd
-rw-rw-r-- 1 ubuntu ubuntu 748 Mar 29 06:36 16S_ribosomal_RNA.nni
-rw-rw-r-- 1 ubuntu ubuntu 88K Mar 29 06:36 16S_ribosomal_RNA.nog
-rw-rw-r-- 1 ubuntu ubuntu 437K Mar 29 06:36 16S_ribosomal_RNA.nos
-rw-rw-r-- 1 ubuntu ubuntu 262K Mar 29 06:36 16S_ribosomal_RNA.not
-rw-rw-r-- 1 ubuntu ubuntu 7.9M Mar 29 06:36 16S_ribosomal_RNA.nsq
-rw-rw-r-- 1 ubuntu ubuntu 592K Mar 29 06:36 16S_ribosomal_RNA.ntf
-rw-rw-r-- 1 ubuntu ubuntu 154K Mar 29 06:36 16S_ribosomal_RNA.nto
-rw-rw-r-- 1 ubuntu ubuntu 37M Mar 29 06:36 16S_ribosomal_RNA.tar.gz
-rw-r--r-- 1 ubuntu ubuntu 19M Apr 6 12:29 citations.dmp
-rw-r--r-- 1 ubuntu ubuntu 4.2M Apr 6 12:26 delnodes.dmp
-rw-r--r-- 1 ubuntu ubuntu 452 Apr 6 12:20 division.dmp
-rw-r--r-- 1 ubuntu ubuntu 17K Apr 6 12:29 gc.prt
-rw-r--r-- 1 ubuntu ubuntu 4.9K Apr 6 12:20 gencode.dmp
-rw-r--r-- 1 ubuntu ubuntu 1.2M Apr 6 12:26 merged.dmp
-rw-r--r-- 1 ubuntu ubuntu 208M Apr 6 12:29 names.dmp
-rw-r--r-- 1 ubuntu ubuntu 159M Apr 6 12:28 nodes.dmp
-rw-rw---- 1 ubuntu ubuntu 2.7K Sep 11 2019 readme.txt
-rw-rw-r-- 1 ubuntu ubuntu 149M Mar 29 06:36 taxdb.btd
-rw-rw-r-- 1 ubuntu ubuntu 16M Mar 29 06:36 taxdb.bti
-rw-rw-r-- 1 ubuntu ubuntu 56M Apr 6 13:25 taxdmp.zip

I checked the 16S_ribosomal_RNA.tar.gz file with md5sum and it is OK. I am using the latest BLAST (ncbi-blast-2.13.0+), Python 3.6.9, Biopython 1.79, clustalo 1.2.4-1 and MUSCLE 5.0.98 (muscle5.0.98_linux).

How can I fix this?