wskang1202 opened this issue 1 year ago
Hi Wonseok,
It looks like the file prelim_map.txt is missing. Does the file exist in the kraken2/<your HBV db name>/taxonomy directory? If not, one reason could be that downloading the library failed. Could you please run download_custom_kraken_library.sh for HBV again and check whether the prelim_map.txt file was downloaded into your HBV database directory?
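For example, you could check it from the shell like this (just a sketch, not part of the FastViFi scripts; the directory name is an assumption, so substitute your own database directory):

    DB=kraken2/Kraken2StandardDB_k_18_hbv   # adjust to your HBV database directory
    # The file should exist and be non-empty; -s is true only for non-empty files
    ls -l "$DB/taxonomy/prelim_map.txt"
    [ -s "$DB/taxonomy/prelim_map.txt" ] && echo "prelim_map.txt OK" || echo "prelim_map.txt missing or empty"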
Please let me know if this doesn't work for you.
Best, Sara
Hi Sara,
I ran download_custom_kraken_library.sh for HBV again, and I can see that there is a prelim_map.txt file in kraken2/Kraken2StandardDB_k_18_hbv/taxonomy, but the file itself is empty.
Best, Wonseok
Hi Wonseok,
Do you get an error when running download_custom_kraken_library.sh for the HBV dataset?
Could you please check whether prelim_map.txt is present and non-empty in the HCV and EBV databases that you built successfully before?
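For example, a loop like this prints the size of each database's prelim_map.txt (just a sketch; it assumes all of your databases live under a common kraken2/ directory):

    # 0 bytes for any database indicates that the library download step failed for it
    for db in kraken2/*/ ; do
        wc -c "${db}taxonomy/prelim_map.txt"
    done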
Best, Sara
Hi Sara,
prelim_map.txt is present and non-empty in the successfully built databases (HCV and EBV, as well as the k_25_hbv_hg database). However, the file is empty for the unsuccessful k_18_hbv and k_22_hbv databases. I've attached the log.txt file in case you want to check it out.
Thank you, Wonseok
Hi Wonseok,
Did you try running the build_custom_kraken_index.sh script on the k_18_hbv database after running download_custom_kraken_library.sh? If so, was there any error?
Hi Javadzadeh,
I have downloaded the dataset you suggested for sample-level FastViFi for the HPV virus: https://drive.google.com/file/d/1QYn5lDWjvhtIWCrwmzDc_1fy8ANrXWz1/view?usp=sharing. However, when I attempted to extract it with tar -xzvf kraken_datasets.tar.gz, I got errors. Could you please suggest how I can resolve this?
Hi Muhammod,
Thanks for reaching out.
Could you please share the error messages you see when running tar -xzvf kraken_datasets.tar.gz?
Hi Javadzadeh,
Thank you so much for your response. I was getting the errors below. The downloaded file size is 15796400321 bytes.
gzip: stdin: invalid compressed data--crc error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
ls -l kraken_datasets.tar.gz
Hi again,
Thanks! Although the output of the ls -l command is truncated in your reply, I can see the file size in your text above. The file size seems correct.
Did you try running gunzip kraken_datasets.tar.gz and then tar -xvf kraken_datasets.tar? If that fails, could you please share the error?
By the way, the uncompressed data should be about 60 GB. Have you taken that into account?
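For example (a sketch; run these in the directory containing the archive):

    # Test the gzip stream without extracting anything; an error here means the download itself is corrupt
    gzip -t kraken_datasets.tar.gz
    # Check free space on the target filesystem; roughly 60 GB is needed for the extracted files
    df -h .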
Thanks, Sara
Hi Javadzadeh,
Yes, I have tried that. However, it didn't work:
gzip: kraken_datasets.tar.gz: invalid compressed data--crc error
Hi Javadzadeh, could you please provide a different download link?
I can provide another link; it'll take a couple of hours to upload the database. In the meantime, could you please check whether the following commands run without errors and can list the files?
file kraken_datasets.tar.gz
tar -tf kraken_datasets.tar.gz
If there is an error, could you please share it?
Sara
Yes, it shows errors; the command and its output are below.
tar -tf kraken_datasets.tar.gz > errors-text.txt
Output:
kraken_datasets/
kraken_datasets/Kraken2StandardDB_k_22_hpv/
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/readme.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/merged.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/taxdump.tar.gz
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/names.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/taxdump.untarflag
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/accmap.dlflag
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/delnodes.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/citations.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/nodes.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/nucl_gb.accession2taxid
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/gc.prt
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/nucl_wgs.accession2taxid
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/division.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/gencode.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/taxdump.dlflag
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/seqid2taxid.map
kraken_datasets/Kraken2StandardDB_k_22_hpv/hash.k2d
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxo.k2d
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/9TbkQmfdkG.fna.masked
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/9TbkQmfdkG.fna
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/prelim_map_3IwJCtpJpX.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/opts.k2d
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxo.k2d
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/assembly_summary.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/library.fna.masked
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/library.fna
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/manifest.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/prelim_map_SeYmVYHiCd.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/rKtNPyn11J.fna
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/rKtNPyn11J.fna.masked
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/opts.k2d
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxonomy/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxonomy/gencode.dmp
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxonomy/nucl_wgs.accession2taxid
tar: Skipping to next header
tar: Archive contains ‘9.1\t5748’ where numeric mode_t value expected
tar: Archive contains ‘.1\t57486\t478’ where numeric time_t value expected
7486\t47861343\nAG288467\tAG288467.1\t57486\t47861344\nAG288468\tAG288468.1\t57486\t47861345\nAG288469\tAG28846
tar: Skipping to next header
tar: Archive contains ‘0672.1\t4113\t’ where numeric off_t value expected
tar: Archive contains ‘119.1\t262687’ where numeric off_t value expected
tar: Archive contains ‘1.1\t1639’ where numeric mode_t value expected
tar: Archive contains ‘.1\t1639\t1129’ where numeric time_t value expected
tar: Archive contains ‘\t1129612’ where numeric uid_t value expected
639\t112961221\nDQ844259\tDQ844259.1\t1639\t112961224\nDQ844260\tDQ844260.1\t1639\t112961227\nDQ844261\tDQ84426
tar: Skipping to next header
tar: Archive contains ‘1.1\t6253’ where numeric mode_t value expected
tar: Archive contains ‘.1\t6253\t1132’ where numeric time_t value expected
253\t113251528\nED394649\tED394649.1\t6253\t113251529\nED394650\tED394650.1\t6253\t113251530\nED394651\tED39465
tar: Skipping to next header
tar: Archive contains ‘1609\nEZ97768’ where numeric off_t value expected
tar: Archive contains ‘\tHE793950.1\t’ where numeric off_t value expected
322560303\nJG336704\tJG336704.1\t30301\t322560304\nJG336705\tJG336705.1\t30301\t322560305\nJG336706\tJG336706.
tar: Skipping to next header
tar: Archive contains ‘1759748\t’ where numeric mode_t value expected
tar: Archive contains ‘95526170’ where numeric uid_t value expected
697\nKR112558\tKR112558.1\t1387109\t955261699\nKR112559\tKR112559.1\t1690892\t955261701\nKR112560\tKR112560.1\t
tar: Skipping to next header
tar: Archive contains ‘\tLA487646.1\t’ where numeric off_t value expected
tar: Archive contains ‘29\tMC492929.’ where numeric off_t value expected
tar: Archive contains ‘31460994\nMM1’ where numeric time_t value expected
tar: Archive contains ‘993\nMM16’ where numeric uid_t value expected
0\t1531460990\nMM160627\tMM160627.1\t0\t1531460991\nMM160628\tMM160628.1\t0\t1531460992\nMM160629\tMM160629.1\t0
tar: Skipping to next header
tar: Archive contains ‘_019029293.1’ where numeric off_t value expected
tar: Archive contains ‘\t50390\t15815’ where numeric off_t value expected
tar: Archive contains ‘OC673270’ where numeric mode_t value expected
tar: Archive contains ‘\tOC673271.1\t’ where numeric time_t value expected
tar: Archive contains ‘.1\t61476’ where numeric uid_t value expected
tar: Archive contains ‘\t1946114’ where numeric gid_t value expected
61476\t1946114713\nOC673268\tOC673268.1\t61476\t1946114714\nOC673269\tOC673269.1\t61476\t1946114715\nOC673270\t
tar: Skipping to next header
tar: Archive contains ‘\tOD59341’ where numeric mode_t value expected
tar: Archive contains ‘0.1\t6147’ where numeric uid_t value expected
\t61472\t1948381426\nOD593408\tOD593408.1\t61472\t1948381428\nOD593409\tOD593409.1\t61472\t1948381430\nOD593410
tar: Skipping to next header
tar: Archive contains ‘OD855125’ where numeric mode_t value expected
tar: Archive contains ‘\tOD855126.1\t’ where numeric time_t value expected
tar: Archive contains ‘.1\t61472’ where numeric uid_t value expected
tar: Archive contains ‘\t1947471’ where numeric gid_t value expected
61472\t1947471274\nOD855123\tOD855123.1\t61472\t1947471275\nOD855124\tOD855124.1\t61472\t1947471276\nOD855125\t
tar: Skipping to next header
tar: Archive contains ‘\tOE36610’ where numeric mode_t value expected
tar: Archive contains ‘6.1\t6147’ where numeric uid_t value expected
\t61474\t1962876452\nOE366104\tOE366104.1\t61474\t1962876453\nOE366105\tOE366105.1\t61474\t1962876454\nOE366106
tar: Skipping to next header
tar: Archive contains ‘OE507501’ where numeric mode_t value expected
tar: Archive contains ‘\tOE507502.1\t’ where numeric time_t value expected
tar: Archive contains ‘.1\t61474’ where numeric uid_t value expected
tar: Archive contains ‘\t1964446’ where numeric gid_t value expected
61474\t1964446754\nOE507499\tOE507499.1\t61474\t1964446757\nOE507500\tOE507500.1\t61474\t1964446760\nOE507501\t
tar: Skipping to next header
tar: Archive contains ‘081\nOE597102’ where numeric off_t value expected
tar: Archive contains ‘\tOE60725’ where numeric mode_t value expected
tar: Archive contains ‘9\tOE607259.1’ where numeric time_t value expected
tar: Archive contains ‘8.1\t6147’ where numeric uid_t value expected
\t61474\t1965131656\nOE607256\tOE607256.1\t61474\t1965131659\nOE607257\tOE607257.1\t61474\t1965131662\nOE607258
tar: Skipping to next header
tar: Archive contains ‘03024007.1\t6’ where numeric time_t value expected
tar: Archive contains ‘\t3026648’ where numeric uid_t value expected
003024004.1\t663202\t302664848\nXM_003024005\tXM_003024005.1\t663202\t302664850\nXM_003024006\tXM_003024006.
tar: Skipping to next header
tar: Archive contains ‘008481066.2\t’ where numeric off_t value expected
gzip: stdin: invalid compressed data--crc error
gzip: stdin: invalid compressed data--length error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Thanks for checking.
I'm uploading the databases again; it'll take another couple of hours to fully upload. I'll share the link here as soon as it's done.
In the meantime, it might be worth setting up a new Conda environment, installing tar, and trying to extract the database files in this clean environment. Let me know if you still get the errors.
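For example (a sketch; this assumes a tar package is available on conda-forge for your platform):

    # Create and activate a clean environment with its own tar
    conda create -n extract-env -c conda-forge tar
    conda activate extract-env
    # Retry the extraction using the newly installed tar
    tar -xzvf kraken_datasets.tar.gz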
Sara
Hi again,
Here's a second link for the same Kraken databases: https://drive.google.com/file/d/1DrKgDE7fl5Tff2bV8K9XBxLYsbTeOcgh/view?usp=sharing
I suspect this might be a tar library incompatibility rather than a problem with the file itself. I was able to list the contents of the kraken_datasets.tar.gz from the first link (provided in the README file). Here's my tar version on macOS 12.1:
tar --version
bsdtar 3.5.1 - libarchive 3.5.1 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8
That's why I would recommend updating your tar package or creating a new Conda environment and trying again as above. Let me know how it goes.
Sara
Thank you, Ms. Javadzadeh. It helped me a lot.
I used a Python script instead of tar, and this time it showed no errors. After extracting, I got a total size of 61.4 GB. Is that the correct size?
import tarfile

sourcePATH = '/mnt/sdb1/kraken2/kraken_datasets.tar.gz'
destinationPATH = '/mnt/sdb1/kraken2/'

# The with-block closes the archive automatically, so no explicit close() is needed
with tarfile.open(sourcePATH) as tar:
    tar.extractall(destinationPATH)
Great! Thanks for letting me know. The size of the extracted files sounds reasonable.
Sara
Hi Sara,
I'm running into the same error as Wonseok.
There is no error when running build_custom_kraken_index.sh and download_custom_kraken_library.sh for the k_18 and k_22 databases, but prelim_map.txt is empty for k_18 and k_22, while prelim_map.txt in k_25_hg is fine.
When I run the Docker image, it shows an error that the database does not contain the necessary file taxo.k2d.
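For reference, this is how I checked the index files (a quick sanity check only; the directory name is from my setup):

    # A complete Kraken2 database directory should contain hash.k2d, opts.k2d, and taxo.k2d
    ls -l kraken2/Kraken2StandardDB_k_18_hbv/hash.k2d \
          kraken2/Kraken2StandardDB_k_18_hbv/opts.k2d \
          kraken2/Kraken2StandardDB_k_18_hbv/taxo.k2d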
Hi Sara,
I've been trying to build custom databases by following the FastViFi README. Building the databases for HCV and EBV was successful; however, building the HBV databases for k=18 and k=22 was unsuccessful. The following message was shown in the log file:
Is there a way to solve this problem?
Best, Wonseok