Open cubense opened 7 months ago
First, I found that I have the same problem as in #9.
Running build_custom_kraken_index.sh and download_custom_kraken_library.sh gives no errors for k_18 and k_22, but prelim_map.txt is empty in k_18 and k_22, whereas prelim_map.txt in k_25_hg is fine. When I run Docker, it reports the error "does not contain necessary file taxo.k2d".
so i run
./kraken2-inspect --db Kraken2StandardDB_k_25_hbv_hg its ok
Database options: nucleotide db, k = 25, l = 22
Spaced mask = 11111111111111111111111111110011001100110011
Toggle mask = 1110001101111110001010001100010000100111000110110101101000101101
Total taxonomy nodes: 33
Table size: 780814852
Table capacity: 1115271314
Min clear hash value = 0
100.00 780814852 0 R 1 root
100.00 780814852 0 R1 131567 cellular organisms
100.00 780814852 0 D 2759 Eukaryota
100.00 780814852 0 D1 33154 Opisthokonta
100.00 780814852 0 K 33208 Metazoa
100.00 780814852 0 K1 6072 Eumetazoa
100.00 780814852 0 K2 33213 Bilateria
100.00 780814852 0 K3 33511 Deuterostomia
100.00 780814852 0 P 7711 Chordata
100.00 780814852 0 P1 89593 Craniata
100.00 780814852 0 P2 7742 Vertebrata
100.00 780814852 0 P3 7776 Gnathostomata
100.00 780814852 0 P4 117570 Teleostomi
100.00 780814852 0 P5 117571 Euteleostomi
100.00 780814852 0 P6 8287 Sarcopterygii
100.00 780814852 0 P7 1338369 Dipnotetrapodomorpha
100.00 780814852 0 P8 32523 Tetrapoda
100.00 780814852 0 P9 32524 Amniota
100.00 780814852 0 C 40674 Mammalia
100.00 780814852 0 C1 32525 Theria
100.00 780814852 0 C2 9347 Eutheria
100.00 780814852 0 C3 1437010 Boreoeutheria
100.00 780814852 0 C4 314146 Euarchontoglires
100.00 780814852 0 O 9443 Primates
100.00 780814852 0 O1 376913 Haplorrhini
100.00 780814852 0 O2 314293 Simiiformes
100.00 780814852 0 O3 9526 Catarrhini
100.00 780814852 0 O4 314295 Hominoidea
100.00 780814852 0 F 9604 Hominidae
100.00 780814852 0 F1 207598 Homininae
100.00 780814852 0 G 9605 Homo
100.00 780814852 780814852 S 9606 Homo sapiens
but when i run
./kraken2-inspect --db Kraken2StandardDB_k_22_hbv
kraken2-inspect: database ("./Kraken2StandardDB_k_22_hbv") does not contain necessary file taxo.k2d
then i try to run
./kraken2-build --build --db ./Kraken2StandardDB_k_22_hbv --kmer-len 22 --minimizer-len 19 --minimizer-spaces 3
Creating sequence ID to taxonomy ID map (step 1)... No preliminary seqid/taxid mapping files found, aborting.
k25 is OK, so I wanted to assess whether the Docker image works. I tried to run FastViFi using the pipeline from your GitHub:
python ./run_kraken_vifi_container.py --docker --output-dir /mnt/d/PROJECT/code/fastvifi/1/output --input-file /mnt/d/PROJECT/code/fastvifi/1/input/S9fp.bam --skip-bwa-filter --virus hbv --kraken-db-path /mnt/d/PROJECT/code/fastvifi/1/kraken2-master/Kraken2StandardDB_k_25_hbv_hg --level sample-level --vifi-viral-ref-dir /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv/ --human-chr-list /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38/chrom_list.txt --vifi-human-ref-dir /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38/
Changing the mode of the output directory to be writable by other users (i.e., the user in the docker container)
docker run --rm --read-only -v /mnt/d/PROJECT/code/fastvifi/1/input:/home/input/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/kraken2-master/Kraken2StandardDB_k_25_hbv_hg:/home/kraken2-db --read-only -v /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38/:/home/data_repo/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv/:/home/repo/data/ -v /mnt/d/PROJECT/code/fastvifi/1/output:/home/output sarajava/fastvifi:v1.1 python /home/fastvifi/run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /home/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /home/kraken2-db --docker --virus hbv --input-file /home/input/S9fp.bam --skip-bwa-filter
Traceback (most recent call last):
File "/mnt/d/PROJECT/code/fastvifi/1/FastViFi-main/./run_kraken_vifi_container.py", line 95, in
Then I tried running the command inside the Docker container, so I started Docker first:
docker run -t -i -v /mnt/d/PROJECT/code/fastvifi/1:/data sarajava/fastvifi:v1.1 /bin/bash
python ./run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /data/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /data/kraken2-master/ --level sample-level --virus hbv --input-file /data/wgs/input/S9fp.bam --skip-bwa-filter
Warning! output directory already exists. Rewriting previous outputs.
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.000s (0.0 Kseq/m, 0.00 Mbp/m).
0 sequences classified (-nan%)
0 sequences unclassified (-nan%)
Command being timed: "/home/kraken2/kraken2 --use-names --report /dev/null --db /data/kraken2-master/Kraken2StandardDB_k_25_hbv_hg --threads 1 --paired --f-threshold 0.4 --keep-unmapped-reads --unmapped-threshold 0.8 --classified-out /data/output/reads_passing_kraken_first_level_for_virus_hbv#.fq /data/output/reads_passing_bwa_filter_1.fq /data/output/reads_passing_bwa_filter_2.fq --output /dev/null"
User time (seconds): 0.00
System time (seconds): 1.16
Percent of CPU this job got: 4%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:25.83
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4360356
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 3760
Voluntary context switches: 17072
Involuntary context switches: 3
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
kraken2: database ("/data/kraken2-master/Kraken2StandardDB_k_22_hbv") does not contain necessary file taxo.k2d
Command exited with non-zero status 2
Command being timed: "/home/kraken2/kraken2 --use-names --report /dev/null --db /data/kraken2-master/Kraken2StandardDB_k_22_hbv --threads 1 --paired --f-threshold 0.5 --unmapped-threshold 0.9 --classified-out /data/output/reads_passing_kraken_filter_for_virus_hbv#.fq /data/output/reads_passing_kraken_first_level_for_virus_hbv_1.fq /data/output/reads_passing_kraken_first_level_for_virus_hbv_2.fq --output /dev/null"
User time (seconds): 0.01
System time (seconds): 0.00
Percent of CPU this job got: 92%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6916
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 848
Voluntary context switches: 5
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
Traceback (most recent call last):
File "./run_kraken_vifi_pipeline.py", line 491, in
Hi Cubense,
To better understand the problem, could you please answer the following questions?
- Could you please share the exact command you used when getting this error?
- To be more precise, are you using Docker or Singularity?
- Is the error about existing file a warning or a fatal error? I did not have this as a fatal error stopping the program. So I'm wondering if this was a warning and another error stopped the program or if this was the error stopping the program. To be sure, maybe you can share the full output here.
Best, Sara
Dear @sara-javadzadeh ,
My commands and the errors are pasted above. I think there is a problem not only in the HBV database build but also in the Docker process.
What can I do to debug this?
thank you
cubense
Hi Cubense,
Thanks for providing all the information. For building custom Kraken databases, I recommend running build_custom_kraken_database.sh instead of kraken2-build, which is further explained in the custom Kraken database section of the readme. The reason is that you have to run kraken2-build with --download-library and --download-taxonomy before running it with --build --db. That is why there is a script (build_custom_kraken_database.sh) in my forked kraken2 repository that takes care of all the commands required before building a Kraken database.
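To make the ordering concrete, here is a minimal sketch that only assembles the kraken2-build invocations in the order described above; it does not run them and it is not the content of build_custom_kraken_database.sh, which is not shown in this thread. The function name and the library name "viral" are assumptions for illustration; the k-mer settings are the ones from the command quoted earlier in the thread.

```python
from typing import List


def kraken2_build_steps(db: str, library: str, kmer_len: int,
                        minimizer_len: int, minimizer_spaces: int) -> List[List[str]]:
    """Return kraken2-build invocations in the order they need to run.

    Hypothetical helper: the real wrapper script may run additional steps.
    """
    return [
        # 1. The taxonomy has to be present before anything can be mapped to it.
        ["kraken2-build", "--download-taxonomy", "--db", db],
        # 2. The library provides the sequences and the seqid/taxid map files.
        ["kraken2-build", "--download-library", library, "--db", db],
        # 3. Only now can the hash table and the .k2d files be built.
        ["kraken2-build", "--build", "--db", db,
         "--kmer-len", str(kmer_len),
         "--minimizer-len", str(minimizer_len),
         "--minimizer-spaces", str(minimizer_spaces)],
    ]


# "viral" is an assumed library name used only for this illustration.
for cmd in kraken2_build_steps("./Kraken2StandardDB_k_22_hbv", "viral", 22, 19, 3):
    print(" ".join(cmd))
```

Running the third command without the first two is consistent with the "No preliminary seqid/taxid mapping files found" message shown earlier.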
A quick note that build_custom_kraken_index.sh is meant to be called by build_custom_kraken_database.sh. It seems you have been calling build_custom_kraken_index.sh instead of build_custom_kraken_database.sh, which skips a few steps.
Please try running build_custom_kraken_database.sh as instructed in the custom Kraken database section of the readme and let me know if you could successfully build Kraken2StandardDB_k_22_hbv.
With regard to the error when running run_kraken_vifi_container.py, thanks for sharing the output. I can see the subprocess.CalledProcessError, which is probably similar to the error you get when running run_kraken_vifi_pipeline.py within a Docker environment. It could be because the Kraken database is not created, so the intermediate files are empty. I think it might be best to first fix the database problem and then run the pipeline again.
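When a wrapper raises subprocess.CalledProcessError, the useful message is usually in the failed command's stderr rather than in the Python traceback. The following is a minimal sketch of how that output can be captured; run_and_report is a hypothetical helper and not code from the FastViFi scripts.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_report(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit status, captured stderr).

    Hypothetical helper for debugging; it never raises on a non-zero exit.
    """
    try:
        done = subprocess.run(cmd, check=True, capture_output=True, text=True)
        return done.returncode, done.stderr
    except subprocess.CalledProcessError as err:
        # The exception carries the exit status and whatever the command
        # wrote to stderr, e.g. a missing-file message from kraken2.
        return err.returncode, err.stderr or ""


# Stand-in command that fails the way a broken pipeline step would.
status, message = run_and_report(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"]
)
print(status, message)
```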
Best, Sara
hi Sara,
Thank you for your response.
The command I previously used to build the database was
build_custom_kraken_database.sh hbv ./hbv.unaligned.fas (or hbv.aligned.fasta) 9000000
The database Kraken2StandardDB_k_25_hbv_hg includes three *.k2d files, but in Kraken2StandardDB_k_22_hbv and Kraken2StandardDB_k_18_hbv there are only two folders, library and taxonomy, and the three .k2d files are missing.
I have run this command many times, and each time I get the same result without any errors or warnings.
I can only successfully build one of the three, which is quite puzzling. It matches issue #9, which I mentioned above.
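The two symptoms described in this thread (missing .k2d files and an empty prelim_map.txt) can be checked for each database directory with a short script. This is a sketch under assumptions: a built Kraken 2 database contains hash.k2d, opts.k2d and taxo.k2d, but the exact location and naming of the prelim_map files under library/ may differ between setups, so the script simply searches that folder recursively. The function name is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# The three files a built Kraken 2 database directory contains.
REQUIRED_K2D = ("hash.k2d", "opts.k2d", "taxo.k2d")


def check_kraken_db(db_dir: str) -> Dict[str, List[str]]:
    """Report missing .k2d files and empty prelim_map files in db_dir."""
    db = Path(db_dir)
    missing = [name for name in REQUIRED_K2D if not (db / name).is_file()]

    # Assumption: the map files live somewhere under <db>/library/.
    library = db / "library"
    maps = sorted(library.rglob("prelim_map*.txt")) if library.is_dir() else []
    empty = [str(p) for p in maps if p.stat().st_size == 0]

    return {
        "missing_k2d": missing,
        "prelim_maps_found": [str(p) for p in maps],
        "empty_prelim_maps": empty,
    }


# Database names taken from this thread; adjust the paths as needed.
for name in ("Kraken2StandardDB_k_18_hbv", "Kraken2StandardDB_k_22_hbv",
             "Kraken2StandardDB_k_25_hbv_hg"):
    print(name, check_kraken_db(name))
```

An empty prelim_map file would mean that no sequence in the library was matched to a taxonomy ID, which fits a build step that stops before writing any .k2d files.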
If possible, could you please upload the constructed hbv database? I would greatly appreciate it.
I look forward to your reply.
thanks
cubense
Hi Cubense,
I'm working on compressing the kraken databases and uploading them somewhere you can download. It might take a while because of the size of the files and my not so great internet connection. Stay tuned for that.
I also noticed that the hbv_hg Kraken database, on which you could run kraken2-inspect successfully, does not contain HBV sequences. Ideally, in the output of kraken2-inspect I want to see a taxonomy tree (with all the intermediate nodes) for human and HBV. The taxonomy tree you printed here only includes human and does not include HBV.
If I had to guess, there might be something wrong with the HBV reference sequences you pass to the build_custom_kraken_database.sh script. The file should contain unaligned HBV reference sequences with descriptive names for each sequence. Please re-try with this hbv reference file, to make sure the viral sequence names are correctly identified. If you have added more references than are in the reference above, feel free to add them, but please make sure the sequence ID follows the same format as in the file.
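Whether a database contains a given taxon can also be checked programmatically from saved kraken2-inspect output. The sketch below assumes the six-column layout seen in the output posted above (percentage, clade count, direct count, rank, taxid, name); the function names are hypothetical, and the search string "Hepatitis B virus" is an assumption about how the taxon is named in the taxonomy.

```python
from typing import List, Tuple


def parse_inspect_report(text: str) -> List[Tuple[str, int, str]]:
    """Return (rank, taxid, name) for every taxonomy line in the report."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # header lines such as "Table size: ..." are shorter
        _pct, clade, direct, rank, taxid, name = fields
        if not (clade.isdigit() and direct.isdigit() and taxid.isdigit()):
            continue  # skips header lines that happen to have six fields
        rows.append((rank, int(taxid), name.strip()))
    return rows


def has_taxon(text: str, needle: str) -> bool:
    """Case-insensitive substring search over the taxon names."""
    needle = needle.lower()
    return any(needle in name.lower() for _, _, name in parse_inspect_report(text))


# Two lines copied from the output posted earlier in this thread.
report = (
    "100.00 780814852 0 G 9605 Homo\n"
    "100.00 780814852 780814852 S 9606 Homo sapiens\n"
)
print(has_taxon(report, "Homo sapiens"))
print(has_taxon(report, "Hepatitis B virus"))  # assumed taxon name
```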
Best, Sara
Hi Cubense,
Please find Kraken2StandardDB_k_22_hbv and Kraken2StandardDB_k_18_hbv compressed in a single file (11.23GB) here.
The Kraken2StandardDB_k_25_hbv_hg database is lacking HBV genomes, from what I see in the kraken2-inspect output that you shared above. But the FastViFi algorithm should work without a corrected Kraken2StandardDB_k_25_hbv_hg in theory, although the sensitivity might be negatively affected. I say in theory because I have not tested FastViFi with an absent viral genome in Kraken2StandardDB_k_25_hbv_hg myself. Please go through my previous comment and this comment and let me know if it works for you.
Dear Sara @sara-javadzadeh, thank you for your efforts! I can now build all three HBV databases using your unaligned HBV reference sequences with descriptive names, but I got the same error when I ran FastViFi in Docker:
python run_kraken_vifi_container.py --docker --input-file /mnt/d/PROJECT/code/fastvifi/1/input/S9a.bam --output-dir /mnt/d/PROJECT/code/fastvifi/1/output --virus hbv --kraken-db-path /mnt/d/PROJECT/code/fastvifi/1/kraken2-master --vifi-viral-ref-dir /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv --human-chr-list /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38 --vifi-human-ref-dir /mnt/d/PROJECT/code/fastvifi/1/data_repo
Changing the mode of the output directory to be writable by other users (i.e., the user in the docker container)
docker run --rm --read-only -v /mnt/d/PROJECT/code/fastvifi/1/input:/home/input/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/kraken2-master:/home/kraken2-db --read-only -v /mnt/d/PROJECT/code/fastvifi/1/data_repo:/home/data_repo/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv:/home/repo/data/ -v /mnt/d/PROJECT/code/fastvifi/1/output:/home/output sarajava/fastvifi:v1.1 python /home/fastvifi/run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /home/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /home/kraken2-db --docker --virus hbv --input-file /home/input/S9a.bam
/bin/sh: 1: cannot create output: Read-only file system
python: can't open file 'filter_reads_bwa_efficient.py': [Errno 2] No such file or directory
Command exited with non-zero status 2
Command being timed: "python filter_reads_bwa_efficient.py /home/input/S9a.bam /home/output /home/data_repo/GRCh38/chrom_list.txt reads_passing_bwa_filter"
User time (seconds): 0.01
System time (seconds): 0.00
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 8912
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 1086
Voluntary context switches: 1
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
Traceback (most recent call last):
File "/home/fastvifi/run_kraken_vifi_pipeline.py", line 491, in
I think the Docker image may be incomplete because some files could not be downloaded. I have tried pulling it many times, and the image listing shows: [sarajava/fastvifi] 769aca5295e6 v1.1 4 months ago 2.7 GB. Could you confirm whether the image should be 2.7 GB? And what should I do if I want to debug this? Thank you for your help; I look forward to your reply.
cubense
Error running run_kraken_vifi_container.py using Docker. I have already built the custom databases for FastViFi. After "Changing the mode of the output directory to be writable by other users ....." the command returned non-zero exit status 1. When I resubmitted the command, there was an additional Docker error: "response from daemon error while creating mount source path... file exists". Could you tell me what I should do next?