sara-javadzadeh / FastViFi

Detect viral infection and integration sites on NGS input. Manuscript is in preparation.
GNU General Public License v3.0

Error running run_kraken_vifi_container.py with Docker: subprocess.CalledProcessError returned non-zero exit status 1 #13

Open cubense opened 7 months ago

cubense commented 7 months ago

I get an error when running run_kraken_vifi_container.py with Docker. I have already built the custom databases for FastViFi. The run prints "Changing the mode of the output directory to be writable by other users ....." and then fails with "returned non-zero exit status 1". When I resubmit the same command, there is an additional Docker error: "response from daemon: error while creating mount source path... file exists". Could you tell me what I should do?

sara-javadzadeh commented 7 months ago

Hi Cubense,

To better understand the problem, could you please answer the following questions?

  1. Could you please share the exact command you used when getting this error?
  2. To be more precise, are you using Docker or Singularity?
  3. Is the error about existing file a warning or a fatal error? I did not have this as a fatal error stopping the program. So I'm wondering if this was a warning and another error stopped the program or if this was the error stopping the program. To be sure, maybe you can share the full output here.

Best, Sara

cubense commented 6 months ago

First, I found that I have the same problem as in #9.

There were no errors when running build_custom_kraken_index.sh and download_custom_kraken_library.sh. In k_18 and k_22, prelim_map.txt is empty, but prelim_map.txt in k_25_hg is fine. When I run Docker, the error says the database does not contain the necessary file taxo.k2d.

So I ran

./kraken2-inspect --db Kraken2StandardDB_k_25_hbv_hg

which works:

Database options: nucleotide db, k = 25, l = 22
Spaced mask = 11111111111111111111111111110011001100110011
Toggle mask = 1110001101111110001010001100010000100111000110110101101000101101
Total taxonomy nodes: 33
Table size: 780814852
Table capacity: 1115271314
Min clear hash value = 0
100.00 780814852 0 R 1 root
100.00 780814852 0 R1 131567 cellular organisms
100.00 780814852 0 D 2759 Eukaryota
100.00 780814852 0 D1 33154 Opisthokonta
100.00 780814852 0 K 33208 Metazoa
100.00 780814852 0 K1 6072 Eumetazoa
100.00 780814852 0 K2 33213 Bilateria
100.00 780814852 0 K3 33511 Deuterostomia
100.00 780814852 0 P 7711 Chordata
100.00 780814852 0 P1 89593 Craniata
100.00 780814852 0 P2 7742 Vertebrata
100.00 780814852 0 P3 7776 Gnathostomata
100.00 780814852 0 P4 117570 Teleostomi
100.00 780814852 0 P5 117571 Euteleostomi
100.00 780814852 0 P6 8287 Sarcopterygii
100.00 780814852 0 P7 1338369 Dipnotetrapodomorpha
100.00 780814852 0 P8 32523 Tetrapoda
100.00 780814852 0 P9 32524 Amniota
100.00 780814852 0 C 40674 Mammalia
100.00 780814852 0 C1 32525 Theria
100.00 780814852 0 C2 9347 Eutheria
100.00 780814852 0 C3 1437010 Boreoeutheria
100.00 780814852 0 C4 314146 Euarchontoglires
100.00 780814852 0 O 9443 Primates
100.00 780814852 0 O1 376913 Haplorrhini
100.00 780814852 0 O2 314293 Simiiformes
100.00 780814852 0 O3 9526 Catarrhini
100.00 780814852 0 O4 314295 Hominoidea
100.00 780814852 0 F 9604 Hominidae
100.00 780814852 0 F1 207598 Homininae
100.00 780814852 0 G 9605 Homo
100.00 780814852 780814852 S 9606 Homo sapiens

But when I run

./kraken2-inspect --db Kraken2StandardDB_k_22_hbv

I get:

kraken2-inspect: database ("./Kraken2StandardDB_k_22_hbv") does not contain necessary file taxo.k2d

Then I tried to run

./kraken2-build --build --db ./Kraken2StandardDB_k_22_hbv --kmer-len 22 --minimizer-len 19 --minimizer-spaces 3

which prints:

Creating sequence ID to taxonomy ID map (step 1)...
No preliminary seqid/taxid mapping files found, aborting.

The k25 database is fine, so I wanted to check whether the Docker image works. I ran FastViFi using the pipeline from your GitHub repository:

python ./run_kraken_vifi_container.py --docker --output-dir /mnt/d/PROJECT/code/fastvifi/1/output --input-file /mnt/d/PROJECT/code/fastvifi/1/input/S9fp.bam --skip-bwa-filter --virus hbv --kraken-db-path /mnt/d/PROJECT/code/fastvifi/1/kraken2-master/Kraken2StandardDB_k_25_hbv_hg --level sample-level --vifi-viral-ref-dir /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv/ --human-chr-list /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38/chrom_list.txt --vifi-human-ref-dir /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38/

Changing the mode of the output directory to be writable by other users (i.e., the user in the docker container)
docker run --rm --read-only -v /mnt/d/PROJECT/code/fastvifi/1/input:/home/input/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/kraken2-master/Kraken2StandardDB_k_25_hbv_hg:/home/kraken2-db --read-only -v /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38/:/home/data_repo/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv/:/home/repo/data/ -v /mnt/d/PROJECT/code/fastvifi/1/output:/home/output sarajava/fastvifi:v1.1 python /home/fastvifi/run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /home/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /home/kraken2-db --docker --virus hbv --input-file /home/input/S9fp.bam --skip-bwa-filter
Traceback (most recent call last):
  File "/mnt/d/PROJECT/code/fastvifi/1/FastViFi-main/./run_kraken_vifi_container.py", line 95, in <module>
    call_fastvifi_pipeline(args)
  File "/mnt/d/PROJECT/code/fastvifi/1/FastViFi-main/./run_kraken_vifi_container.py", line 88, in call_fastvifi_pipeline
    shell_output = subprocess.check_output(
  File "/home/paddling/miniconda3/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/paddling/miniconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run --rm --read-only -v /mnt/d/PROJECT/code/fastvifi/1/input:/home/input/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/kraken2-master/Kraken2StandardDB_k_25_hbv_hg:/home/kraken2-db --read-only -v /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38/:/home/data_repo/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv/:/home/repo/data/ -v /mnt/d/PROJECT/code/fastvifi/1/output:/home/output sarajava/fastvifi:v1.1 python /home/fastvifi/run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /home/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /home/kraken2-db --docker --virus hbv --input-file /home/input/S9fp.bam --skip-bwa-filter ' returned non-zero exit status 1.

Then I tried running the pipeline command inside the container, so I started Docker first:

docker run -t -i -v /mnt/d/PROJECT/code/fastvifi/1:/data sarajava/fastvifi:v1.1 /bin/bash

python ./run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /data/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /data/kraken2-master/ --level sample-level --virus hbv --input-file /data/wgs/input/S9fp.bam --skip-bwa-filter

Warning! output directory already exists. Rewriting previous outputs.
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.000s (0.0 Kseq/m, 0.00 Mbp/m).
0 sequences classified (-nan%)
0 sequences unclassified (-nan%)
Command being timed: "/home/kraken2/kraken2 --use-names --report /dev/null --db /data/kraken2-master/Kraken2StandardDB_k_25_hbv_hg --threads 1 --paired --f-threshold 0.4 --keep-unmapped-reads --unmapped-threshold 0.8 --classified-out /data/output/reads_passing_kraken_first_level_for_virus_hbv#.fq /data/output/reads_passing_bwa_filter_1.fq /data/output/reads_passing_bwa_filter_2.fq --output /dev/null"
User time (seconds): 0.00
System time (seconds): 1.16
Percent of CPU this job got: 4%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:25.83
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4360356
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 3760
Voluntary context switches: 17072
Involuntary context switches: 3
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
kraken2: database ("/data/kraken2-master/Kraken2StandardDB_k_22_hbv") does not contain necessary file taxo.k2d
Command exited with non-zero status 2
Command being timed: "/home/kraken2/kraken2 --use-names --report /dev/null --db /data/kraken2-master/Kraken2StandardDB_k_22_hbv --threads 1 --paired --f-threshold 0.5 --unmapped-threshold 0.9 --classified-out /data/output/reads_passing_kraken_filter_for_virus_hbv#.fq /data/output/reads_passing_kraken_first_level_for_virus_hbv_1.fq /data/output/reads_passing_kraken_first_level_for_virus_hbv_2.fq --output /dev/null"
User time (seconds): 0.01
System time (seconds): 0.00
Percent of CPU this job got: 92%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6916
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 848
Voluntary context switches: 5
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
Traceback (most recent call last):
  File "./run_kraken_vifi_pipeline.py", line 491, in <module>
    run_pipeline(args)
  File "./run_kraken_vifi_pipeline.py", line 482, in run_pipeline
    bwa_filtered_fq_filename_2=bwa_filtered_fq_filename_2)
  File "./run_kraken_vifi_pipeline.py", line 285, in run_kraken_vifi
    "--output {} ".format(final_level_kraken_output), shell=True)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '/usr/bin/time -v /home/kraken2/kraken2 --use-names --report /dev/null --db /data/kraken2-master/Kraken2StandardDB_k_22_hbv --threads 1 --paired --f-threshold 0.5 --unmapped-threshold 0.9 --classified-out /data/output/reads_passing_kraken_filter_for_virus_hbv#.fq /data/output/reads_passing_kraken_first_level_for_virus_hbv_1.fq /data/output/reads_passing_kraken_first_level_for_virus_hbv_2.fq --output /dev/null ' returned non-zero exit status 2.

cubense commented 6 months ago

Dear @sara-javadzadeh, my commands and the errors are pasted above. I think there are problems not only in building the HBV database but also in the Docker step.
What can I do to debug this? Thank you, cubense

sara-javadzadeh commented 6 months ago

Hi Cubense,

Thanks for providing all the information. For building custom Kraken databases, I recommend running build_custom_kraken_database.sh instead of kraken2-build, as explained in the custom Kraken database section of the README. The reason is that you have to run kraken2-build with --download-library and --download-taxonomy before running it with --build --db. That is why there is a script (build_custom_kraken_database.sh) in my forked kraken2 repository that takes care of all the commands required before building a Kraken database.

A quick note: build_custom_kraken_index.sh is meant to be called by build_custom_kraken_database.sh. It seems you have been calling build_custom_kraken_index.sh directly instead of build_custom_kraken_database.sh, which skips a few steps.

Please try running build_custom_kraken_database.sh as instructed in the custom Kraken database section of the README and let me know whether you can successfully build Kraken2StandardDB_k_22_hbv.
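
Before re-running the pipeline, it can help to confirm that each database directory actually finished building. The sketch below is not part of FastViFi; it assumes that a completed Kraken2 database directory contains non-empty hash.k2d, opts.k2d, and taxo.k2d files (taxo.k2d is the file named in the error earlier in this thread), and the helper name missing_k2d_files is hypothetical.

```python
from pathlib import Path

# Assumption: these are the three files a finished Kraken2 database contains.
REQUIRED_K2D_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_k2d_files(db_dir):
    """Return the required .k2d files that are absent or empty in db_dir."""
    db_path = Path(db_dir)
    missing = []
    for name in REQUIRED_K2D_FILES:
        candidate = db_path / name
        # Treat a zero-byte file the same as a missing one.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

For example, calling missing_k2d_files("Kraken2StandardDB_k_22_hbv") on a directory that holds only the library and taxonomy folders would return all three file names, which signals that the build step never completed.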

With regard to the error when running run_kraken_vifi_container.py, thanks for sharing the output. I can see the subprocess.CalledProcessError, which is probably similar to the error you get when running run_kraken_vifi_pipeline.py within a Docker environment. It could be because the Kraken database was not created, so the intermediate files are empty. I think it is best to fix the database problem first and then run the pipeline again.
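
Because subprocess.check_output raises CalledProcessError with only the exit status in its message, the underlying error printed by the failing command can be easy to miss. A minimal sketch, assuming a POSIX shell and not taken from the FastViFi scripts, of running a command so that its stdout and stderr come back together with the exit code; run_and_report is a hypothetical helper name.

```python
import subprocess


def run_and_report(command):
    """Run a shell command and return (exit_code, combined stdout and stderr)."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so nothing is lost
        universal_newlines=True,   # decode bytes to text (also works on Python 3.6)
    )
    return result.returncode, result.stdout
```

Printing the returned text whenever the exit code is non-zero shows the first failing message, rather than only "returned non-zero exit status 1".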

Best, Sara

cubense commented 6 months ago

Hi Sara, thank you for your response. The command I previously used to build the database was

build_custom_kraken_database.sh hbv ./hbv.unaligned.fas (or hbv.aligned.fasta) 9000000

The database Kraken2StandardDB_k_25_hbv_hg contains three *.k2d files, but Kraken2StandardDB_k_22_hbv and Kraken2StandardDB_k_18_hbv each contain only two folders, library and taxonomy, and the three .k2d files are missing. I have run this command many times, and each time I get the same result without any errors or warnings. Only one of the three databases builds successfully, which is quite puzzling. This matches issue #9, which I mentioned above. If possible, could you please upload the constructed HBV database? I would greatly appreciate it. I look forward to your reply. Thanks, cubense

sara-javadzadeh commented 6 months ago

Hi Cubense,

I'm working on compressing the Kraken databases and uploading them somewhere you can download them. It might take a while because of the size of the files and my not-so-great internet connection. Stay tuned for that. I also noticed that the hbv_hg Kraken database, on which you could run kraken2-inspect successfully, does not contain HBV sequences. Ideally, in the output of kraken2-inspect I want to see a taxonomy tree (with all the intermediate nodes) for both human and HBV. The taxonomy tree you printed here only includes human and does not include HBV. If I had to guess, there might be something wrong with the HBV reference sequences you pass to the build_custom_kraken_database.sh script. The file should contain unaligned HBV reference sequences with descriptive names for each sequence. Please retry with this hbv reference file to make sure the viral sequence names are correctly identified. If you have more references than are in the file above, feel free to add them, but please make sure each sequence ID follows the format used in the file.
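
One way to check the point above is to scan the kraken2-inspect output for the expected taxon names. The sketch below assumes the report rows have the six whitespace-separated columns seen in the output pasted earlier in this thread (percentage, clade count, direct count, rank code, taxonomy ID, name). The function names and the "hepatitis b" keyword are illustrative assumptions, not part of Kraken2 or FastViFi.

```python
def taxon_names(inspect_output):
    """Extract taxon names from the text printed by kraken2-inspect."""
    names = []
    for line in inspect_output.splitlines():
        fields = line.split(None, 5)  # at most six columns; the name may contain spaces
        if len(fields) < 6:
            continue  # short header lines such as "Table size: ..."
        _percent, clade_count, direct_count, _rank, taxid, name = fields
        # Taxonomy rows have numeric counts and a numeric taxonomy ID.
        if not (clade_count.isdigit() and direct_count.isdigit() and taxid.isdigit()):
            continue
        names.append(name.strip())
    return names


def has_taxon(inspect_output, keyword):
    """Return True if any taxon name contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return any(keyword in name.lower() for name in taxon_names(inspect_output))
```

On the output shared earlier, has_taxon(text, "homo sapiens") would be true while has_taxon(text, "hepatitis b") would be false, matching the observation that the tree contains only the human lineage.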

Best, Sara

sara-javadzadeh commented 6 months ago

Hi Cubense,

Please find Kraken2StandardDB_k_22_hbv and Kraken2StandardDB_k_18_hbv compressed in a single file (11.23GB) here. The Kraken2StandardDB_k_25_hbv_hg database is lacking HBV genomes, judging from the kraken2-inspect output that you shared above. In theory, the FastViFi algorithm should still work without a corrected Kraken2StandardDB_k_25_hbv_hg, although the sensitivity might be negatively affected. I say in theory because I have not tested FastViFi myself with the viral genome absent from Kraken2StandardDB_k_25_hbv_hg. Please go through my previous comment and this comment and let me know if it works for you.

cubense commented 6 months ago

Dear Sara @sara-javadzadeh, thank you for your efforts! I can now build all three HBV databases using your unaligned HBV reference sequences with descriptive names, but I get the same error when I run FastViFi in Docker:

python run_kraken_vifi_container.py --docker --input-file /mnt/d/PROJECT/code/fastvifi/1/input/S9a.bam --output-dir /mnt/d/PROJECT/code/fastvifi/1/output --virus hbv --kraken-db-path /mnt/d/PROJECT/code/fastvifi/1/kraken2-master --vifi-viral-ref-dir /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv --human-chr-list /mnt/d/PROJECT/code/fastvifi/1/data_repo/GRCh38 --vifi-human-ref-dir /mnt/d/PROJECT/code/fastvifi/1/data_repo

Changing the mode of the output directory to be writable by other users (i.e., the user in the docker container)
docker run --rm --read-only -v /mnt/d/PROJECT/code/fastvifi/1/input:/home/input/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/kraken2-master:/home/kraken2-db --read-only -v /mnt/d/PROJECT/code/fastvifi/1/data_repo:/home/data_repo/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv:/home/repo/data/ -v /mnt/d/PROJECT/code/fastvifi/1/output:/home/output sarajava/fastvifi:v1.1 python /home/fastvifi/run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /home/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /home/kraken2-db --docker --virus hbv --input-file /home/input/S9a.bam
/bin/sh: 1: cannot create output: Read-only file system
python: can't open file 'filter_reads_bwa_efficient.py': [Errno 2] No such file or directory
Command exited with non-zero status 2
Command being timed: "python filter_reads_bwa_efficient.py /home/input/S9a.bam /home/output /home/data_repo/GRCh38/chrom_list.txt reads_passing_bwa_filter"
User time (seconds): 0.01
System time (seconds): 0.00
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 8912
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 1086
Voluntary context switches: 1
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
Traceback (most recent call last):
  File "/home/fastvifi/run_kraken_vifi_pipeline.py", line 491, in <module>
    run_pipeline(args)
  File "/home/fastvifi/run_kraken_vifi_pipeline.py", line 448, in run_pipeline
    args.input_file, args.output_dir, human_chr_list, bwa_filtered_filename_prefix), shell=True)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '/usr/bin/time -v python filter_reads_bwa_efficient.py /home/input/S9a.bam /home/output /home/data_repo/GRCh38/chrom_list.txt reads_passing_bwa_filter &>> output' returned non-zero exit status 2.
Traceback (most recent call last):
  File "run_kraken_vifi_container.py", line 95, in <module>
    call_fastvifi_pipeline(args)
  File "run_kraken_vifi_container.py", line 88, in call_fastvifi_pipeline
    shell_output = subprocess.check_output(
  File "/home/paddling/miniconda3/envs/viralngs/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/paddling/miniconda3/envs/viralngs/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run --rm --read-only -v /mnt/d/PROJECT/code/fastvifi/1/input:/home/input/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/kraken2-master:/home/kraken2-db --read-only -v /mnt/d/PROJECT/code/fastvifi/1/data_repo:/home/data_repo/ --read-only -v /mnt/d/PROJECT/code/fastvifi/1/viral_data/hbv:/home/repo/data/ -v /mnt/d/PROJECT/code/fastvifi/1/output:/home/output sarajava/fastvifi:v1.1 python /home/fastvifi/run_kraken_vifi_pipeline.py --kraken-path /home/kraken2/kraken2 --vifi-path /home/ViFi/scripts/run_vifi.py --output /home/output --human-chr-list /home/data_repo/GRCh38/chrom_list.txt --kraken-db-path /home/kraken2-db --docker --virus hbv --input-file /home/input/S9a.bam ' returned non-zero exit status 1.

I think the Docker image may be incomplete because some files could not be downloaded. I have tried pulling it many times, and the image listing shows:

[sarajava/fastvifi] 769aca5295e6 v1.1 4 months ago 2.7 GB

Could you confirm whether the image should be 2.7 GB? And what should I do if I want to debug this? Thank you for your help. I look forward to your reply.

cubense