Closed: olgabot closed this issue 4 years ago.
Why is it saving inside the docker /tmp/ directory? I assumed that all the inputs/outputs and /tmp files are hosted on the local system. Can we share /tmp or some directory on the local machine with the docker container while running the docker command, like so: docker run -it -v /tmp:/tmp/ imaging_docker:gpu_py36_cu90 bash
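For reference, the same kind of host-directory sharing can be expressed in the Nextflow config rather than a raw docker run call; a minimal sketch, assuming a local nextflow.config is picked up by the run and using an example mount path:

```bash
# Hedged sketch: mount a host scratch directory as /tmp in every task container.
# The mount path is only an example; nextflow.config is assumed to be the config
# file this run actually reads.
cat >> nextflow.config <<'EOF'
docker {
    enabled    = true
    // extra options appended to every `docker run` that Nextflow launches
    runOptions = '-v /home/olga/pureScratch:/tmp'
}
EOF
```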
If this docker.temp = "auto" resolved your issue, please feel free to open a PR!
Okay, I jumped the gun; it's still running out of storage space. Oddly enough, for both pipeline runs this is happening around the 12-13 hour mark:
nextflow log output:
2019-11-03 15:21:01 13h 8m 46s cranky_almeida ERR 26588f6495 ead34009-a863-4ad2-9297-d33730942da7 nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-04 10:32:00 3.5s distracted_euler ERR 26588f6495 ead34009-a863-4ad2-9297-d33730942da7 nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-04 13:09:57 - adoring_almeida - e0dddf4b52 ead34009-a863-4ad2-9297-d33730942da7 nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-04 13:25:46 - jolly_aryabhata - e0dddf4b52 ead34009-a863-4ad2-9297-d33730942da7 nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-05 04:05:28 12h 32m 45s peaceful_lalande ERR e0dddf4b52 ead34009-a863-4ad2-9297-d33730942da7 nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
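To narrow down which tasks were on disk when the failures happened, the failed runs above can be inspected by their run names; a hedged sketch using nextflow log fields:

```bash
# Hedged sketch: list the tasks of one failed run with their work directories,
# then measure how much space each task directory is using.
# "cranky_almeida" is the run name from the log output above.
nextflow log cranky_almeida -f name,status,exit,workdir > tasks.txt
awk '{print $NF}' tasks.txt | xargs -r du -sh 2>/dev/null | sort -h | tail
```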
This looks bad! The datasets I have run it on are all primates, and none of them took 12 hours. It took 7, 4, or 2 hours for primate bam files that were as big as 12 GB.
Also, FWIW, I have tested this on a smaller bam file: the standard profile took 18 seconds to convert to fastas and obtain a signature, while the docker profile took 2 minutes. I don't think it's a memory error; there might be something else going on causing the lag on docker?
- which dataset is this?
This is also a primate :) But Mouse Lemur. The bam file is 19GB:
```
(kmer-hashing)
Wed 6 Nov - 15:02 ~/code/tabula-microcebus-extract-coding/workflows/kmermaid/10x origin ☊ olgabot/extract-coding-reads ↑1 4☀ 5●
olga@ndnd ll ~/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam
Permissions Size User Group Date Modified Name
.rw-r--r--@ 19G olga olga 9 Oct 2018 /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam
```
- what is the docker container doing here anyway?
- why would it save the data inside the docker container? It should be saving in the mounted directory, and there should be enough space there
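One way to check where a task's /tmp actually lives is to look inside the container itself; a hedged sketch, where the image name is a guess for whichever image the failing task used (the exact docker run command is recorded in the task's .command.run file):

```bash
# Hedged sketch: see whether /tmp inside the task container is a host bind
# mount or part of the container's writable layer (which fills up the docker
# data directory). The image name below is an assumption; substitute the one
# the failing task actually used.
IMAGE=nfcore/kmermaid:dev
docker run --rm "$IMAGE" df -h /tmp
docker run --rm "$IMAGE" sh -c 'mount | grep " /tmp "'
```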
I could install everything via conda and try that, that's true. Let me see if using -profile conda will work.
- can we just run it locally and see if it still fails / fails faster? Maybe there is a different error?
Yes, I'll try with -profile conda, which will build an environment from the environment.yml file.
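A hedged sketch of what that retry could look like, reusing the work directory from the runs above so completed tasks are not recomputed (the long list of --options from the log above is elided here):

```bash
# Hedged sketch: swap the docker profile for conda and resume the same session.
# All the pipeline --options from the commands above still apply; they are
# omitted here only for brevity.
nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest \
    -profile czbiohub_local,conda \
    -work-dir /home/olga/pureScratch/nextflow-intermediates/ \
    -resume
```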
- can you post your config file? Did you change the line_count parameter? I used 1500 for mine, but the default is 350
I didn't change the line count. How would that affect the space? Wouldn't the total file size be the same even if each individual file is smaller or bigger?
- how do you know it's a memory error here, exactly?
The nextflow pipeline errors out specifically because of an "out of disk space" issue (full details below).
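A quick way to confirm that it is disk rather than memory being exhausted, as a hedged sketch using the paths from the run commands above:

```bash
# Hedged sketch: check free disk space on the filesystems involved, docker's
# own disk usage, and memory for comparison.
df -h /tmp /var/lib/docker /home/olga/pureScratch/nextflow-intermediates
docker system df    # space held by images, containers, and volumes
free -h             # memory usage, to rule out an actual memory error
```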
There might be overhead for bams, but fastas shouldn't have any overhead. Still, it would be better to increase the line count to 1500, since that would make it easier to combine the different temp fastas later.
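A hedged sketch of how that could be set, assuming the pipeline exposes bam2fasta's shard size as a line_count parameter (the exact name should be checked against the pipeline's nextflow.config):

```bash
# Hedged sketch: raise the per-shard fasta line count from the default (350)
# to 1500. The parameter name 'line_count' is an assumption taken from the
# discussion above; verify it in the pipeline's nextflow.config before use.
cat >> nextflow.config <<'EOF'
params {
    line_count = 1500
}
EOF
```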
I wonder if it's clearing the space after failing. When the container fails, is it creating a new container? Or, if it's using the mounted directory, maybe delete the temp fastas there?
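If stale intermediates from earlier failed attempts are what is eating the space, they can be dropped before resuming; a hedged sketch using Nextflow's built-in cleanup command:

```bash
# Hedged sketch: remove work directories from all runs except the most recent
# one, so -resume can still reuse the latest cached tasks. Preview first.
nextflow clean -n -but last    # dry run: show what would be removed
nextflow clean -f -but last    # actually delete it
```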
@olgabot did mounting /tmp/ fix the issue here, or did running it outside of docker fix it? Post it here when you can; if this is not used right now, don't worry about it!
Using samtools to convert the bam to fastq.gz now, and then using Olga's code in bam2fasta for counting and making fastqs per cell. This is no longer an error, closing.
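For the record, a hedged sketch of that conversion step (the thread count is illustrative, and CB/UB are the 10x cell-barcode and UMI tags that the per-cell splitting downstream still needs):

```bash
# Hedged sketch of the workaround: convert the position-sorted 10x bam to
# fastq.gz with samtools, copying the cell barcode (CB) and UMI (UB) tags into
# the read comments so per-cell splitting in bam2fasta can still use them.
samtools fastq -@ 8 -T CB,UB possorted_genome_bam.bam | gzip -c > possorted_genome_bam.fastq.gz
```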
@pranathivemuri Let me know if this is the right place to file this, or whether this should be in https://github.com/czbiohub/bam2fasta/issues/
Using bam2fasta in this pipeline, the docker image runs out of space. Looking at this error (specifically the solution on GitHub here), it can be resolved by setting docker.temp = "auto", which creates a new temporary directory each time a container is created, as described in the Nextflow Docker scope.
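A minimal sketch of that setting, assuming it goes into the config file these runs actually use (for example a local nextflow.config or the czbiohub_local profile):

```bash
# Hedged sketch: per the Nextflow docker scope, temp = 'auto' mounts a freshly
# created host temporary directory as /tmp inside each task container, instead
# of letting tasks write into the container's own filesystem.
cat >> nextflow.config <<'EOF'
docker {
    enabled = true
    temp    = 'auto'
}
EOF
```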