nf-core / kmermaid

k-mer similarity analysis pipeline
https://nf-co.re/kmermaid
MIT License

No space left on device for bam #24

Closed · olgabot closed this issue 4 years ago

olgabot commented 5 years ago

@pranathivemuri Let me know if this is the right place to file this, or whether this should be in https://github.com/czbiohub/bam2fasta/issues/

When this pipeline uses bam2fasta, the Docker container runs out of space:

```
[nf-core/kmermaid] Pipeline completed with errors
ERROR ~ Error executing process > 'sourmash_compute_sketch_bam (possorted_genome_bam_molecule-dna_ksize-21_log2sketchsize-10)'

Caused by:
  Process `sourmash_compute_sketch_bam (possorted_genome_bam_molecule-dna_ksize-21_log2sketchsize-10)` terminated with an error exit status (1)

Command executed:

  sourmash compute \
        --ksize 21 \
        --dna \
        --num-hashes $((2**10)) \
        --processes 32 \
        --count-valid-reads 0 \
        --line-count 350 \
        --save-fastas . \
        --output possorted_genome_bam_molecule-dna_ksize-21_log2sketchsize-10.sig \
        --input-is-10x possorted_genome_bam.bam

  find . -type f -name "*.fasta" | while read src; do
      if [[ $src == *"|"* ]]; then
          mv "$src" $(echo "$src" | tr "|" "_")
      fi
  done

Command exit status:
  1

Command output:
  (empty)

Command error:
  385018218it [5:24:57, 19305.21it/s]
  385020154it [5:24:57, 18632.53it/s]
  385022079it [5:24:57, 18707.53it/s]
  385023956it [5:24:57, 18336.06it/s]
  385025940it [5:24:57, 18724.69it/s]
  385027819it [5:24:57, 18663.82it/s]
  385029802it [5:24:57, 18631.38it/s]
  385031877it [5:24:57, 19141.33it/s]
  385033797it [5:24:57, 18604.88it/s]
  385035760it [5:24:58, 18900.91it/s]
  385037657it [5:24:58, 18336.40it/s]
  385039630it [5:24:58, 18558.35it/s]
  385041705it [5:24:58, 19165.53it/s]
  385043631it [5:24:58, 18932.62it/s]
  385045532it [5:24:58, 18821.15it/s]
  385047420it [5:24:58, 18601.51it/s]
  385049457it [5:24:58, 19039.14it/s]
  385051193it [5:24:58, 19747.42it/s]
  multiprocess.pool.RemoteTraceback:
  """
  Traceback (most recent call last):
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
      result = (True, func(*args, **kwds))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
      return list(map(*args))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/cli.py", line 289, in <lambda>
      lambda x: func(x.encode('utf-8')),
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/tenx_utils.py", line 213, in bam_to_temp_fasta
      filenames = list(set(write_cell_sequences(cell_sequences, delimiter)))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/tenx_utils.py", line 249, in write_cell_sequences
      with open(filename, "a") as f:
  OSError: [Errno 28] No space left on device: '/tmp/tmpoxumgy9g/CGCCAAGAGAGTAATC-1.fasta'
  """

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/bin/sourmash", line 8, in <module>
      sys.exit(main())
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/sourmash/__main__.py", line 83, in main
      cmd(sys.argv[2:])
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/sourmash/command_compute.py", line 309, in compute
      fastas = bam2fasta_cli.convert(bam_to_fasta_args)
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/cli.py", line 290, in convert
      filenames, chunksize=chunksize))))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 320, in <genexpr>
      return (item for chunk in result for item in chunk)
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 735, in next
      raise value
  OSError: [Errno 28] No space left on device: '/tmp/tmpoxumgy9g/CGCCAAGAGAGTAATC-1.fasta'

Work dir:
  /home/olga/pureScratch/nextflow-intermediates/1c/b4f769b576791ee0f5a78e44ad72f5

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`

 -- Check '.nextflow.log' file for details
```

Looking at this error (specifically a solution posted on GitHub), it can be resolved by setting `docker.temp = "auto"`, which creates a new temporary directory for every container run, as described in the Nextflow Docker scope documentation.
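
A minimal sketch of what that would look like in `nextflow.config` (assuming the pipeline's docker profile is otherwise unchanged):

```
// Sketch of the proposed fix in nextflow.config. With temp = 'auto',
// Nextflow creates a fresh temporary directory on the host for each
// container and mounts it as the container's /tmp, so temporary files
// no longer fill the container's writable layer.
docker {
    enabled = true
    temp    = 'auto'
}
```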

pranathivemuri commented 5 years ago

Why is it saving inside the Docker /tmp/ directory? I assumed that all the inputs/outputs and /tmp files are hosted on the local system. Can we share /tmp (or some other directory on the local machine) with the Docker container while running it, like so: `docker run -it -v /tmp:/tmp imaging_docker:gpu_py36_cu90 bash`?
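
A sketch of the Nextflow-level equivalent of that `docker run -v` idea (the host scratch path here is made up for illustration):

```
// Sketch: bind-mount a host directory as the container's /tmp via the
// docker scope's runOptions, mirroring `docker run -v /tmp:/tmp` above.
// The host path is hypothetical.
docker {
    enabled    = true
    runOptions = '-v /home/olga/pureScratch/tmp:/tmp'
}
```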

pranathivemuri commented 5 years ago

If this `docker.temp = "auto"` resolved your issue, please feel free to open a PR!

olgabot commented 5 years ago

Okay, I jumped the gun; it is still running out of storage space. Oddly enough, for both pipeline runs, this happens around the 12-13 hour mark:

`nextflow log` output:

2019-11-03 15:21:01     13h 8m 46s      cranky_almeida          ERR     26588f6495      ead34009-a863-4ad2-9297-d33730942da7    nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-04 10:32:00     3.5s            distracted_euler        ERR     26588f6495      ead34009-a863-4ad2-9297-d33730942da7    nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-04 13:09:57     -               adoring_almeida         -       e0dddf4b52      ead34009-a863-4ad2-9297-d33730942da7    nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-04 13:25:46     -               jolly_aryabhata         -       e0dddf4b52      ead34009-a863-4ad2-9297-d33730942da7    nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
2019-11-05 04:05:28     12h 32m 45s     peaceful_lalande        ERR     e0dddf4b52      ead34009-a863-4ad2-9297-d33730942da7    nextflow run nf-core/kmermaid -r olgabot/khtools-extract-coding -latest -profile czbiohub_local,docker --bam /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam --custom_config_base /home/olga/code/nf-core/configs --molecules dna,protein,dayhoff --peptide_fasta /home/olga/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz --ksizes 21 --log2_sketch_sizes 10,12,14,16 --bloomfilter_tablesize 1e8 --extract_coding_peptide_ksize 11 --extract_coding_peptide_molecule dayhoff --outdir /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ --extract_coding_jaccard_threshold 0.5 -work-dir /home/olga/pureScratch/nextflow-intermediates/ -resume
pranathivemuri commented 5 years ago

This looks bad! The datasets I have run it on are all primates, and none of them took 12 hours; it took 7, 4, or 2 hours for primate BAM files as large as 12 GB.

  1. Which dataset is this?
  2. What is the Docker container doing here anyway? Why would it save the data inside the container? It should be saving to the mounted directory, where there should be enough space.
  3. Can we just run locally and see if it still fails, or fails faster? Maybe there is a different error.
  4. Can you post your config file? Did you change the `line_count` parameter? I used 1500 for mine, but the default is 350.
  5. How do you know it's a memory error here, exactly?

pranathivemuri commented 5 years ago

Also, FWIW, I have tested this on a smaller BAM file: the standard profile took 18 seconds to convert to FASTAs and obtain a signature, while the docker profile took 2 minutes. I don't think it's a memory error; there might be something else going on causing the lag on Docker.

olgabot commented 5 years ago

> 1. Which dataset is this?

This is also a primate :) but a mouse lemur. The BAM file is 19 GB:

```
(kmer-hashing)
 Wed  6 Nov - 15:02  ~/code/tabula-microcebus-extract-coding/workflows/kmermaid/10x   origin ☊ olgabot/extract-coding-reads ↑1 4☀ 5●
 olga@ndnd  ll ~/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam
Permissions Size User Group Date Modified Name
.rw-r--r--@  19G olga olga   9 Oct  2018  /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam
```

> 2. What is the Docker container doing here anyway? Why would it save the data inside the container? It should be saving to the mounted directory, where there should be enough space.

That's true, I could install everything via conda and try that. Let me see if using `-profile conda` works.

> 3. Can we just run locally and see if it still fails, or fails faster? Maybe there is a different error.

Yes, I'll try with `-profile conda`, which will build an environment from the `environment.yml` file.
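
For reference, a minimal sketch of that invocation (paths shortened; the remaining flags from the original command would carry over):

```
# Sketch: the same pipeline run with the conda profile instead of docker,
# so jobs execute directly on the host filesystem.
nextflow run nf-core/kmermaid \
    -r olgabot/khtools-extract-coding \
    -latest \
    -profile czbiohub_local,conda \
    --bam possorted_genome_bam.bam \
    -work-dir /home/olga/pureScratch/nextflow-intermediates/ \
    -resume
```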

> 4. Can you post your config file? Did you change the `line_count` parameter? I used 1500 for mine, but the default is 350.

I didn't change the line count. How would that affect the space? Wouldn't the total file size be the same even if each individual file is smaller or bigger?

> 5. How do you know it's a memory error here, exactly?

It's not a memory error: the Nextflow pipeline errors out specifically because of an out-of-disk-space issue (full details below, with a sketch of how to watch the disk usage after the log).

Gory nextflow log details

```
(kmer-hashing) ✘  Tue 5 Nov - 04:05  ~/code/tabula-microcebus-extract-coding/workflows/kmermaid/10x   origin ☊ olgabot/extract-coding-reads 3☀ 6●
 olga@ndnd  sudo make run
[sudo] password for olga:
nextflow run \
        nf-core/kmermaid \
        -r olgabot/khtools-extract-coding \
        -latest \
        -profile czbiohub_local,docker \
        --bam ~/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam \
        --custom_config_base ~/code/nf-core/configs \
        --molecules dna,protein,dayhoff \
        --peptide_fasta ~/data_lg/czbiohub-reference/ensembl/release-97/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz \
        --ksizes 21 \
        --log2_sketch_sizes 10,12,14,16 \
        --bloomfilter_tablesize '1e8' \
        --extract_coding_peptide_ksize 11 \
        --extract_coding_peptide_molecule dayhoff \
        --outdir ~/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/ \
        --extract_coding_jaccard_threshold 0.5 \
        -work-dir ~/pureScratch/nextflow-intermediates/ \
        -resume
N E X T F L O W  ~  version 19.04.1
Pulling nf-core/kmermaid ...
Already-up-to-date
Launching `nf-core/kmermaid` [peaceful_lalande] - revision: e0dddf4b52 [olgabot/khtools-extract-coding]
WARN: There's no process matching config selector: sourmash_compute_sketch_fastx -- Did you mean: sourmash_compute_sketch_bam?
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/kmer-similarity v1.0dev
----------------------------------------------------
Pipeline Release  : olgabot/khtools-extract-coding
Run Name          : peaceful_lalande
BAM               : /home/olga/data_sm/tabula-microcebus/rawdata/aligned_micmur3/10x/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/outs/possorted_genome_bam.bam
K-mer sizes       : 21
Molecule          : dna,protein,dayhoff
Log2 Sketch Sizes : 10,12,14,16
One Sig per Record: false
Bam chunk line count: 350
Count valid reads : 0
Saved Fastas      : fastas
Barcode umi read metadata: false
Max Resources     : 512 GB memory, 32 cpus, 10d time per job
Container         : docker - nfcore/kmermaid:dev
Output dir        : /home/olga/data_sm/tabula-microcebus/kmermaid/dayhoff_ksize11/antoine__180917_A00111_0211_AHGKCVDMXX/ANTOINE_LUNG_P3/ANTOINE_LUNG_P3/
Launch dir        : /home/olga/code/tabula-microcebus-extract-coding/workflows/kmermaid/10x
Working dir       : /home/olga/pureScratch/nextflow-intermediates
Script dir        : /home/olga/.nextflow/assets/nf-core/kmermaid
User              : root
Config Profile    : czbiohub_local,docker
Config Description: Chan Zuckerberg Biohub AWS Batch profile provided by nf-core/configs.
Config Contact    : Olga Botvinnik (@olgabot)
Config URL        : https://www.czbiohub.org/
----------------------------------------------------
[warm up] executor > local
WARN: Unknown directive `one_signature_per_record` for process `sourmash_compute_sketch_bam`
executor > local (3)
[7c/1641d2] process > peptide_bloom_filter        [100%] 1 of 1, cached: 1 ✔
[dd/9d5058] process > get_software_versions       [100%] 1 of 1 ✔
[2d/0064e2] process > sourmash_compute_sketch_bam [100%] 2 of 2, failed: 2 ✘
[nf-core/kmermaid] Pipeline completed with errors
ERROR ~ Error executing process > 'sourmash_compute_sketch_bam (possorted_genome_bam_molecule-dna_ksize-21_log2sketchsize-10)'

Caused by:
  Process `sourmash_compute_sketch_bam (possorted_genome_bam_molecule-dna_ksize-21_log2sketchsize-10)` terminated with an error exit status (1)

Command executed:

  sourmash compute \
        --ksize 21 \
        --dna \
        --num-hashes $((2**10)) \
        --processes 32 \
        --count-valid-reads 0 \
        --line-count 350 \
        --save-fastas . \
        --output possorted_genome_bam_molecule-dna_ksize-21_log2sketchsize-10.sig \
        --input-is-10x possorted_genome_bam.bam

  find . -type f -name "*.fasta" | while read src; do
      if [[ $src == *"|"* ]]; then
          mv "$src" $(echo "$src" | tr "|" "_")
      fi
  done

Command exit status:
  1

Command output:
  (empty)

Command error:
  385023996it [5:35:52, 18770.36it/s]
  385025910it [5:35:52, 18856.85it/s]
  385027802it [5:35:52, 18603.13it/s]
  385029668it [5:35:52, 18389.38it/s]
  385031556it [5:35:53, 18407.34it/s]
  385033662it [5:35:53, 18989.76it/s]
  385035568it [5:35:53, 18220.21it/s]
  385037523it [5:35:53, 18479.58it/s]
  385039380it [5:35:53, 17993.33it/s]
  385041384it [5:35:53, 18412.79it/s]
  385043234it [5:35:53, 17859.51it/s]
  385045215it [5:35:53, 18376.91it/s]
  385047064it [5:35:53, 17924.61it/s]
  385049106it [5:35:54, 18516.98it/s]
  385050970it [5:35:54, 18231.62it/s]
  385051193it [5:35:54, 19105.31it/s]
  multiprocess.pool.RemoteTraceback:
  """
  Traceback (most recent call last):
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
      result = (True, func(*args, **kwds))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
      return list(map(*args))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/cli.py", line 289, in <lambda>
      lambda x: func(x.encode('utf-8')),
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/tenx_utils.py", line 213, in bam_to_temp_fasta
      filenames = list(set(write_cell_sequences(cell_sequences, delimiter)))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/tenx_utils.py", line 242, in write_cell_sequences
      temp_folder = tempfile.mkdtemp()
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/tempfile.py", line 370, in mkdtemp
      _os.mkdir(file, 0o700)
  OSError: [Errno 28] No space left on device: '/tmp/tmpfvu01q73'
  """

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/bin/sourmash", line 8, in <module>
      sys.exit(main())
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/sourmash/__main__.py", line 83, in main
      cmd(sys.argv[2:])
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/sourmash/command_compute.py", line 309, in compute
      fastas = bam2fasta_cli.convert(bam_to_fasta_args)
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/bam2fasta/cli.py", line 290, in convert
      filenames, chunksize=chunksize))))
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 320, in <genexpr>
      return (item for chunk in result for item in chunk)
    File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/multiprocess/pool.py", line 735, in next
      raise value
  OSError: [Errno 28] No space left on device: '/tmp/tmpfvu01q73'

Work dir:
  /home/olga/pureScratch/nextflow-intermediates/2d/0064e275a31e62e401f3e03d7fc1f4

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
Makefile:5: recipe for target 'run' failed
```
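
For completeness, here is a rough sketch of commands one could use to watch where the space goes during a run (standard host-side commands; nothing here is pipeline-specific):

```
# Watch disk usage while the pipeline runs:
df -h /tmp                    # the filesystem the temp FASTAs are written to
docker system df              # space used by Docker images, containers, volumes
sudo du -sh /var/lib/docker   # total Docker storage footprint on the host
```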
pranathivemuri commented 5 years ago

There might be overhead for BAMs, but FASTAs shouldn't have any overhead. Still, it would be better to increase the line count to 1500, since that makes it easier to combine the different temp FASTAs later.
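
A sketch of how that might look on the command line (the flag name is assumed from the pipeline summary's "Bam chunk line count"; check the pipeline docs for the exact name):

```
# Sketch: rerun with a larger chunk size so fewer, larger temp FASTAs
# are written per chunk (--line_count flag name assumed, not verified).
nextflow run nf-core/kmermaid \
    -profile czbiohub_local,docker \
    --bam possorted_genome_bam.bam \
    --line_count 1500 \
    -resume
```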

pranathivemuri commented 5 years ago

I wonder if it's clearing the space after failing. When the container fails, does it create a new container? Or, if it's using the mounted directory, maybe delete the temp FASTAs there?
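
A rough sketch of cleaning those up by hand, assuming /tmp is bind-mounted from the host (the paths and pattern are illustrative):

```
# Hypothetical manual cleanup of leftover per-cell temp FASTAs after a
# failed run; the tmp* directory pattern comes from tempfile.mkdtemp()
# in the traceback above.
find /tmp -maxdepth 2 -path '/tmp/tmp*' -name '*.fasta' -delete
df -h /tmp   # confirm the space was actually reclaimed
```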

pranathivemuri commented 4 years ago

@olgabot did mounting /tmp/ fix the issue here, or did running it outside of Docker fix it? Post here when you can; if this is not used right now, don't worry about it!

pranathivemuri commented 4 years ago

We are now using samtools to convert the BAM to fastq.gz, and then using Olga's code in bam2fasta for counting and making FASTQs per cell. This is no longer an error, so closing.
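
For reference, a rough sketch of that conversion step (carrying the 10x cell barcode and UMI tags via `-T CB,UB` is an assumption; the pipeline's exact flags aren't shown here):

```
# Sketch: convert a 10x position-sorted BAM to compressed FASTQ with
# samtools, copying the cell barcode (CB) and UMI (UB) tags into the
# read headers so per-cell FASTQs can be built downstream.
samtools fastq -T CB,UB possorted_genome_bam.bam | gzip > possorted_genome_bam.fastq.gz
```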