Closed: ivelsko closed this issue 2 years ago.
@maxibor for you, but I can't seem to assign you
@maxibor more specifically, it seems this ends up using a ridiculous amount of memory, so I think it would require optimisation of the script; see the sketch below.
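For context, a minimal sketch of what a more memory-frugal version could look like, assuming the goal matches the script's -m remove mode: keep only a set of mapped read names in RAM and stream the FASTQ one record at a time instead of loading whole records. The function names (mapped_read_names, remove_host_reads) are illustrative, not the script's actual internals; pysam is assumed to be available in the environment.

```python
import gzip

import pysam  # assumption: available alongside Biopython in the pipeline env
from Bio.SeqIO.QualityIO import FastqGeneralIterator


def mapped_read_names(bam_path):
    """Collect only the names of host-mapped reads; a set of strings is
    what bounds peak memory here, not full FASTQ records."""
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return {read.query_name for read in bam if not read.is_unmapped}


def remove_host_reads(fq_in, fq_out, host_names):
    """Stream the FASTQ record by record, writing only reads whose name
    is not in the host set, so memory use stays roughly constant."""
    with gzip.open(fq_in, "rt") as fin, gzip.open(fq_out, "wt") as fout:
        for title, seq, qual in FastqGeneralIterator(fin):
            if title.split()[0] not in host_names:
                fout.write(f"@{title}\n{seq}\n+\n{qual}\n")
```

For paired-end data the same host-name set would simply be applied to both the forward and reverse files, so the BAM only needs to be read once.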
@maxibor also posted the following on Slack:
https://github.com/TOAST-sandbox/podPeople https://github.com/sandberg-lab/dataprivacy (a.k.a. BAMboozle)
Note: these tools are probably indeed more robust, but we need to consider that it might be unwanted to simply 'replace' reads or variants with reference-genome ones. If someone wanted to reanalyse e.g. calculus for human DNA, they may not realise they are looking at 'fake' sequence. That is effectively 'tampering' with the FASTQ file in a misleading way, so I would still rather have an NNN replacement or entire removal (will need to check if the tools support this); a sketch of the masking idea follows.
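To illustrate the NNN-replacement preference, a minimal sketch assuming masking should keep every read and its length so the redaction is visible rather than silent. The function name mask_host_reads and the '!' placeholder quality are illustrative choices, not taken from podPeople or BAMboozle; host_names is a set of mapped read names as in the earlier sketch.

```python
import gzip

from Bio.SeqIO.QualityIO import FastqGeneralIterator


def mask_host_reads(fq_in, fq_out, host_names):
    """Keep every read, but overwrite host-mapped sequences with Ns so a
    reanalysis sees obviously redacted data, not plausible 'fake' sequence."""
    with gzip.open(fq_in, "rt") as fin, gzip.open(fq_out, "wt") as fout:
        for title, seq, qual in FastqGeneralIterator(fin):
            if title.split()[0] in host_names:
                seq = "N" * len(seq)
                qual = "!" * len(qual)  # '!' is Phred 0: no confidence in masked bases
            fout.write(f"@{title}\n{seq}\n+\n{qual}\n")
```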
Done in 2.4.5!
Check Documentation
I have checked the following places for your error:
Description of the bug
hostremoval_input_fastq fails on larger samples because it runs out of memory.
Steps to reproduce
Steps to reproduce the behaviour:
Command line:
See error:
Caused by: Process hostremoval_input_fastq (BSH001.A0101.SG1) terminated with an error exit status (1)
Command executed:
samtools index BSH001.A0101.SG1_PE.mapped.bam
extract_map_reads.py BSH001.A0101.SG1_PE.mapped.bam BSH001.A0101.SG1_R1_lanemerged.fq.gz -rev BSH001.A0101.SG1_R2_lanemerged.fq.gz -m remove -of BSH001.A0101.SG1_PE.mapped.hostremoved.fwd.fq.gz -or BSH001.A0101.SG1_PE.mapped.hostremoved.rev.fq.gz -p 1
Command exit status: 1
Command output:
Command error:
Traceback (most recent call last):
  File "/home/irina_marie_velsko/.nextflow/assets/nf-core/eager/bin/extract_map_reads.py", line 270, in <module>
  File "/home/irina_marie_velsko/.nextflow/assets/nf-core/eager/bin/extract_map_reads.py", line 147, in parse_fq
  File "/home/irina_marie_velsko/.nextflow/assets/nf-core/eager/bin/extract_map_reads.py", line 120, in get_fq_reads
  File "/opt/conda/envs/nf-core-eager-2.3.5/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 933, in FastqGeneralIterator
    seq_string = handle_readline().rstrip()
  File "/opt/conda/envs/nf-core-eager-2.3.5/lib/python3.7/site-packages/xopen/__init__.py", line 268, in readline
    return self._file.readline(*args)
  File "/opt/conda/envs/nf-core-eager-2.3.5/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError
Work dir: /mnt/archgen/microbiome_calculus/abpCapture/03-preprocessing/set1_set3/work/e4/663badafbd377d9291bdb211a98525
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
-[nf-core/eager] Pipeline completed with errors-
The SGE options the job was submitted with (note the ~180 GB memory request was still not enough):
$ -l h_rss=184320M,mem_free=184320M
$ -S /bin/bash -j y -o output.log -l h_vmem=180G,virtual_free=180G
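For what it's worth, the traceback bottoms out inside FastqGeneralIterator via get_fq_reads, and the iterator itself is lazy, which suggests the records are being accumulated somewhere upstream. A hypothetical illustration of the pattern that would grow linearly with (decompressed) file size, alongside a constant-memory alternative; this is not the script's actual code:

```python
from Bio.SeqIO.QualityIO import FastqGeneralIterator


def load_all_reads(handle):
    # Hypothetical memory-hungry pattern: the dict holds every record of
    # the FASTQ at once, so RAM use scales with file size and large
    # samples eventually raise MemoryError.
    return {title: (seq, qual) for title, seq, qual in FastqGeneralIterator(handle)}


def iter_reads(handle):
    # Constant-memory alternative: yield records one at a time and let
    # the caller decide how little to retain.
    for record in FastqGeneralIterator(handle):
        yield record
```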