Hi, in the pipeline files fix_wipe_pairs_reads_parallel.smk
and fix_wipe_single_reads_parallel.smk
there is a hardcoded argument passed to the checkpoint split_fastq:
the size (number of rows) of the FASTQ chunks produced to accelerate computation.
It was erroneously left at 2000 (the value used in our tests). That is why you got so many chunks and, probably, why Snakemake died.
If you installed FastqWiper + workflow manually, just change this number to something around 500000000, e.g.,
split -l 500000000 --numeric-suffixes {input} data/{wildcards.sample}_chunks/chunk --additional-suffix=.fastq
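For orientation, here is a minimal sketch of how that checkpoint could look after the change. Only the split command above comes from the workflow; the rule's input/output declarations are assumptions and may differ from the actual fix_wipe_pairs_reads_parallel.smk and fix_wipe_single_reads_parallel.smk files.

```
# Sketch only: input/output paths are assumptions; check your local
# .smk file for the exact rule structure. The split command matches
# the one quoted above, with the chunk size raised to 500000000 lines.
checkpoint split_fastq:
    input:
        "data/{sample}.fastq"
    output:
        directory("data/{sample}_chunks")
    shell:
        "mkdir -p data/{wildcards.sample}_chunks && "
        "split -l 500000000 --numeric-suffixes {input} "
        "data/{wildcards.sample}_chunks/chunk --additional-suffix=.fastq"
```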
If you used the Docker image, just pull the new image that was just uploaded to DockerHub and give it another try.
I cannot predict the computing time because it depends on your hardware, but with the current implementation of FastqWiper it could take more than a day to complete. We will improve this in the future.
Please, let us know.
We have updated the package, adding the possibility to specify the chunk size (the number of rows from the original FASTQ to be cleaned per chunk) straight from the command line. On a multi-core machine, this can speed up the computation considerably.
Just pull the Docker image again and read the documentation.
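As a rough illustration of how such a command-line override can work in a Snakemake workflow (sketch only: `chunk_size` here is a placeholder config key, not a confirmed FastqWiper option name; the updated documentation has the real parameter):

```
# Sketch only, assuming the chunk size is exposed through the Snakemake
# config; `chunk_size` is a placeholder key, not a confirmed FastqWiper
# option. With standard Snakemake syntax it could then be overridden at
# invocation time, e.g.:
#   snakemake -s fix_wipe_pairs_reads_parallel.smk --cores 8 \
#       --config chunk_size=500000000
CHUNK_SIZE = config.get("chunk_size", 500000000)

checkpoint split_fastq:
    input:
        "data/{sample}.fastq"
    output:
        directory("data/{sample}_chunks")
    params:
        lines=CHUNK_SIZE
    shell:
        "mkdir -p data/{wildcards.sample}_chunks && "
        "split -l {params.lines} --numeric-suffixes {input} "
        "data/{wildcards.sample}_chunks/chunk --additional-suffix=.fastq"
```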
Thank you for this wonderful tool for processing FASTQ files; I hope it will be the solution to my problem.
I started FastqWiper last night, and this morning I found it is still running. I am processing paired FASTQ files (R1 file: 29 GB, R2 file: 32 GB).
The command line says it has finished only 0.1% of the job, so I think it will take too long to finish. Is this normal, and how long should I usually expect to wait?
Is there another way to shorten the time?