mazzalab / fastqwiper

An ensemble method to recover corrupted FASTQ files, drop or fix pesky lines, remove unpaired reads, and settle reads interleaving.
GNU General Public License v3.0

How long does it usually take? #6

Closed hansong798 closed 1 year ago

hansong798 commented 1 year ago

Thank you for this wonderful tool for processing FASTQ files; I hope it will be the solution to my problem.

I started fastqwiper last night, and this morning I found it was still running. I ran it on paired FASTQ files (R1 file: 29 GB, R2 file: 32 GB). [screenshot of the run attached]

The command line reported that it had finished only 0.1% of the job. I think this is taking too long to finish. Is this normal for data of this size, and how long should I usually expect to wait?

Is there another way to shorten the time?

mazzalab commented 1 year ago

Hi, the pipeline files fix_wipe_pairs_reads_parallel.smk and fix_wipe_single_reads_parallel.smk contain a hardcoded argument that we pass to the split_fastq checkpoint: the size (number of rows) of the FASTQ chunks we produce to accelerate computation. It was erroneously set to 2000, a leftover from our tests. This is why you got so many chunks and, probably, why Snakemake died.
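For reference, here is a minimal sketch of what such a Snakemake checkpoint could look like; the paths and rule body below are illustrative, not the exact pipeline code:

# Sketch of a split_fastq checkpoint; paths are illustrative.
checkpoint split_fastq:
    input:
        "data/{sample}.fastq"
    # a checkpoint's output is a directory whose content (the chunks)
    # is only known after the rule has run
    output:
        directory("data/{sample}_chunks")
    # -l is the chunk size in rows, i.e., the value that was
    # erroneously hardcoded to 2000
    shell:
        "mkdir -p {output} && "
        "split -l 500000000 --numeric-suffixes {input} "
        "data/{wildcards.sample}_chunks/chunk --additional-suffix=.fastq"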

If you installed FastqWiper and the workflows manually, just change this number to something around 500000000, e.g.:

split -l 500000000 --numeric-suffixes {input} data/{wildcards.sample}_chunks/chunk --additional-suffix=.fastq

If you used the Docker image, just pull the new image we uploaded to DockerHub and give it another try.
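For example (the image name below is assumed from the repository name; check DockerHub for the exact name and tag):

# image name assumed from the repository name; verify on DockerHub
docker pull mazzalab/fastqwiper:latest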

I cannot predict the computing time because it depends on your hardware, but with the current implementation of FastqWiper it could take more than a day to complete. We will improve this in the future.

Please let us know.

mazzalab commented 1 year ago

We have updated the package to add the possibility of specifying the chunk size (the number of rows from the original FASTQ to be cleaned per chunk) directly from the command line. On a multi-core machine, this may speed up the computation considerably.
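As a sketch of what such an invocation could look like using Snakemake's generic --config override (the config key chunk_size and the .smk path are hypothetical here; the actual names are in the documentation):

# 'chunk_size' and the pipeline path are hypothetical; see the README
snakemake -s fix_wipe_pairs_reads_parallel.smk --cores 8 --config chunk_size=500000000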

Just pull the Docker image again and read the documentation.