smithlabcode / falco

A C++ drop-in replacement of FastQC to assess the quality of sequence read data
https://falco.readthedocs.io
GNU General Public License v3.0

memory issue? #39

Closed: lucacozzuto closed this issue 2 years ago

lucacozzuto commented 2 years ago

Dear developers, thanks for your valuable tool! I'm trying to use it for some nanopore data and I got the following error:

[Thu Sep  8 14:45:34 2022] creating directory for output: KO_fastqc
[limitst]   using file /falco-1.1.0/Configuration/limits.txt
[adapters]  using file /falco-1.1.0/Configuration/adapter_list.txt
[contaminants]  using file /falco-1.1.0/Configuration/contaminant_list.txt
[Thu Sep  8 14:45:34 2022] Started reading file KO.fq.gz
[Thu Sep  8 14:45:34 2022] reading file as gzipped FASTQ format
[running falco|=                                                  |  2%]/ 2: 19870 Killed                  falco -o KO_fastqc -t 1 KO.fq.gz

I used 80 GB of RAM, so I don't think this is a RAM problem.

Luca

andrewdavidsmith commented 2 years ago

@lucacozzuto can you provide some additional information, for example a small piece of the input file? And also the command -- you've snipped the verbose output and progress but we don't see the arguments or anything like filenames. If you feel there would be confidential info in the filenames, then it would help if you could copy them to generic filenames and post the exact command line you used. Thanks!!!

lucacozzuto commented 2 years ago

Many thanks for your quick answer! This is the command line

falco -o KO_fastqc -t 1  KO.fq.gz

The file is huge (59 GB), and some reads are up to 1 Mb long.
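For anyone who wants to confirm read lengths like these before running a QC tool, a minimal sketch follows. It assumes a gzipped FASTQ file with standard four-line records (no wrapped sequence lines); the function name `read_length_stats` is illustrative and is not part of falco.

```python
import gzip
import sys
from itertools import islice


def read_length_stats(path):
    """Return (number_of_reads, longest_read_length) for a gzipped FASTQ file.

    Assumption: every record spans exactly four lines, so the sequence is
    the second line of each group of four.
    """
    n_reads = 0
    longest = 0
    with gzip.open(path, "rt") as handle:
        # Take lines 1, 5, 9, ... (0-based), i.e. the sequence lines.
        for seq in islice(handle, 1, None, 4):
            n_reads += 1
            longest = max(longest, len(seq.rstrip("\n")))
    return n_reads, longest


if __name__ == "__main__":
    # Usage: python read_length_stats.py KO.fq.gz
    reads, longest = read_length_stats(sys.argv[1])
    print(f"reads: {reads}, longest read: {longest}")
```

The file is streamed line by line, so memory use is bounded by the longest single read rather than by the file size.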

guilhermesena1 commented 2 years ago

Hello,

Thank you for reaching out about the issue. I was able to replicate the problem with very large synthetic reads.

This seems to be less a memory issue than a bug in falco: we weren't accounting for read lengths as large as those currently produced by Oxford Nanopore.

If you are working with a clone of the repo, I pushed a fix at 2f82110 that may resolve the issue. On my machine with 16 GB of RAM, I was able to run falco to completion on a simulated read of length 30 million.
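For anyone who wants to try a similar reproduction, here is a minimal sketch of how a gzipped FASTQ file with very long synthetic reads could be generated. It is not necessarily how the simulated read mentioned above was produced; the function name `write_synthetic_fastq`, the read names, and the constant quality character `I` are all assumptions made for illustration.

```python
import gzip
import random


def write_synthetic_fastq(path, read_lengths, seed=0):
    """Write one random ACGT read per entry in read_lengths to a gzipped FASTQ.

    Each record gets a constant quality string ('I') of matching length.
    Note: each read is built fully in memory before it is written.
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    with gzip.open(path, "wt") as out:
        for i, length in enumerate(read_lengths):
            seq = "".join(rng.choices("ACGT", k=length))
            out.write(f"@synthetic_read_{i}\n{seq}\n+\n{'I' * length}\n")


if __name__ == "__main__":
    # One read of 30 million bases, matching the size mentioned above.
    write_synthetic_fastq("synthetic.fq.gz", [30_000_000])
```

The resulting file can then be passed to falco in the same way as the command earlier in this thread, for example `falco -o synthetic_fastqc -t 1 synthetic.fq.gz`.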

If at all possible, could you let us know if you can run falco to completion on your data with this commit?

Thank you very much in advance!

lucacozzuto commented 2 years ago

Dear @guilhermesena1, it worked! Thanks for this fix. I managed to add falco to my Nextflow pipeline as a replacement for FastQC. I also made a Dockerfile for your tool, so if you'd like, I can add it to your repo.

Best,

Luca