linsalrob / fastq-pair

Match up paired end fastq files quickly and efficiently.
https://edwards.flinders.edu.au/sorting-and-paring-fastq-files/
MIT License

Error on very large fastq file #12

Closed: essere closed this issue 4 years ago

essere commented 4 years ago

Hello, I am analyzing very large fastq files (~240 GB for each file in the pair).

I get this error when running fastq-pair:

"We cannot allocate the memory for a table size of -436581356. Please try a smaller value for -t"

Could you suggest a solution for this?

Thanks, Sam

linsalrob commented 4 years ago

What is the command you are using?

If you are setting -t on the command line, you might be setting it to too large a number! It is stored as a signed int, so if you set the value too high it may be misinterpreted (for example, wrapping around to a negative number).
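To illustrate the kind of wraparound described above, here is a minimal Python sketch that reinterprets a number as a 32-bit two's-complement signed value. It assumes `int` is 32 bits wide, which is common but system dependent, and the helper name `as_int32` is hypothetical. It does not reproduce how fastq-pair itself parses `-t`; it only shows why a value above 2147483647 can come out negative.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest value a 32-bit signed int can hold


def as_int32(value: int) -> int:
    """Reinterpret an integer as a 32-bit two's-complement signed value.

    Assumption: the target system uses a 32-bit int. This is an illustration
    only, not the parsing code used by fastq-pair.
    """
    value &= 0xFFFFFFFF       # keep only the low 32 bits
    if value >= 0x80000000:   # top bit set: the value reads as negative
        value -= 0x100000000
    return value


print(as_int32(INT32_MAX))      # 2147483647 (still fits)
print(as_int32(INT32_MAX + 1))  # -2147483648 (wraps to the most negative value)
print(as_int32(3_000_000_000))  # -1294967296
```

Any value that goes through arithmetic or a conversion exceeding that limit can end up negative, which would match the negative table size in the error message.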

essere commented 4 years ago

Following your instructions, I got the number of lines with "wc -l":

wc -l Sample.R1.fastq
2548641872

Dividing by 4 gives 637160468, so I tried this command:

fastq_pair -t 637160468 Sample.R1.fastq Sample.R2.fastq

Is it OK to use a number smaller than the actual number of reads?

linsalrob commented 4 years ago

That value should have been OK (depending on the specific system you used), but yes, I suspect you have an integer overflow.

Yes, it is OK to use a smaller number than the actual number of reads. Try dividing the number by 4 (159290117). In this case you will end up with, on average, four sequences per bucket.
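The arithmetic in this thread can be sketched as a small Python helper. The function name `suggested_table_size` and its `reads_per_bucket` parameter are hypothetical and not part of fastq-pair; the sketch assumes a standard FASTQ file with exactly four lines per read, and the "average per bucket" figure assumes reads spread evenly across the table.

```python
def suggested_table_size(line_count: int, reads_per_bucket: int = 4) -> int:
    """Estimate a value for -t from the output of `wc -l`.

    Assumptions: four lines per FASTQ record, and a target average number
    of reads per hash bucket (4 here, as suggested in the thread).
    """
    if line_count <= 0 or line_count % 4 != 0:
        raise ValueError("expected a positive line count divisible by 4")
    if reads_per_bucket < 1:
        raise ValueError("reads_per_bucket must be at least 1")
    reads = line_count // 4                    # number of reads in the file
    return max(1, reads // reads_per_bucket)   # table size to pass to -t


lines = 2548641872  # from `wc -l Sample.R1.fastq`
print(suggested_table_size(lines, reads_per_bucket=1))  # 637160468 (one read per bucket)
print(suggested_table_size(lines))                      # 159290117 (about four reads per bucket)
```

A smaller table trades memory for slightly longer lookups, since each bucket holds more reads on average.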

essere commented 4 years ago

Thank you. It worked!

This is the very program I have been looking for. You are so great!!

linsalrob commented 4 years ago

Excellent!