rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

Overcoming limitations on the number of reads that can be processed #18

Open ishaanpbs opened 7 years ago

ishaanpbs commented 7 years ago

Hi, there seems to be a limit on the number of reads that Porechop can process at one time. Is there a way to remove this limit?

Thanks

barneypotter24 commented 7 years ago

Hello, I would like to add that we are having a similar issue, resulting in Porechop failing on a library of ~5.5 million reads. Thanks

rrwick commented 7 years ago

I suspect this is a limitation of Porechop's somewhat simplistic design: it currently loads all reads into memory and then processes them. So if you have more reads than you can fit into memory, I'd expect it to crash. Does this fit with what you're seeing?

The workaround would be to split your input reads into multiple files and run Porechop on each - not very elegant but probably the only solution at the moment.
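As a rough illustration of that workaround, here is a minimal sketch that splits a FASTQ file into smaller chunk files. It assumes an uncompressed FASTQ with exactly four lines per read; `split_fastq`, `reads_per_chunk` and the `chunk_NNNN.fastq` naming are hypothetical and not part of Porechop.

```python
import itertools
from pathlib import Path


def split_fastq(input_path, output_dir, reads_per_chunk=100_000):
    """Split a 4-line-per-record FASTQ file into numbered chunk files.

    Hypothetical helper: assumes uncompressed input with no wrapped lines.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(input_path) as reads:
        for index in itertools.count():
            # Pull the next block of lines (4 lines per read), so only one
            # chunk is ever held in memory at a time.
            lines = list(itertools.islice(reads, 4 * reads_per_chunk))
            if not lines:
                break
            chunk_path = output_dir / f"chunk_{index:04d}.fastq"
            chunk_path.write_text("".join(lines))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk could then be trimmed separately, for example with `porechop -i chunk_0000.fastq -o trimmed_0000.fastq`, and the trimmed outputs concatenated afterwards. The chunk size that fits in memory will depend on read lengths and the machine, so the default here is only a placeholder.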

The real fix would be a redesign of how Porechop works. It could load reads as it handles them and then free up the memory when it's done with each read. This would obviously be better, but it would be a lot of work :smile: So I'll leave this issue open as an enhancement.
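To make the streaming idea concrete, here is a minimal sketch of the general pattern, not Porechop's actual code. It reads one FASTQ record at a time with a generator, applies a per-read function, and writes the result immediately so that each read can be freed before the next is loaded. The names `read_fastq`, `stream_process` and `process_read` are hypothetical, and the sketch assumes uncompressed four-line FASTQ records.

```python
def read_fastq(handle):
    """Yield (header, sequence, quality) tuples one record at a time."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops the stream
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def stream_process(in_handle, out_handle, process_read):
    """Process reads one by one; only a single read is in memory at a time.

    process_read is a hypothetical stand-in for per-read trimming: it takes
    (sequence, quality) and returns the trimmed (sequence, quality).
    """
    count = 0
    for header, sequence, quality in read_fastq(in_handle):
        sequence, quality = process_read(sequence, quality)
        out_handle.write(f"{header}\n{sequence}\n+\n{quality}\n")
        count += 1
    return count
```

With this shape, memory use depends on the longest single read rather than on the total number of reads. Any step that needs to look at many reads at once would have to be handled separately, which is part of why a redesign would be a lot of work.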