torognes / vsearch

Versatile open-source tool for microbiome analysis

derep_fulllength: empty output fasta file #476

Closed: puekse closed this issue 2 years ago

puekse commented 2 years ago

Dear all,

I get an empty FASTA output file after the dereplication step, even though my input FASTA file contains around 5 million sequences.

I ran the following command: vsearch --derep_fulllength filtered.fasta --output derep.fasta and got the message "Dereplicating file filtered.fasta 28%Killed". As mentioned above, the output file derep.fasta is empty.

Could you please help me to solve this problem?

Thank you very much!

torognes commented 2 years ago

Thanks for reporting this problem. The message indicates that the program was terminated after 28% of the input sequences had been read into memory. The program therefore did not get as far as starting to dereplicate the sequences.

It is difficult to say precisely why this happened without more information, but one explanation could be that it ran out of memory.

How long are your sequences on average? How much memory do you have available? The first line of output from vsearch -v includes information on the amount of memory available.
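Both questions can be answered from the command line. The following is a minimal sketch, assuming a Linux system with `awk` and a readable `/proc/meminfo`; `fasta_stats` is a hypothetical helper name, not part of vsearch.

```shell
# fasta_stats FILE -> number of sequences and their mean length.
# Header lines start with ">"; every other line is counted as sequence.
fasta_stats() {
    awk '/^>/ { n++; next }
         { bases += length($0) }
         END { if (n) printf "%d sequences, mean length %.1f\n", n, bases / n }' "$1"
}

# Total and currently available memory, as reported by the Linux kernel.
if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|MemAvailable):' /proc/meminfo
fi
```

Calling `fasta_stats filtered.fasta` would then report the sequence count and mean length for the input file from this thread.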

puekse commented 2 years ago

Dear Torbjørn,

Thank you for your answer.

I have this vsearch version: "v2.14.1_linux_x86_64, 0.9GB RAM, 1 cores". The reads are short, obtained by MiSeq 2x250 for 200 environmental samples. The input fasta file is 1.55 GB.

Please let me know if you need more information.

frederic-mahe commented 2 years ago

Hi @puekse, it seems your system has less than a gigabyte of memory in total (are you running on a virtual machine?). Once the operating system is running, that leaves little room for vsearch to work.

When vsearch tries to use more memory than is available during dereplication, a Linux kernel mechanism called the Out Of Memory Killer (OOM Killer) is triggered, and vsearch is killed to prevent the system from crashing.
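One way to check whether the OOM Killer was responsible is to look at the exit status of the command and at the kernel log. The sketch below assumes a Linux system; `oom_lines` and `explain_status` are hypothetical helper names, and the wording of kernel messages differs between kernel versions, so the search pattern is deliberately broad.

```shell
# Print kernel-log lines that look like OOM-killer reports (reads stdin).
oom_lines() {
    grep -i -E 'out of memory|oom-kill|killed process'
}

# Translate a shell exit status into a hint. 137 = 128 + 9 (SIGKILL),
# which is the signal the OOM killer uses, but not the only source of it.
explain_status() {
    if [ "$1" -eq 137 ]; then
        echo "terminated by SIGKILL, possibly by the OOM killer"
    else
        echo "exit status $1"
    fi
}

# Typical use (reading the kernel log may require root):
#   vsearch --derep_fulllength filtered.fasta --output derep.fasta
#   explain_status $?
#   dmesg | oom_lines
```

A status of 137 alone does not prove an out-of-memory kill; a matching line in the kernel log is the stronger evidence.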

puekse commented 2 years ago

Thank you very much for your help. Indeed, I am using a virtual machine, and that could be the problem.

frederic-mahe commented 2 years ago

Then you need to allocate more memory to your virtual machine.

Given the size of your dataset, 2 GB of memory might be just enough to accommodate both the operating system and the dereplication.
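A quick sanity check before running the dereplication is to compare the size of the input file with the memory currently available. This is only a rough sketch: it assumes that the input must fit in memory as a whole, says nothing about the extra memory vsearch needs on top of that, and reads the sizes quoted in this thread as multiples of 1024. The helper names are hypothetical.

```shell
# file_bytes FILE -> size of FILE in bytes
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# avail_kb -> MemAvailable from /proc/meminfo, in kB (Linux only)
avail_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' /proc/meminfo
}

# fits_in_memory BYTES KB -> succeeds if BYTES is smaller than KB kilobytes
fits_in_memory() {
    [ "$1" -lt $(( $2 * 1024 )) ]
}

# Numbers from this thread: a 1.55 GB input and 0.9 GB of RAM.
if fits_in_memory 1664299827 943718; then
    echo "input is smaller than memory"
else
    echo "input is larger than memory: expect trouble"
fi
```

On a real system the same check would be `fits_in_memory "$(file_bytes filtered.fasta)" "$(avail_kb)"`.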

torognes commented 2 years ago

See also issue #475.

puekse commented 2 years ago

Thank you very much for all your help! It works well now.