ythuang0522 / homopolish

High-quality Nanopore-only genome polisher
GNU General Public License v3.0

Process killed at `Select closely-related genomes` stage on local machine #58

Closed ezherman closed 1 year ago

ezherman commented 1 year ago

When running homopolish v0.4.1 on my local machine, the process gets killed at the Select closely-related genomes stage. This does not happen when running homopolish on my university's HPC cluster. Is there a way to prevent the process from being killed on my local machine? Please see the command and output below. Note that I get the same output with -t 1. I can re-post the assembly mentioned in #56 if needed.

homopolish polish -a input_barcode24.fasta -s data/homopolish/bacteria.msh -m R9.4.pkl -o results/intermediate/barcode24/homopolish_assembly -t 4

[2022/12/14 13:21] INFO: RUN-ID: contig_1
contig_1
/mnt/c/Users/elh605/assemble-cf-isolates/results/intermediate/barcode24/homopolish_assembly/debug
[2022/12/14 13:21] INFO: Stage: Select closely-related genomes
Killed
TIME Select closely-related genomes: 25 MINS 55 SECS.
num_of_homologous_genomes:, 0
This contig contig_1 closely-related genome is less than 5, not to polish...
TIME Total: 25 MINS 55 SECS.

ezherman commented 1 year ago

Would you have any thoughts on this? I would appreciate any help!

ythuang0522 commented 1 year ago

Hi @ezherman, can you add -d to the command? This will keep all the intermediate files so we can see where the program stopped. Also, how much RAM does your local machine have?

ezherman commented 1 year ago

Hi @ythuang0522, sure! I included the -d option. The debug folder contained the following files: contig_1.fasta and contig_1.sort.tab. The .sort.tab file was empty. Does this help you understand where the program stopped?

My laptop has 16GB of RAM.

Here are the command and the stdout:

homopolish polish -d -a input_barcode02.fna -s resources/homopolish/bacteria.msh -m R9.4.pkl -o results/intermediate/barcode02/homopolish_assembly -t 4

[2023/01/13 08:50] INFO: RUN-ID: contig_1
contig_1
/mnt/c/Users/elh605/assemble-cf-isolates/results/intermediate/barcode02/homopolish_assembly/debug
[2023/01/13 08:50] INFO: Stage: Select closely-related genomes
Killed
TIME Select closely-related genomes: 7 MINS 6 SECS.
num_of_homologous_genomes:, 0
This contig contig_1 closely-related genome is less than 5, not to polish...
TIME Total: 7 MINS 6 SECS.

ezherman commented 1 year ago

This might also be helpful: I tried to reproduce my laptop's behaviour on my HPC cluster, by running homopolish with 4 cores and 16 GB of RAM. There was a bit more information as part of the Killed message. It seems that the process is killed at the mash dist step, see below:

sh: line 1: 161836 Killed                  mash dist -p 4 -d 0.050000000000000044 resources/homopolish/bacteria.msh /mnt/lustre/users/elh605/assemble-cf-isolates/results/intermediate/barcode02/homopolish_assembly/debug/contig_1/contig_1.fasta > /mnt/lustre/users/elh605/assemble-cf-isolates/results/intermediate/barcode02/homopolish_assembly/debug/contig_1/temp.tab
[2023/01/13 11:06] INFO: RUN-ID: contig_1
[2023/01/13 11:06] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 1 MINS 27 SECS.
This contig contig_1 closely-related genome is less than 5, not to polish...
TIME Total: 1 MINS 27 SECS.
contig_1
/mnt/lustre/users/elh605/assemble-cf-isolates/results/intermediate/barcode02/homopolish_assembly/debug
num_of_homologous_genomes:, 0

When I ran homopolish with more than 16 GB of RAM, the cluster output showed that slightly more than 16 GB of RAM is required (Memory Utilized: 16.03 GB). Would there be a way to reduce the amount of RAM required by mash dist?
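
In the logs above, the killed mash dist step is followed by the message that fewer than 5 closely-related genomes were found, which hides the real cause. A wrapper pipeline could surface the kill explicitly. The following is a minimal Python sketch under stated assumptions: `killed_by_sigkill` and `run_checked` are hypothetical helper names, not part of homopolish or Mash, and the check only applies on POSIX systems (Linux, WSL), where a process that runs out of memory is typically terminated with SIGKILL.

```python
import signal
import subprocess
import sys


def killed_by_sigkill(returncode):
    """Return True if an exit status indicates the child died from SIGKILL."""
    # subprocess reports death-by-signal as the negative signal number (-9);
    # a shell wrapper (sh -c ...) reports it as 128 + signal number (137).
    return returncode in (-signal.SIGKILL, 128 + signal.SIGKILL)


def run_checked(cmd):
    """Run cmd (a list of arguments) and fail loudly if it was killed."""
    result = subprocess.run(cmd)
    if killed_by_sigkill(result.returncode):
        # Hypothetical error choice: SIGKILL does not prove an out-of-memory
        # condition, so the message only says "possibly".
        raise MemoryError(f"{cmd[0]} was killed (SIGKILL), possibly out of memory")
    result.check_returncode()  # raise CalledProcessError on other failures
    return result
```

With a check like this, a run that hits the memory limit stops with an explicit error instead of continuing with zero homologous genomes.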

ezherman commented 1 year ago

Hi @ythuang0522, would you be able to provide any further thoughts on this? Thanks!

ythuang0522 commented 1 year ago

Hi @ezherman, the root cause of the memory bottleneck is the much larger set of bacterial genomes used since v0.4, where 1.19M bacterial genomes are compressed into the Mash sketch (i.e., bacteria.msh). Loading the sketch alone requires at least 10 GB of RAM, plus the other overhead of running mash dist. We didn't estimate the minimum memory required, but it looks like 16 GB of RAM is definitely not enough. If you only have 16 GB of memory, the alternative is to use a smaller bacterial sketch instead (~1 GB). Our paper was based on the smaller one (~180k genomes), which at least works well on common species. Sorry that we cannot optimize this further, as Mash is maintained by another group. I hope this helps.

ezherman commented 1 year ago

Yes, this helps, thank you @ythuang0522! I will include an option in my assembly pipeline to use the smaller bacterial sketch, for users who are running out of memory.
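
Such a pipeline option might look roughly like the Python sketch below, which picks a sketch file from the machine's physical RAM. Everything named here is an assumption for illustration: the 24 GB threshold is an arbitrary safety margin (the thread only establishes that 16 GB is not enough for the full sketch), `bacteria_small.msh` is a hypothetical file name for the smaller sketch, and `total_ram_gb` relies on POSIX `sysconf`, so it is expected to work on Linux and WSL but not on native Windows.

```python
import os

# Assumption: the thread only shows that 16 GB is insufficient for the full
# sketch (peak use was 16.03 GB); 24 GB is an illustrative margin, not a
# measured requirement.
MIN_RAM_GB_FOR_FULL_SKETCH = 24.0


def total_ram_gb():
    """Return the machine's physical RAM in GB, using POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def pick_sketch(ram_gb, full_sketch, small_sketch,
                min_ram_gb=MIN_RAM_GB_FOR_FULL_SKETCH):
    """Choose the full sketch only when enough RAM is available."""
    return full_sketch if ram_gb >= min_ram_gb else small_sketch


if __name__ == "__main__":
    # Hypothetical paths; the small sketch file name is made up for this example.
    sketch = pick_sketch(
        total_ram_gb(),
        full_sketch="data/homopolish/bacteria.msh",
        small_sketch="data/homopolish/bacteria_small.msh",
    )
    print(sketch)
```

The chosen path would then be passed to homopolish through the -s option, as in the commands earlier in this thread. Keeping the threshold as a parameter lets users on shared nodes override it when the scheduler grants less memory than the machine physically has.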