uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction
MIT License

../input_prep/reprocess_rnac.pl id_mapping.tsv.gz rfam_annotations.tsv.gz # ~8 minutes #92

Open rse-lbl opened 6 months ago

rse-lbl commented 6 months ago

I'm trying to install on my laptop to get some experience before installing and running on supercomputer GPU nodes.

The laptop has an i7-3520M with 16 GB RAM and >100 GB of swap space on an HDD (per https://github.com/uw-ipd/RoseTTAFold2NA/issues/43). The install directory is on a separate external HDD, and the downloaded files the script is processing are on that same HDD.

So far I'm at 12 hours running this script. Does anyone know how much longer I should expect to wait? Or will this never resolve?

Additionally, what files should I expect to see created, and what are their sizes?

Can I unpack the .tsv.gz files and run the script on the re-gzipped components one at a time to reduce memory use and speed things up? Could I go further and split the files, then concatenate the split output files?
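As a rough illustration of the split-and-concatenate idea, here is a minimal Python sketch that streams a gzipped file into smaller gzipped chunks and later joins gzipped pieces back together. The function names `split_gz_lines` and `concat_gz`, the chunk file naming, and the chunk size are hypothetical choices for this example. It says nothing about whether reprocess_rnac.pl produces correct results when run on pieces: if the script needs to see both input files in full, chunking the inputs would not be valid.

```python
import gzip
import shutil
from pathlib import Path


def split_gz_lines(src, out_dir, lines_per_chunk=1_000_000):
    """Stream a gzipped text file into smaller gzipped chunks.

    Lines are read and written one at a time, so memory use stays small
    regardless of the input size. Returns the chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    with gzip.open(src, "rb") as fin:
        for i, line in enumerate(fin):
            if i % lines_per_chunk == 0:
                # Start a new chunk every lines_per_chunk lines.
                if out is not None:
                    out.close()
                path = out_dir / f"chunk_{len(chunks):05d}.tsv.gz"  # hypothetical naming
                out = gzip.open(path, "wb")
                chunks.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return chunks


def concat_gz(parts, dest):
    """Join gzip files byte-for-byte.

    The gzip format allows several compressed members in one file, so the
    result decompresses to the concatenation of the parts.
    """
    with open(dest, "wb") as fout:
        for part in parts:
            with open(part, "rb") as fin:
                shutil.copyfileobj(fin, fout)
```

A call such as `split_gz_lines("id_mapping.tsv.gz", "parts")` would write the chunks into a `parts` directory. Whether per-chunk outputs can simply be concatenated afterwards depends on what the Perl script does internally, which this sketch does not assume.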

rse-lbl commented 6 months ago

It's now been a little over 14 hours. I'm tracking the difference in file size between the rfam_annotations.tsv.gz that is being updated and rfam_annotations.tsv.gz.bak. Based on the average rate at which that difference is shrinking, it looks like another 6 hours until the sizes match.

So about 20 hours total.
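The estimate above is a linear extrapolation, which can be sketched as follows. The function name `eta_hours` and the numbers in the example are made up for illustration. It rests on the assumption that the output file will end up the same size as the backup, which need not hold for a rewritten and recompressed file.

```python
def eta_hours(size_then, size_now, hours_between, target_size):
    """Estimate hours left until a growing file reaches target_size.

    Assumes the file keeps growing at the average rate observed between
    the two size measurements (a simple linear extrapolation).
    """
    rate = (size_now - size_then) / hours_between  # bytes per hour
    if rate <= 0:
        raise ValueError("file is not growing; cannot extrapolate")
    return max(target_size - size_now, 0) / rate


# Hypothetical measurements: 100 MB three hours ago, 160 MB now, 280 MB target.
print(eta_hours(100_000_000, 160_000_000, 3.0, 280_000_000))  # 6.0
```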

free -h is showing about 20 Gi of swap used in addition to 14 Gi of RAM (a little of which is running other processes).

I don't know Perl, else I'd rewrite the script to split, batch and concatenate the outputs if free mem is too low to do it all in RAM.

rse-lbl commented 6 months ago

After a total of about 19 and a half hours, rfam_annotations.tsv.gz has stopped updating at a size of 384,827,392 bytes, compared to the 347,475,915 bytes of the backed-up original file. The memory still hasn't cleared out, and my display manager (and the Xterm within it that I launched the original script from) are dragging. Running top in tty3 shows that reprocess_rnac.pl is still running and using 89.8% of memory (VIRT says 26.3g, RES 13.9g).

35 minutes later there is still no further update, and the drives are silent, so I'm killing the script and continuing with installation.

SuhasSrinivasan commented 6 months ago

I encountered an out-of-memory issue similar to #43 and found a workaround, documented here: https://github.com/uw-ipd/RoseTTAFold2NA/issues/96
Hope this is helpful.