lwaldron closed this issue 8 months ago.
Additionally:

```r
> print(object.size(propagated), units = "Gb")
2.9 Gb
```

and as soon as `full_dump_with_0` is created:

```r
> print(object.size(full_dump_with_0), units = "Gb")
2.7 Gb
```

so we need to do some cleanup of large objects sitting in memory.
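One low-effort way to do that cleanup (a sketch only; `propagated` and `full_dump_with_0` are the object names from the sizes above) is to drop each large intermediate as soon as its successor exists and force a garbage collection so the pages are actually returned:

```r
# Confirm the size of the intermediate, as above.
print(object.size(propagated), units = "Gb")

# Once full_dump_with_0 has been derived from it, the intermediate
# can go: rm() removes the binding and gc() reclaims the memory.
rm(propagated)
invisible(gc())
```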
is a memory-intensive calculation that has already been done for `full_dump_with_0`.
Multiple changes were made for memory efficiency in https://github.com/waldronlab/bugphyzzExports/pull/26/commits. I will go ahead and merge so that they are tested in GHA.
Keep an eye on https://github.com/waldronlab/bugphyzzExports/actions/runs/6076545052, but this should finish within the GHA time limit. Note that outputs of `system.time()` like this:

```
   user  system elapsed
113.827   0.104 113.949
```

show that very little of the time spent in propagation is system time, in this case 0.104 s out of 113.949 s elapsed. I assume that most of the time is spent doing NCBI lookups or something similar, and that if you could eliminate that bottleneck, propagation would take a fraction of a second per attribute. It is still feasible as-is, though, and the priority is on implementing a "real" ASR method that provides probabilities or confidence intervals.
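If repeated NCBI lookups really are the bottleneck, caching them would be a cheap fix; a hedged sketch using the `memoise` package (here `ncbi_lookup()` is a hypothetical stand-in for whatever function performs the per-taxon query, not a function from this repo):

```r
library(memoise)

# Hypothetical lookup function doing one network query per taxid.
# Wrapping it in memoise() makes repeated calls with the same
# arguments return the cached result instead of hitting the network.
ncbi_lookup_cached <- memoise::memoise(ncbi_lookup)

# system.time() around the propagation step then shows how much
# of the 113 s elapsed time the cache eliminates on a second pass.
system.time(propagate_attribute(lookup = ncbi_lookup_cached))
```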
It made it almost to the end but died with error code 137 (out of memory) while writing to disk. A little more cleanup should probably be enough.
https://github.com/waldronlab/bugphyzzExports/actions/runs/6076545052/job/16484731212
Just by putting `pryr::mem_change()` statements around each line in the loop where error 137 is still occurring, I see that the following line is the one that requires a lot of memory:
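For reference, `pryr::mem_change()` reports the net change in R's memory usage from evaluating an expression, which is what makes this line-by-line bisection possible; a minimal illustration with toy data:

```r
library(pryr)

# Net memory change of allocating ~80 MB of doubles:
mem_change(x <- numeric(1e7))

# Net change of releasing it again (roughly the negative of the above):
mem_change(rm(x))
```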
So I am trying it without that calculation in c0f1166.
This particular chain, containing only `dplyr::mutate()` calls, seems to spike memory usage temporarily to over 30 GB (although this might be much less on a lower-memory machine; this is just what I observed in `top`). We need to make the whole script more memory-efficient to work on GHA.

https://github.com/waldronlab/bugphyzzExports/blob/a9fc18914cb3b1d9ea3a3d1c0121ccac5c8d482a/inst/scripts/export_bugphyzz.R#L281
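A common way to flatten that kind of spike (a generic sketch, not the script's actual code) is to replace a long piped `mutate()` chain, which can keep several intermediate copies of the table alive at once, with by-reference column assignment via `data.table`:

```r
library(data.table)

dt <- data.table(a = runif(10), b = runif(10))

# `:=` adds or modifies columns in place, so no full copy of the
# table is materialized per derived column, unlike a chain of
# mutate() calls on a large data frame.
dt[, c := a + b]
dt[, d := c * 2]
```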