Open aksarkar opened 8 months ago
@aksarkar I'm going to hold off on merging your PR for now. I think there is the potential to greatly improve the memory usage, but I have to play around with the implementation and make sure that it doesn't also slow down the multithreaded code.
The timing of my test case indicates that it also slightly improves the wall-clock time. In any case, it does not appear to make the running time worse.
This change reduces memory usage by roughly 10%.
Profiling via mprof using the following snippet on an AWS c7i.8xlarge instance yields the following usages.
On 295d7323:
On g19ec112: