zhafen / linefinder

A tool for finding and classifying the worldlines of Lagrangian parcels of mass, in the context of hydrodynamic simulations of galaxy formation.
https://zhafen.github.io/linefinder
MIT License

Heavy memory usage in `IDSelector.select_ids()` #5

Closed: zhafen closed this issue 7 years ago

zhafen commented 7 years ago

Originally reported by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


Running `IDSelector.select_ids()` from snapshot 0 to snapshot 600 crashes with a memory error at around snapshot 120, even on a large-memory node.

`IDSelector.select_ids()` should, at any given time, use at most as much memory as two snapshots: one for the snapshot the IDs are being selected in and one for the running set of IDs.

Update: Multiprocessing didn't seem to solve the issue. I'm still getting a memory error when loading a single snapshot plus a small set of arrays. Trying to update `readsnap` now.


zhafen commented 7 years ago

Original comment by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


Fixed as of this commit.

This was resolved by locating the objects that lingered in memory, explicitly deleting them with `del`, and then running `gc.collect()`.
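A minimal sketch of that pattern, assuming a hypothetical `load_snapshot()` loader and a toy selection rule (neither is linefinder's actual code). Deleting the reference before the next snapshot is loaded lets the old snapshot be freed first, so peak usage stays near one snapshot plus the running set of IDs; `gc.collect()` then reclaims anything kept alive only by reference cycles.

```python
import gc


def load_snapshot(snapnum):
    # Hypothetical loader standing in for reading one snapshot from disk.
    return {"snapnum": snapnum, "ids": list(range(snapnum, snapnum + 10))}


def select_ids(snapnums):
    selected = set()  # running set of IDs, kept across snapshots
    for snapnum in snapnums:
        snapshot = load_snapshot(snapnum)
        # Toy selection rule (hypothetical): keep even IDs.
        selected.update(pid for pid in snapshot["ids"] if pid % 2 == 0)
        # Drop the last reference before the next snapshot is loaded...
        del snapshot
        # ...and reclaim objects kept alive only by reference cycles.
        gc.collect()
    return selected
```

Without the `del`, the previous snapshot stays bound to `snapshot` while the next one is being loaded, so two snapshots briefly coexist in memory.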

zhafen commented 7 years ago

Original comment by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


Relevant link on why Python doesn't release memory when a large object is deleted.

zhafen commented 7 years ago

Original comment by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


This may be solved by opening the HDF5 file with `h5py` inside a context manager (a `with` block), so the file is closed as soon as it is no longer needed.
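A minimal sketch of reading one dataset with `h5py.File` used as a context manager. The function name `read_particle_ids` is hypothetical, and the default dataset path `PartType0/ParticleIDs` assumes a GIZMO/Gadget-style HDF5 snapshot layout. The `with` block closes the file when it exits, even if an exception is raised, while the NumPy array read out with `[()]` remains valid afterward.

```python
import h5py
import numpy as np


def read_particle_ids(path, dataset="PartType0/ParticleIDs"):
    """Read one dataset into a NumPy array, closing the file afterwards."""
    with h5py.File(path, "r") as f:
        # [()] reads the whole dataset into memory as a NumPy array.
        ids = np.asarray(f[dataset][()])
    # The file is closed here; `ids` does not depend on the open handle.
    return ids
```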

zhafen commented 7 years ago

Original comment by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


If multiprocessing doesn't work, I can also try profiling the memory with the `mprof` executable provided by `memory_profiler`.
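`mprof` (run as `mprof run script.py`, then `mprof plot`) samples the memory of the whole process over time. As a stdlib-only complement, and not what this thread used, Python's `tracemalloc` can report the peak memory allocated through Python's allocators during a single call. The helper name `peak_allocation` is hypothetical. Memory allocated outside those allocators, for example inside a C library, is not counted, which is one reason the process-level view from `mprof` is still useful.

```python
import tracemalloc


def peak_allocation(func, *args, **kwargs):
    """Call func and return (result, peak_bytes) as measured by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

For example, `peak_allocation(bytearray, 10_000_000)` reports a peak of at least 10,000,000 bytes.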

zhafen commented 7 years ago

Original comment by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


It's possible this memory usage will persist until the process terminates, even if everything is garbage collected properly (as explained here). A potential workaround, which would also speed up the code, is to use Python multiprocessing. I'll try this out, and see what happens!
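A minimal sketch of that workaround, assuming a POSIX system (it requests the `fork` start method) and a hypothetical per-snapshot function `select_ids_in_snapshot()` with a toy selection rule. `maxtasksperchild=1` makes the pool replace its worker after every snapshot, so whatever memory the worker held is returned to the operating system when it exits, and only the small set of selected IDs is sent back to the parent. The update at the top of this issue notes that multiprocessing alone did not seem to solve the problem, so this is one option to try rather than a guaranteed fix.

```python
import multiprocessing as mp


def select_ids_in_snapshot(snapnum):
    # Hypothetical stand-in for loading one snapshot and selecting IDs.
    # The large temporary list lives only in the worker process.
    particle_ids = list(range(snapnum * 1000, snapnum * 1000 + 100_000))
    return {pid for pid in particle_ids if pid % 50_000 == 0}


def select_ids(snapnums):
    selected = set()  # running set of IDs, kept in the parent process
    ctx = mp.get_context("fork")  # POSIX-only start method
    # One worker, replaced after each task so its memory is released.
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        for ids in pool.imap(select_ids_in_snapshot, snapnums):
            selected |= ids
    return selected
```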

zhafen commented 7 years ago

Original comment by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


I'm using `memory_profiler` to investigate memory usage.

It looks like my instance of `SnapshotIDSelector` isn't being deleted.

zhafen commented 7 years ago

Original comment by Zachary Hafen (Bitbucket: zhafen, GitHub: zhafen)


Working on this.