mspass-team / mspass

Massive Parallel Analysis System for Seismologists
https://mspass.org
BSD 3-Clause "New" or "Revised" License

jupyter memory hogging #488

Open · pavlis opened this issue 8 months ago

pavlis commented 8 months ago

I noticed while debugging a workflow in a jupyter notebook that jupyter has a tendency to hog memory. In particular, it does something I find counterintuitive: if you close the tab for a notebook you opened in a session, it does nothing other than remove the display. I can see why they do that. For instance, I found it helpful to reenter a notebook that had interactive zoom-pan enabled. Still, I was more than a bit surprised when I reopened the notebook and found the interactive loop still running. That happened to be useful for me at the time, but it is a graphic example (a bad, unintentional pun) demonstrating that closing a tab does not stop the kernel; the notebook stays active, holding its memory, until the jupyter server shuts down.
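You can see this for yourself through the jupyter server's REST API, which still lists the kernel after its tab is closed. A minimal sketch, assuming a local server on port 8888 and a token copied from `jupyter notebook list` (the address and token below are placeholders):

```python
# Sketch: show that closing a browser tab does not shut down a kernel.
# Every kernel listed here is still holding its memory.
import requests

BASE = "http://localhost:8888"   # assumed local server address
TOKEN = "<paste token here>"     # from `jupyter notebook list`
headers = {"Authorization": f"token {TOKEN}"}

for k in requests.get(f"{BASE}/api/kernels", headers=headers).json():
    print(k["id"], k["name"], k["last_activity"])

# Shutting one down releases its memory; this is what the "Shutdown"
# button on the Running tab does behind the scenes:
# requests.delete(f"{BASE}/api/kernels/<kernel_id>", headers=headers)
```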

That problem is likely not important for large, parallel workflows, where the notebook is, or should be, normally run in batch mode. Where it matters is in desktop workflow development. It came up for me because I'm developing on a system with fairly modest memory by modern standards, and it is too easy to push it to the limit.

The solution to this issue is clearly documentation. A simple web search uncovered a number of sources; I found this one particularly helpful. The other thing I think I will do is make some modifications to the new "memory management" section of the user manual in the pending documentation update branch. When that is merged we can mark this issue closed.

pavlis commented 8 months ago

@wangyinz I wonder if some of the confusing memory management behavior we've seen with dask is related to this issue? jupyter definitely does not like to release memory until you exit. I suspect this will be less of a problem when running large jobs on a cluster, since there the jupyter server is in a separate container running on the master node. It is when one runs in "all-in-one" mode, with jupyter in the same container as everything else, that it matters. I have reason to suspect it may be the cause of the accumulating "unmanaged memory" we've seen running dask on desktop systems.
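One way to narrow that down would be to compare the memory held by the notebook (client) process against what the workers report; if the growth is in the client process, jupyter is the suspect. A rough sketch, assuming a local dask cluster (the LocalCluster settings here are illustrative, not MsPASS configuration):

```python
# Sketch: separate the two suspects on a desktop "all-in-one" style run.
import os
import psutil
from dask.distributed import Client, LocalCluster

def rss_gb():
    """Resident set size of the calling process, in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

cluster = LocalCluster(n_workers=2)  # stand-in for a desktop configuration
client = Client(cluster)

print("notebook/client process:", rss_gb(), "GB")
print("per-worker:", client.run(rss_gb))  # dict keyed by worker address
```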

wangyinz commented 8 months ago

hmmm... I am not sure. Even if jupyter keeps the session alive, python's garbage collection should still work. It makes no sense for jupyter to override that default behavior of python.
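For what it's worth, here is a small sketch of one caveat: plain python garbage collection does work inside a live kernel, but IPython's output cache can pin objects, so memory that looks leaked may simply still be referenced by `Out` (the array size below is just an illustration):

```python
# Sketch: gc works in a live kernel, but the output cache can hold references.
import gc
import numpy as np

big = np.zeros(10**8)  # ~800 MB of float64
del big                # the only reference is gone...
gc.collect()           # ...so the memory can be reclaimed

# However, if an earlier cell *displayed* the array by ending with a bare
# `big` expression, IPython keeps a reference in Out[n] (and _, __, ___),
# and `del big` alone will not free it. Clearing the output cache helps:
# %reset -f out
```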