Closed andhuang-CLGX closed 3 months ago
Are you actually seeing a leak or just getting that warning?
Maybe this https://github.com/bokeh/bokeh/issues/11477 is relevant.
It may be, but there's probably also something in Panel holding onto a reference to something. Just not sure if that's a real issue or if that eventually gets cleaned up after bokeh warns about it.
As I mentioned in #2302 I can recreate this problem. I don't know if this is the cause of the problem but I can also get a memory leak message with the following code:
import panel as pn
import numba

@numba.njit
def test():
    n = 0
    for i in range(10000):
        n += i
    return n

pn.panel(f"# Test {test()}").servable()
Which gives the following output, note that it has more information than "normally":
2021-08-18 15:57:28,395 Module <module 'bokeh_app_72fae5a1c97b49b39718157bdc4d25a3' from 'C:\\Users\\WDAGUtilityAccount\\Desktop\\tmp.py'> has extra unexpected referrers! This could indicate a serious memory leak. Extra referrers: [{'func': <function test at 0x0000022D303B55E0>, 'func_qualname': 'test', 'func_name': 'test', 'code': <code object test at 0x0000022D2DD86920, file "C:\Users\WDAGUtilityAccount\Desktop\tmp.py", line 4>, 'module': <module 'bokeh_app_72fae5a1c97b49b39718157bdc4d25a3' from 'C:\\Users\\WDAGUtilityAccount\\Desktop\\tmp.py'>, 'modname': 'bokeh_app_72fae5a1c97b49b39718157bdc4d25a3', 'is_generator': False, 'pysig': <Signature ()>, 'filename': 'C:\\Users\\WDAGUtilityAccount\\Desktop\\tmp.py', 'firstlineno': 4, 'arg_count': 0, 'arg_names': [], 'unique_name': 'test$7'}]
It may be, but there's probably also something in Panel holding onto a reference to something. Just not sure if that's a real issue or if that eventually gets cleaned up after bokeh warns about it.
Yes, at some point, with enough refreshes, my server runs out of memory, even with --keep-alive 0 --check-unused-sessions 10000 --unused-session-lifetime 120000 set. The sessions seem to persist forever.
Not sure if this is relevant
tornado.iostream.StreamClosedError: Stream is closed
During handling of the above exception, another exception occurred
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/env/lib/python3.7/site-packages/tornado/websocket.py", line 1104, in wrapper
    raise WebSocketClosedError()
I also tried installing https://github.com/bokeh/bokeh/pull/11482, but I still encounter memory leaks, and the logs still stop showing up once datashader is imported.
@Hoxbro Your example isn't hugely surprising since numba caches the compiled function, but I'll have to see if it's actually leaking memory.
Here's mprof with the newest bokeh tag 2.4 (and all sessions closed)
Before I open the app on a browser:
2021-08-18 16:09:17,218 [pid 13502] Memory usage: 91.00 MB (RSS), 499.00 MB (VMS)
2021-08-18 16:09:17,247 uncollected Documents: 1
2021-08-18 16:09:17,274 uncollected Sessions: 0
2021-08-18 16:09:17,308 uncollected Models: 1
(After I open the app, the logging stops)
@andhuang-CLGX Can you provide the application you are profiling with?
Unfortunately not for this, but I'll see if this happens on any of my personal apps
Just tried running with Bokeh master (note that requires some changes for compatibility), in a standard application I can't reproduce those extra referrers issues. Here's a pretty heavy session where I opened and closed about 50 sessions over the course of a few minutes. Memory seems to be reclaimed eventually:
So I really think we need to profile specific applications that are showing issues. One example is the case where numba jit functions are compiled inline: I can see how Numba's caching might interfere with garbage collection, but I'm guessing that's not the case you're encountering @andhuang-CLGX.
To offer more context, I am using an app with datashader + a lot of callbacks + a database
But I'll try to find an MCVE on my personal time.
Sounds like numba is a commonality between these memory leaks.
@andhuang-CLGX see #2302 for why the logging is missing after importing datashader.
Right, I saw it, but if I am not mistaken the only solution there is not importing datashader, and I need datashader loaded for my app to run.
If it's the same issue, presumably the logging level can be reset with a manual call regardless of what any other library does to it.
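A minimal sketch of what such a manual reset could look like, using only the standard `logging` module. The helper name `restore_logging` is hypothetical, and it assumes the symptom is a logger whose level, `disabled` flag, or handlers were changed as a side effect of an import; #2302 may turn out to have a different cause.

```python
import logging
import sys


def restore_logging(name="bokeh", level=logging.INFO):
    """Re-enable a logger that another import may have silenced (assumed cause)."""
    logger = logging.getLogger(name)
    logger.disabled = False   # undo a possible disable_existing_loggers side effect
    logger.setLevel(level)    # undo a possible level change
    if not logger.handlers:
        # Attach our own stderr handler so output does not depend on the root logger.
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
        logger.propagate = False  # avoid duplicate lines if the root logger also prints
    return logger


# Call this after the imports suspected of changing the logging setup.
bokeh_logger = restore_logging("bokeh", logging.DEBUG)
bokeh_logger.debug("logging restored")
```

Calling the helper again is harmless: the handler is only attached when the logger has none.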
I also tried installing bokeh/bokeh#11482 But still encounter memory leaks
@andhuang-CLGX There are a lot of low-level things to clean up; that PR was just Part 1 of several (as the title implied). Since you seem able to test things out from source, here is the latest installment, which is about 95% done, at least from the perspective of anything reproduced by the OP in bokeh/bokeh#11477.
Latest PR: https://github.com/bokeh/bokeh/pull/11523
It would certainly be helpful to know if it helps the situation here as well, though as @philippjfr notes, the problem here may lie on the Panel side (e.g. holding references too aggressively)
thanks for your hard work! I really appreciate your quick updates; I will try it out tomorrow and let you know.
@bryevdv Maybe I wasn't actually using the new version before, but now I can't run my panel app:
    self._main_handler.modify_document(doc)
  File "/home/user/miniconda3/envs/env/lib/python3.7/site-packages/panel/io/server.py", line 292, in modify_document
    doc._modules.append(module)
AttributeError: 'Document' object has no attribute '_modules'
>>> import bokeh
>>> bokeh.__version__
'2.4.0dev4+6.g4ad1802'
cc @philippjfr it looks like there is definitely a change Panel will need to adapt to since Document._modules is going away. Though FWIW this Panel breakage is somehow not showing up in the downstream tests (which are all passing in CI).
@andhuang-CLGX That change to private API was in one of the PRs but I don't remember if it was the first or a later one.
Actually, I just needed to clone panel master. Let me test again
>>> import bokeh
>>> bokeh.__version__
'2.4.0dev4+6.g4ad1802'
>>> bokeh.__version__
KeyboardInterrupt
>>> import panel
>>> panel.__version__
'0.12.1.post4+g7d07a3b'
Okay this is how it looks when I use the app for a while and then near the end do mass refreshes
These are the settings
--mem-log-frequency 1000 --keep-alive 0 --check-unused-sessions 10000 --unused-session-lifetime 120000 --log-level debug
Planning to do mass refreshes, then close the tabs, and let it sit and see if unused session lifetime will clean it up
cc @philippjfr it looks like there is definitely a change Panel will need to adapt to since Document._modules is going away. Though FWIW this Panel breakage is somehow not showing up in the downstream tests (which are all passing in CI).
Mostly fixed this already, though I'm not quite sure why it isn't showing up in tests. Actually, I have one thought: since the current dev release still pins Bokeh<2.4, I suspect we are actually downgrading it in the bokeh downstream test suite. Will release a new dev release today which will unpin it.
Okay the memory still seems to persist long after 2 minutes of closing the tab. I think I expect it to follow the blue line after 2 minutes
I think 400 MB is the baseline. I will let it run over the weekend and see what the plot looks like.
This is a separate app that was running for multiple days not using datashader and using the stable branches of bokeh + panel
@andhuang-CLGX I am currently not convinced that these plots are entirely accurate in a useful sense. See, e.g.:
https://distributed.dask.org/en/latest/worker.html#memory-not-released-back-to-the-os
In many cases, high unmanaged memory usage or “memory leak” warnings on workers can be misleading: a worker may not actually be using its memory for anything, but simply hasn’t returned that unused memory back to the operating system, and is hoarding it just in case it needs the memory capacity again. This is not a bug in your code, nor in Dask — it’s actually normal behavior for all processes on Linux and MacOS, and is a consequence of how the low-level memory allocator works (see below for details).
In the work I have done so far I can confirm that after a session cleanup there are not any Bokeh Sessions, Documents or document modules, Bokeh models (modulo one cycle I am still finishing cleaning up), or excess DataFrames anywhere in the Python runtime. This is absolutely certain from direct inspection of gc.get_objects(): those types of objects were present in gc.get_objects() before cleanup, and are 100% gone afterwards.
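A minimal sketch of that kind of inspection, assuming it is enough to count gc-tracked objects by class name. `count_live` and `FakeDocument` are hypothetical names used for illustration; in a real app you would pass the class names you care about, for example "Document".

```python
import gc
from collections import Counter


def count_live(type_names):
    """Count gc-tracked objects whose class name is in `type_names`."""
    gc.collect()  # drop unreachable cycles first so they are not counted
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return {name: counts[name] for name in type_names}


class FakeDocument:  # hypothetical stand-in for a per-session object
    pass


doc = FakeDocument()
before = count_live(["FakeDocument"])  # one live instance
del doc
after = count_live(["FakeDocument"])   # no live instances remain
print(before, after)
```

Matching on the class name alone is deliberately crude: it also counts unrelated classes that share the name, and it only sees objects the garbage collector tracks.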
Yet, reported RSS does not really shrink after cleanup. Except until it does. If you look in the "part 5" PR you can see that I contrived an example to add 1GB of memory every session. If there is actually a leak then I would expect to eventually OOM in short order. But what happens, opening one session after another, is that memory is eventually reclaimed according to the reported RSS, but only after ~2GB is exceeded in total. This pattern of growing and shrinking repeats indefinitely, and there is never any OOM. It seems undeniable to me that this reported number is being modulated by something at a lower level that we cannot control.
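One way to separate "Python still holds these objects" from "the allocator has not returned pages to the OS" is to watch Python-level allocations instead of RSS. The sketch below uses the standard `tracemalloc` module; the sizes are arbitrary, and it is an illustration of the distinction rather than the measurement used in the PR.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Allocate roughly 10 MB of Python objects, as a session might.
data = [bytearray(1024) for _ in range(10_000)]
during, _ = tracemalloc.get_traced_memory()

# Drop the only reference; CPython frees the objects immediately.
del data
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"allocated: {during - baseline} bytes, still held: {after - baseline} bytes")
```

If the traced figure returns to its baseline after session cleanup while RSS stays high, the remaining memory is not being held by live Python objects.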
What I would actually want to see demonstrated at this point is a real, actual OOM. That would definitively prove that memory is being leaked. But I would be surprised if that is possible with pure Bokeh. It might be possible that there are leaks in Panel, but that will require its own investigation.
I have had this "extra unexpected referrers" warning for months with my panel app that uses datashader. Over the various combinations of versions of bokeh/panel/datashader I have used, I cannot remember a time when this issue did not occur at some point. However I did not bother to report it as I thought it might be a false positive or, if not, at least it did not create noticeable memory problems from a user perspective in my case.
If there's indeed a real memory leak involved, I'm looking forward to finding a way to prevent that. It's not clear to me if the root cause may in part lie in the application code itself or not though.
I have had this "extra unexpected referrers" warning for months with my panel app that uses datashader.
@TheoMathurin that's almost certainly a usage problem (in Panel or your code, I can't say). That message indicates something outside of Bokeh has grabbed a reference to the module that services a session Document.
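To see what is holding such a reference, the standard `gc.get_referrers` call can be pointed at a module object. This is a minimal sketch, not Bokeh's own check: `extra_referrers` is a hypothetical helper, and the module and the `holder` dict are stand-ins created only for the example.

```python
import gc
import types


def extra_referrers(module, ignore=()):
    """Return objects that refer to `module`, skipping frames and `ignore`."""
    gc.collect()
    ignore_ids = {id(obj) for obj in ignore}
    return [
        ref
        for ref in gc.get_referrers(module)
        if not isinstance(ref, types.FrameType) and id(ref) not in ignore_ids
    ]


# Hypothetical stand-in for the per-session app module.
app_module = types.ModuleType("bokeh_app_example")

# Something outside the server grabbing a reference, e.g. a cache.
holder = {"cached": app_module}

for ref in extra_referrers(app_module):
    print(type(ref).__name__)
```

The result includes every container that points at the module, so in a script it will also list the script's own globals; pass those in `ignore` to narrow the output.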
I have had this "extra unexpected referrers" warning for months with my panel app that uses datashader. Over the various combinations of versions of bokeh/panel/datashader I have used, I cannot remember a time when this issue did not occur at some point. However I did not bother to report it as I thought it might be a false positive or, if not, at least it did not create noticeable memory problems from a user perspective in my case.
If there's indeed a real memory leak involved, I'm looking forward to finding a way to prevent that. It's not clear to me if the root cause may in part lie in the application code itself or not though.
Same here. I am also using datashader; could that be the cause?
So my main file after a few days uses 14 GB (maybe from constant metric checks visiting the page). So I was testing with a very bare-bones app and noticed that on close I see:
WebSocket connection closed: code=1001, reason=None
user log out
import panel as pn

pn.extension()

def logout(e):
    print('user log out ')

cb2 = pn.state.on_session_destroyed(logout)
pn.Row("hey").servable()
But not for my primary application. However, it may just be that the logs are suppressed, so I was wondering: how can I reset the logging with a manual call? (Re: "If it's the same issue, presumably the logging level can be reset with a manual call regardless of what any other library does to it.")
Okay, once I import datashader in my test application, I don't see bokeh logs anymore, but I still see:
user log out
user log out
user log out
user log out
user log out
user log out
user log out
import panel as pn
import datashader

pn.extension()

def logout(e):
    print('user log out ')

cb2 = pn.state.on_session_destroyed(logout)
pn.Row("hey").servable()
I do not see the same for my main application, so I think the sessions aren't being closed.
Instead, after I closed all my sessions, rather than seeing "user log out", I get the message:
bokeh.document.modules - ERROR - Module <module 'bokeh_app_49dc7e18ca634be18a81a20ca5e32e1c' from 'app.py'> has extra unexpected referrers! This could indicate a serious memory leak. Extra referrers: [<cell at 0x7fcce2582450: module object at 0x7fcce2790ad0>]
Based on this, I don't think it's a false positive
After some tests on a minimal example, I also think that importing datashader is likely enabling this problem, namely the extra referrers error and the fact that logging stops.
In my case the error seems to be systematically issued just before the first session that has been opened is destroyed, whether or not you have other remaining active sessions. It seems to happen only once over the server lifetime.
So it seems it's not datashader specifically that's causing the extra referrers warning but rather it's HoloViews. That said after all the session cleanup hooks have run it doesn't seem to hold on to any of the data so that doesn't seem to be the source of the memory leak either. Still investigating.
Nevermind, that wasn't true, even just importing datashader causes the extra referrers error.
Seemingly have tracked down the culprit: if you add import numba.cuda to your app you will get the extra referrer warning. Will ask the numba folks if they have an idea.
The culprit is here: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/driver.py#L224
The error on import gets held on the numba.cuda.cudadrv.driver.driver singleton object and therefore keeps a reference to the module, which doesn't get cleaned up. Numba team is aware and will hopefully fix this asap.
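The general mechanism can be reproduced in plain Python: an exception stored on a long-lived object carries its traceback, and the traceback keeps the frames, and everything those frames reference, alive. The sketch below is an illustration under that assumption, not numba's actual code; `Driver`, `Payload`, `failing_init` and `load` are hypothetical names.

```python
import gc
import weakref


class Payload:
    """Stand-in for whatever the failing frames reference (e.g. a module)."""


class Driver:
    """Hypothetical long-lived singleton that remembers its init error."""
    init_error = None


def failing_init(payload):
    raise RuntimeError("driver initialization failed")


def load():
    payload = Payload()
    ref = weakref.ref(payload)
    try:
        failing_init(payload)
    except RuntimeError as exc:
        # Stored exception -> __traceback__ -> frames -> their locals (payload).
        Driver.init_error = exc
    return ref


ref = load()
gc.collect()
alive_with_traceback = ref() is not None

# Dropping the traceback releases the frames and what they reference.
Driver.init_error = Driver.init_error.with_traceback(None)
gc.collect()
alive_after_clear = ref() is not None

print(alive_with_traceback, alive_after_clear)
```

Keeping only the message, or clearing the traceback as above, is one common way to remember that an error happened without pinning the frames that raised it.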
Here's the fix in numba https://github.com/numba/numba/pull/7360
I'm not currently aware of any memory leaks outside of those caused by datashader (and I'm unsure even about those). Please open a new issue if you're still encountering problems around this.
I think I am still experiencing this issue: https://discourse.holoviz.org/t/panel-holoviews-bokeh-app-memory-leaks-looking-for-general-best-practices/2379