Closed — rgrzeszi closed this issue 3 years ago.
Thanks for the report and the deadlock PR fix in #50.
Regarding the process build-up, can you tell whether:
(1) the chromium processes accumulate within a single long-running Python process, or
(2) they are left behind after the Python processes that launched them have exited?
The memory leak fix in https://github.com/plotly/Kaleido/pull/43 involves periodically reloading the headless chromium tab that kaleido uses. If you're seeing (1) above, it would be helpful to know whether this makes any difference for you.
You can install the alpha build of kaleido that has this fix with:

pip install https://github.com/plotly/Kaleido/releases/download/v0.1.0a2/kaleido-0.1.0a2-py2.py3-none-manylinux1_x86_64.whl
If (2), do you know whether the Python process that's driving kaleido always exits cleanly (without crashing)? The chromium process should be shut down when Python exits and calls the `__del__` method on the base scope, but something might be preventing that method from being called.
Thanks!
Hello Jon,
It's (1): a single Python instance that runs a data analysis and generates quite a lot of plots.
A method in a plotting class is called multiple times, and the Kaleido scope is created inside it as shown above. With every write a new instance is spawned, but it seems at least some of them do not terminate correctly. My understanding was that the scope would be created within the method, and on leaving the method `__del__` would implicitly be called, which would then call `_shutdown_kaleido` and avoid the deadlock issue. However, it seems this is not the case. I would have to run more experiments on this.
I cannot pinpoint it to a single call, but it seems that simpler plots may not cause this issue (e.g. a simple pie chart). I do visualize more complex things, like heatmaps on larger background images (3-4 megapixels). I assume that the process does not terminate correctly in these cases.
OK, this actually makes sense. The `__del__` method isn't guaranteed to be called when the method exits (https://docs.python.org/3/reference/datamodel.html?highlight=__del__#object.__del__), so it's not too surprising that the chromium subprocesses build up with this workflow. It's possible that the thread watching standard error is preventing the reference count of the scope from dropping to zero, but that's just a guess.
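To make that reference-count guess concrete, here is a runnable toy (no kaleido involved; `Scope`, `_watch`, and `make_plot` are invented names for illustration): as long as a watcher thread holds a reference to the scope, `__del__` does not fire when the creating method returns.

```python
import threading
import time

class Scope:
    """Toy stand-in for a kaleido scope whose __del__ would shut down chromium."""
    deleted = False

    def __init__(self):
        # Watcher thread (like the one reading chromium's stderr); passing
        # self as an argument keeps the scope referenced while the thread runs.
        self._watcher = threading.Thread(target=Scope._watch, args=(self,), daemon=True)
        self._watcher.start()

    @staticmethod
    def _watch(scope):
        time.sleep(5)  # stands in for blocking on the subprocess's stderr

    def __del__(self):
        Scope.deleted = True  # the real scope would shut the subprocess down here

def make_plot():
    scope = Scope()  # the local reference is dropped when the function returns
    # ... export an image with the scope ...

make_plot()
print(Scope.deleted)  # False: the watcher thread still references the scope
```

The scope's refcount never reaches zero while the thread is alive, so each call to `make_plot` would leave another chromium subprocess behind.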
The workflow that the Kaleido scope is designed for, to this point, is to reuse a single scope repeatedly so that the chromium startup time is only required the first time. Is this architecture possible for you?
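A minimal sketch of that single-scope reuse, with a stand-in `PlotlyScope` class (invented here so the pattern is runnable without kaleido; the real class is `kaleido.scopes.plotly.PlotlyScope`):

```python
import functools

class PlotlyScope:
    """Stand-in for kaleido.scopes.plotly.PlotlyScope, just to show the pattern."""
    launches = 0

    def __init__(self):
        PlotlyScope.launches += 1  # the real scope launches chromium here

    def transform(self, fig, format="png"):
        return b"image-bytes"      # the real scope returns the encoded image

@functools.lru_cache(maxsize=None)
def get_scope():
    # Created once on first use; every later call returns the same scope.
    return PlotlyScope()

def write_image(fig):
    return get_scope().transform(fig)

for fig in ("fig1", "fig2", "fig3"):
    write_image(fig)
print(PlotlyScope.launches)  # 1: chromium would start only once for all exports
```

All exports share one scope, so the chromium startup cost is paid once and no subprocesses accumulate.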
The alternative is to make sure that the chromium subprocess shuts down when you are finished exporting images with the scope. We should probably create a public `shutdown` method and document that it should be called to guarantee that the chromium subprocess is shut down. We could also make the Kaleido scope closable, so that you could use it in a context manager like this:
```python
with PlotlyScope(...) as scope:
    # Chromium subprocess launched
    scope.transform()

# Chromium subprocess shut down
```
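One possible shape for such a closable scope (a sketch only; `ClosableScope`, `shutdown`, and `running` are hypothetical names, not existing kaleido API):

```python
class ClosableScope:
    """Sketch of the proposed closable scope; shutdown() is the hypothetical public method."""

    def __init__(self):
        self.running = True          # real scope: launch the chromium subprocess

    def transform(self, fig):
        if not self.running:
            raise RuntimeError("scope already shut down")
        return b"image-bytes"

    def shutdown(self):
        self.running = False         # real scope: terminate the subprocess

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.shutdown()              # runs even if transform() raised
        return False                 # do not swallow exceptions

with ClosableScope() as scope:
    scope.transform("fig")
print(scope.running)  # False: subprocess shut down on exiting the block
```

Unlike `__del__`, the `__exit__` call is guaranteed to run when the `with` block is left, even on an exception, so the subprocess cannot be leaked by a lingering reference.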
I believe I tried creating a single scope and it had the same issue, but I will confirm this.
You were absolutely right: `__del__` was not being called, and the subprocesses built up because of this. Strangely enough, this does not happen on all machines. I had to do some rewriting to really create a single scope, but that seems to solve the issue, and it also avoids the infinite loop in the error handler (as I no longer call `_shutdown_kaleido` manually) - thanks!
Thanks for reporting back @rgrzeszi. Glad it's working for you now! I'll still get your PR in, and consider where to document this potential pitfall.
Alright to close this @rgrzeszi?
Hi guys,
we are running plotly and kaleido and generating a large number of plots (usually rendered as SVG or PNG) from potentially large images. I observed quite a large number of processes that are not being stopped properly (see below), up to the point where no more processes can be forked and the whole program crashes.
Following the workaround here: https://github.com/plotly/Kaleido/issues/42
I implemented a call that forcefully shuts down kaleido:
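Roughly, the call looks like this (the exact snippet isn't shown, so this is a sketch with a stand-in class; the real one is `kaleido.scopes.plotly.PlotlyScope`, and `_shutdown_kaleido` is a private method, so this is a fragile workaround rather than a supported API):

```python
class PlotlyScope:
    """Stand-in for kaleido.scopes.plotly.PlotlyScope, to illustrate the workaround."""

    def __init__(self):
        self._proc = object()        # real scope: the chromium subprocess handle

    def transform(self, fig, format="png"):
        return b"image-bytes"        # real scope: encoded image from chromium

    def _shutdown_kaleido(self):
        self._proc = None            # real scope: terminate the subprocess

scope = PlotlyScope()
data = scope.transform("fig")
scope._shutdown_kaleido()            # force the subprocess to go away after each export
print(scope._proc is None)  # True
```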
This partially solved the issue at hand. However, I can now observe the following behavior: depending on when I shut down kaleido, I sooner or later run into a deadlock in the collection of standard error:
My current workaround is to break out of the loop if the process is None.
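For illustration, a toy reconstruction of the stderr-collection loop with that `is None` break (names like `collect_stderr` and `FakeProc` are invented here; the real loop lives inside kaleido's scope, so this only mirrors its structure loosely):

```python
class FakeProc:
    """Stand-in for the chromium subprocess handle."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def readline(self):
        return next(self._lines, b"")  # b"" once the stream is exhausted

def collect_stderr(get_proc, buf):
    while True:
        proc = get_proc()
        if proc is None:   # workaround: the process was forcefully shut
            break          # down, so stop instead of blocking forever
        line = proc.readline()
        if not line:
            break
        buf.append(line)

proc = FakeProc([b"warning\n"])
buf = []
collect_stderr(lambda: proc, buf)   # normal case: drain the stream
print(buf)  # [b'warning\n']

buf2 = []
collect_stderr(lambda: None, buf2)  # after a forceful shutdown: exits immediately
print(buf2)  # []
```

Without the `is None` check, a forceful shutdown that clears the process handle would leave the watcher thread blocked on a read that can never complete, which matches the deadlock described above.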
Any help / feedback would be appreciated.