gygabyte017 opened this issue 3 years ago
Hi @gygabyte017, thanks for the report. I'm not sure if this is possible for you, but it would be helpful to see if any logging is collected (but not displayed) before it hangs.
Are you able to reproduce the issue from a python repl? If so, the instructions in this issue might yield some extra info that would be helpful (https://github.com/plotly/Kaleido/issues/36#issuecomment-756676527).
If possible, what would be most helpful would be a reproducible example consisting of:
Thanks!
Hi, unfortunately it is hard for me to provide what you asked, sorry about that :( It doesn't happen on my local PC while testing; it only happens on serverless containers spawned on EKS, and the plotting happens after a lot of complex calculations involving other resources. However, here's what I found out, hoping it might be somehow useful:
`write_image` is called dozens of times to produce every needed plot, and the freeze never happens on the first image, always after a few.
After adding `scope._shutdown_kaleido()` it almost never freezes anymore, even though I'm using v0.2.1 (it still randomly happens on 1-2% of executions, while before it happened half of the time, so that's a good result).
(I'm not sure how I could access the frozen container, send an interrupt, and interact with a REPL to provide more info.)
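For anyone wanting to try the same thing, the periodic-shutdown pattern might look like the sketch below. The helper and its parameter names (`export_all`, `write_fn`, `reset_fn`) are hypothetical; in practice `write_fn` would wrap `fig.write_image(...)` and `reset_fn` would call `plotly.io.kaleido.scope._shutdown_kaleido()`, which is a private API and may change between versions:

```python
def export_all(figures, write_fn, reset_fn, reset_every=10):
    """Export each figure, restarting the export backend periodically.

    Hypothetical helper: write_fn(fig) performs one export (e.g. a thin
    wrapper around fig.write_image), and reset_fn() tears the backend
    down so its memory is reclaimed before the next export restarts it.
    """
    for i, fig in enumerate(figures, 1):
        write_fn(fig)
        if i % reset_every == 0:
            reset_fn()  # periodic restart to release leaked memory
    reset_fn()  # final cleanup once all exports are done
```

The trade-off is that each restart pays the Kaleido startup cost again, so `reset_every` balances memory pressure against export throughput.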
Thanks
Thanks for this info @gygabyte017, that's helpful. Marking as a bug.
The same is happening for the `to_image` call.
Is there anything that can be done to avoid this?
@Bhanuchander210, are you seeing this behavior being related to low memory as well?
@jonmmease
I am not sure, but I hope so.
It is happening only on the production environment, and randomly (the production env has other servers too, so I didn't track it).
After making the change with `scope._shutdown_kaleido()` (as you suggested), it has not hung so far.
Ok, thanks @Bhanuchander210.
Notes:
Cross-reference https://github.com/plotly/Kaleido/pull/43, which added some internal tracking of JavaScript heap usage, periodically clearing memory by refreshing the active page. If manually running `scope._shutdown_kaleido()` works around the issue, then I assume this internal page refresh to clear memory would do the same.
We're already refreshing the page when the heap reaches 50% of the maximum allowed. But I don't know whether this maximum limit (as returned by `window.performance.memory.jsHeapSizeLimit`) takes into account the available system memory. If not, then this might explain the trouble we're running into in memory-constrained environments. Two ideas (not mutually exclusive):
Hi, I get the same issue on my local PC with the first Plotly figure I want to statically export (to a PDF) when I manually limit the Python process's virtual memory (`RLIMIT_AS`) to, for example, 6 GB.
Is there any progress on this topic?
Here is the code I use to limit the virtual memory. (That's something we need to do for this specific program to make sure it won't conflict with the production processes...)
```python
import resource

def limit_memory():
    max_memory_mb = 6000
    soft_limit = max_memory_mb * 1024 * 1024  # 6 GB in bytes
    # Cap the virtual address space; the hard limit stays unlimited.
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, resource.RLIM_INFINITY))
```
Since it is the first export anyway, I cannot use the proposed workaround with `scope._shutdown_kaleido()`...
I'm running:
kaleido==0.2.1
plotly==5.3.1
Hi @gygabyte017
Did you ever resolve this issue? Did downgrading to v0.1.0 help?
Thanks.
Hi @MaartenBW, unfortunately I didn't. Every version seems random; I don't believe there's any reason to prefer 0.1.0 over 0.2.1 or any other, it's just luck depending on the machine's resources.
I managed to develop an ugly workaround: 1) increase the maximum RAM on the containers, even though it shouldn't be necessary, and 2) run `write_image` in a separate process with a timeout; if after e.g. 30 seconds it is still running, I kill the separate process and try again, up to 5 tries.
This way it's very rare that all 5 tries fail, though it may still happen.
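A minimal sketch of that timeout-and-retry approach, assuming the export is moved into its own script (the script name `make_plot.py` and its CLI are hypothetical) so that a hung Kaleido can be killed without taking the main program down:

```python
import subprocess
import sys

def export_with_retry(cmd, timeout=30, max_tries=5):
    """Run an export command in a child process, killing and retrying
    it if it hangs past the timeout. Returns True on success."""
    for attempt in range(1, max_tries + 1):
        try:
            # subprocess.run kills the child before raising TimeoutExpired
            if subprocess.run(cmd, timeout=timeout).returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            print(f"export attempt {attempt} hung, retrying", file=sys.stderr)
    return False

# Hypothetical export script that builds the figure and calls write_image:
# ok = export_with_retry([sys.executable, "make_plot.py", "out.png"])
```

Running the export in a fresh interpreter each time also means a leaky Kaleido never accumulates memory across figures, at the cost of repeated startup overhead.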
Now I want to try the solution described here; maybe it works in a stable way? https://github.com/plotly/Kaleido/issues/110#issuecomment-2113369864
@gygabyte017 Wow, thanks for your fast reply.
Hi, I am experiencing Kaleido randomly freezing in our production environment (Unix with Kubernetes).
I noticed that when the container is low on memory, perhaps because the main Python program consumed a lot of resources (for instance, holding the dataframe to be plotted), a call to `write_image` hangs forever. The kaleido process never terminates; there are no errors about a low-memory condition, it just sits there with zero CPU consumption forever.
How can this be improved?
This behavior is very frustrating because sometimes I just find containers stuck running forever; if I manually relaunch them under the very same conditions, they may run correctly, so I have no way to monitor whether they got stuck.
It would be fine if kaleido returned a memory error or a process-failed exception, which could then be handled. But freezing forever... is just bad.
Any advice? Thank you