plotly / Kaleido

Fast static image export for web-based visualization libraries with zero dependencies

Hanging forever when low memory #103

Open gygabyte017 opened 3 years ago

gygabyte017 commented 3 years ago

Hi, I am experiencing Kaleido randomly freezing in our production environment (Unix with Kubernetes).

I noticed that when the container is low on memory, perhaps because the main Python program has consumed a lot of resources (for instance to hold the dataframe being plotted), calling write_image hangs forever.

The Kaleido process never terminates and there is no error about a low-memory condition; it just sits there forever with zero CPU consumption.

How can this be improved?

This behavior is very frustrating because I sometimes find containers stuck running forever; if I manually relaunch them under the very same conditions they may run correctly, so I have no reliable way to detect that they got stuck.

It would be fine if Kaleido returned a memory error or a process-failed exception, since that could be handled. But freezing forever... is just bad.

Any advice? Thank you

jonmmease commented 3 years ago

Hi @gygabyte017, thanks for the report. I'm not sure if this is possible for you, but it would be helpful to see if any logging is collected (but not displayed) before it hangs.

Are you able to reproduce the issue from a Python REPL? If so, the instructions in this issue might yield some extra info that would be helpful (https://github.com/plotly/Kaleido/issues/36#issuecomment-756676527).

If possible, what would be most helpful would be a reproducible example consisting of:

Thanks!

gygabyte017 commented 3 years ago

Hi, unfortunately it is hard for me to provide what you asked, sorry about that :( It doesn't happen on my local PC while testing; it only happens on serverless containers spawned on EKS, and the plotting happens after a lot of complex calculations involving other resources. However, here's what I found out, hoping it might be useful:

(I'm not sure how I could access the frozen container, send an interrupt, and interact with the REPL to provide more info.)

Thanks

jonmmease commented 3 years ago

Thanks for this info @gygabyte017, that's helpful. Marking as a bug.

bhachauk commented 3 years ago

The same is happening for the to_image call. Is there something that can be done to avoid this?

jonmmease commented 3 years ago

@Bhanuchander210, are you seeing this behavior related to low memory as well?

bhachauk commented 3 years ago

@jonmmease I am not sure, but I think so. It happens only randomly in the production environment (the production env hosts other servers too, so I didn't track it closely). After making the change to call scope._shutdown_kaleido() (as you suggested), it hasn't hung so far...
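
For reference, a minimal sketch of this workaround (assuming plotly 5.x with kaleido 0.2.x; _shutdown_kaleido() is a private method, so this relies on implementation details):

import plotly.express as px
import plotly.io as pio

fig = px.line(x=[0, 1, 2], y=[0, 1, 4])
png_bytes = pio.to_image(fig, format="png")

# Tear down the headless Chromium subprocess after the export; it is
# relaunched automatically on the next to_image / write_image call.
pio.kaleido.scope._shutdown_kaleido()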

jonmmease commented 3 years ago

Ok, thanks @Bhanuchander210.

jonmmease commented 3 years ago

Notes: Cross reference https://github.com/plotly/Kaleido/pull/43, which added some internal tracking of JavaScript heap usage, periodically clearing memory by refreshing the active page. If manually running scope._shutdown_kaleido() works around the issue, then I assume this internal page refresh to clear memory would do the same.

We're already refreshing the page when the heap reaches 50% of the maximum allowed. But I don't know whether this maximum limit (as returned by window.performance.memory.jsHeapSizeLimit) takes into account the available system memory. If not, that might explain the trouble we're running into in memory-constrained environments. Two ideas (not mutually exclusive):

  1. See if the Chromium API provides a way to access the currently available system memory, and incorporate that when deciding whether to refresh the page (a user-side approximation is sketched after this list).
  2. Make this memory limit configurable through the API.
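
For reference, a user-side approximation of idea 1 (a sketch only; psutil and the 500 MB threshold are illustrative assumptions, not part of Kaleido):

import psutil
import plotly.io as pio

LOW_MEMORY_BYTES = 500 * 1024 * 1024  # arbitrary threshold for illustration

def write_image_memory_aware(fig, path):
    # If system memory is already tight, recycle the Kaleido subprocess so the
    # export starts from a fresh page with an empty JS heap.
    if psutil.virtual_memory().available < LOW_MEMORY_BYTES:
        pio.kaleido.scope._shutdown_kaleido()
    pio.write_image(fig, path)
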
LukasRauth commented 2 years ago

Hi, I get the same issue on my local PC with the very first Plotly figure I want to statically export (to a PDF) when I manually limit the Python process's virtual memory (RLIMIT_AS) to, for example, 6 GB.

Is there any progress on that topic?

Here is the code that I use to limit the virtual memory. (That's something we need to do for this specific program to make sure it won't conflict with the production processes...)

import resource

def limit_memory():
    # Cap this process's virtual address space (RLIMIT_AS) at ~6 GB.
    # Only the soft limit is lowered; the hard limit stays unlimited.
    max_memory_mb = 6000
    soft_limit = max_memory_mb * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, resource.RLIM_INFINITY))

Since it is the first export anyway, I cannot use the proposed workaround with scope._shutdown_kaleido()...

I'm running on:

kaleido==0.2.1
plotly==5.3.1
MaartenBW commented 4 months ago

Hi @gygabyte017

Did you ever resolve this issue? Did downgrading to v0.1.0 solve it?

Thanks.

gygabyte017 commented 4 months ago

Hi @MaartenBW, unfortunately I didn't. Every version seems to fail randomly; I don't believe there's any reason to prefer 0.1.0 over 0.2.1 or any other version, it's just luck depending on the machine's resources.

I managed to develop an ugly workaround: 1) increase the maximum RAM on the containers, even though it shouldn't be necessary, and 2) execute write_image in a separate process with a timeout; if after e.g. 30 seconds it is still running, I kill that process and try again, up to 5 tries.

This way it's very rare that all 5 tries fail, but it can still happen.
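
For anyone who wants to try it, here is a minimal sketch of this approach (the helper names, the 30-second timeout, and the 5-try limit are illustrative):

import multiprocessing as mp

import plotly.io as pio

def _export(fig_json, path):
    # Runs in a child process so a hung Kaleido call can be killed from outside.
    fig = pio.from_json(fig_json)
    pio.write_image(fig, path)

def write_image_with_retries(fig, path, timeout=30, max_tries=5):
    fig_json = fig.to_json()
    for _ in range(max_tries):
        proc = mp.Process(target=_export, args=(fig_json, path))
        proc.start()
        proc.join(timeout)
        if proc.is_alive():
            # Kaleido is presumably stuck: kill the child and retry.
            proc.terminate()
            proc.join()
        elif proc.exitcode == 0:
            return
    raise RuntimeError(f"write_image did not finish within {max_tries} tries")

On platforms that spawn child processes (e.g. macOS/Windows), _export must live at module level and the caller should be guarded by if __name__ == "__main__".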

Now I want to try the solution described here; maybe it will work in a stable way: https://github.com/plotly/Kaleido/issues/110#issuecomment-2113369864

MaartenBW commented 4 months ago

@gygabyte017 Wow, thanks for your fast reply.