Closed: carterthayer closed this issue 3 years ago.
I also occasionally get this error when trying different memory settings.
[ERROR] ValueError: Failed to start Kaleido subprocess. Error stream:
[0224/183020.408878:WARNING:resource_bundle.cc(435)] locale_file_path.empty() for locale
prctl(PR_SET_NO_NEW_PRIVS) failed
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.420983:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 1 time(s)
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.530107:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 2 time(s)
[0224/183021.530652:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 3 time(s)
[0224/183021.531130:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 4 time(s)
[0224/183021.603838:ERROR:network_service_instance_impl.cc(262)] Network service crashed, restarting service.
[0224/183021.604498:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 5 time(s)
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.628785:ERROR:network_service_instance_impl.cc(262)] Network service crashed, restarting service.
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.644413:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 6 time(s)
[0224/183021.644502:FATAL:gpu_data_manager_impl_private.cc(439)] GPU process isn't usable. Goodbye.
#0 0x5651f7ce8f89 base::debug::CollectStackTrace()
#1 0x5651f7c578e3 base::debug::StackTrace::StackTrace()
#2 0x5651f7c68005 logging::LogMessage::~LogMessage()
#3 0x5651f697c437 content::(anonymous namespace)::IntentionallyCrashBrowserForUnusableGpuProcess()
#4 0x5651f697a3ce content::GpuDataManagerImplPrivate::FallBackToNextGpuMode()
#5 0x5651f69790ff content::GpuDataManagerImpl::FallBackToNextGpuMode()
#6 0x5651f69824a0 content::GpuProcessHost::RecordProcessCrash()
#7 0x5651f67d7603 content::BrowserChildProcessHostImpl::OnProcessLaunchFailed()
#8 0x5651f6832963 content::internal::ChildProcessLauncherHelper::PostLaunchOnClientThread()
#9 0x5651f6832ba5 base::internal::Invoker<>::RunOnce()
#10 0x5651f7c9802b base::TaskAnnotator::RunTask()
#11 0x5651f7ca8b3e base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
#12 0x5651f7ca88d0 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
#13 0x5651f7d03dc9 base::MessagePumpLibevent::Run()
#14 0x5651f7ca90c5 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
#15 0x5651f7c850ee base::RunLoop::Run()
#16 0x5651f67f18a4 content::BrowserProcessSubThread::IOThreadRun()
#17 0x5651f7cbe997 base::Thread::ThreadMain()
#18 0x5651f7cf88fe base::(anonymous namespace)::ThreadFunc()
#19 0x7f1326b2940b start_thread
#20 0x7f132572a09f __GI___clone
Task trace:
#0 0x5651f6832806 content::internal::ChildProcessLauncherHelper::PostLaunchOnLauncherThread()
#1 0x5651f683229c content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread()
#2 0x5651f6cddaac content::VizProcessTransportFactory::ConnectHostFrameSinkManager()
#3 0x5651f82c8b06 mojo::SimpleWatcher::Context::Notify()
#4 0x5651f6cddaac content::VizProcessTransportFactory::ConnectHostFrameSinkManager()
Task trace buffer limit hit, update PendingTask::kTaskBacktraceLength to increase.
Received signal 6
#0 0x5651f7ce8f89 base::debug::CollectStackTrace()
#1 0x5651f7c578e3 base::debug::StackTrace::StackTrace()
#2 0x5651f7ce8b25 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7f1326b337e0 (/usr/lib64/libpthread-2.26.so+0x117df)
#4 0x7f1325670c20 __GI_raise
#5 0x7f13256720c8 __GI_abort
#6 0x5651f7ce7a85 base::debug::BreakDebugger()
#7 0x5651f7c684a2 logging::LogMessage::~LogMessage()
#8 0x5651f697c437 content::(anonymous namespace)::IntentionallyCrashBrowserForUnusableGpuProcess()
#9 0x5651f697a3ce content::GpuDataManagerImplPrivate::FallBackToNextGpuMode()
#10 0x5651f69790ff content::GpuDataManagerImpl::FallBackToNextGpuMode()
#11 0x5651f69824a0 content::GpuProcessHost::RecordProcessCrash()
#12 0x5651f67d7603 content::BrowserChildProcessHostImpl::OnProcessLaunchFailed()
#13 0x5651f6832963 content::internal::ChildProcessLauncherHelper::PostLaunchOnClientThread()
#14 0x5651f6832ba5 base::internal::Invoker<>::RunOnce()
#15 0x5651f7c9802b base::TaskAnnotator::RunTask()
#16 0x5651f7ca8b3e base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
#17 0x5651f7ca88d0 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
#18 0x5651f7d03dc9 base::MessagePumpLibevent::Run()
#19 0x5651f7ca90c5 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
#20 0x5651f7c850ee base::RunLoop::Run()
#21 0x5651f67f18a4 content::BrowserProcessSubThread::IOThreadRun()
#22 0x5651f7cbe997 base::Thread::ThreadMain()
#23 0x5651f7cf88fe base::(anonymous namespace)::ThreadFunc()
#24 0x7f1326b2940b start_thread
#25 0x7f132572a09f __GI___clone
r8: 0000000000000000 r9: 00007f131ce5e3c0 r10: 0000000000000008 r11: 0000000000000246
r12: 00007f131ce5f688 r13: 00007f131ce5e660 r14: 00007f131ce5f690 r15: aaaaaaaaaaaaaaaa
di: 0000000000000002 si: 00007f131ce5e3c0 bp: 00007f131ce5e610 bx: 0000000000000006
dx: 0000000000000000 ax: 0000000000000000 cx: 00007f1325670c20 sp: 00007f131ce5e3c0
ip: 00007f1325670c20 efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.
Traceback (most recent call last):
  File "/var/task/my_app/mymodule.py", line 134, in send_report
    graph_img = fig.to_image(format="png")
  File "/var/lang/lib/python3.8/site-packages/plotly/basedatatypes.py", line 3743, in to_image
    return pio.to_image(self, *args, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 131, in to_image
    img_bytes = scope.transform(
  File "/var/lang/lib/python3.8/site-packages/kaleido/scopes/plotly.py", line 103, in transform
    response = self._perform_transform(
  File "/var/lang/lib/python3.8/site-packages/kaleido/scopes/base.py", line 280, in _perform_transform
    self._ensure_kaleido()
  File "/var/lang/lib/python3.8/site-packages/kaleido/scopes/base.py", line 188, in _ensure_kaleido
    raise ValueError(message)
Hi @carterthayer,
Huh, I don't have any idea off hand why Lambda would behave differently here. Are you calling Kaleido through plotly.py (fig.to_image or pio.to_image) or directly (using scope.transform)? And are you customizing any of the Chromium flags? By default we pass --disable-gpu, which should be using pure software rendering.
I'm just doing a regular fig.to_image(format="png") on Kaleido version 0.1.0. I am not customizing any of the Chromium flags. Do you have any you suggest that I might try?
I don't have any specific Chromium flag recommendations. The current set of flags was chosen to try to make Kaleido work by default inside Docker containers. One user saw an improvement using the --single-process flag (https://github.com/plotly/Kaleido/issues/45), but I think that was more of a memory issue.
Could you try the following from the failing configuration to see if any more logging info is available?
import plotly.io as pio
try:
fig.to_image()
except:
print(pio.kaleido.scope._std_error.getvalue().decode("utf8"))
[0225/004752.748704:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 5 time(s)
prctl(PR_SET_NO_NEW_PRIVS) failed
[0225/004752.802197:ERROR:network_service_instance_impl.cc(262)] Network service crashed, restarting service.
[0225/004752.802795:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 6 time(s)
prctl(PR_SET_NO_NEW_PRIVS) failed
Ok, it does look like this GPU crashing error is the root issue here.
We already set the --disable-gpu Chromium flag by default, so I'm not sure why this would be happening. My only idea at the moment would be to try adding all of the --disable-gpu-* flags (e.g. --disable-gpu-compositing, see https://peter.sh/experiments/chromium-command-line-switches/) to see if that makes any difference.
Here's an SO post related to using Chromium headless on Lambda: https://stackoverflow.com/questions/65429877/aws-lambda-container-running-selenium-with-headless-chrome-works-locally-but-not. The accepted solution includes using the --disable-gpu-sandbox and --single-process flags.
It worked! One of those flags did anyway.
I'll do some trial and error to figure out which one it was and update the issue for others who come along and find this.
Awesome! Yeah, we'd really appreciate it if you could narrow down which flag helped. If it doesn't look like it would cause any issues in other use cases, it would be great to add it to the default set.
It was --single-process that fixed it for me.
I added
import plotly.io as pio
pio.kaleido.scope.chromium_args += ("--single-process",)
Thanks for your help @jonmmease
Ok! Thanks for letting us know.
Looks like the need for --single-process on Lambda is a known situation:
As we discussed a bit in https://github.com/plotly/Kaleido/issues/45, the --single-process flag isn't recommended for use beyond debugging (https://www.chromium.org/developers/design-documents/process-models), so we shouldn't make it the overall default. Would you be willing to do a couple more experiments to see if either of the other process flags listed in https://www.chromium.org/developers/design-documents/process-models also fixes the issue? In particular --process-per-site and --process-per-tab? These aren't listed as unsafe, so they would be candidates for adding as defaults if they make a difference in this context.
Alternatively, I'd be open to checking environment variables to add --single-process specifically when running on Lambda (https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html) if we determine it's the only way to get things working.
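That environment-variable check could be sketched as follows. LAMBDA_RUNTIME_DIR is one of the reserved variables AWS sets inside Lambda runtimes (per the AWS docs linked above); the function names and the flag list shown here are illustrative, not Kaleido's actual implementation.

```python
import os

def running_on_aws_lambda():
    # AWS Lambda runtimes define LAMBDA_RUNTIME_DIR; local Docker runs do not.
    return "LAMBDA_RUNTIME_DIR" in os.environ

def default_chromium_args():
    # Illustrative subset of switches, not Kaleido's real default list.
    args = ("--disable-gpu", "--headless")
    if running_on_aws_lambda():
        # Work around the GPU-process crash observed on Lambda.
        args += ("--single-process",)
    return args
```

The detection is cheap and side-effect free, so it can run every time the scope is constructed rather than being cached at import time.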
Thanks!
Update: I tried --process-per-site and --process-per-tab on AWS Lambda, and both fail with the GPU crash error messages as described in this issue.
https://github.com/plotly/Kaleido/pull/76 adds the --single-process flag to the default set of Chromium flags when Kaleido detects that it is running on AWS Lambda (based on the presence of the LAMBDA_RUNTIME_DIR environment variable).
Automatic AWS Lambda detection released in 0.2.0.
Hey @jonmmease
I am facing the same issue, and using pio.kaleido.scope.chromium_args += ("--single-process",) seems to fix it, so I believe the Lambda detection isn't always working.
Can you look into it? Let me know which extra details you would like.
Using kaleido==0.2.1 and plotly==5.24.1.
When executing:
import io
import plotly.graph_objects as go
import plotly.io as pio
fig.write_image(buffer, format="png")
Errors received:
builtins.ValueError: Failed to start Kaleido subprocess. Error stream:
[1017/090341.380752:WARNING:resource_bundle.cc(431)] locale_file_path.empty() for locale
prctl(PR_SET_NO_NEW_PRIVS) failed
[1017/090341.967668:FATAL:zygote_communication_linux.cc(255)] Cannot communicate with zygote
Received signal 6
r8: 0000000000000000 r9: 00007ffceb148160 r10: 0000000000000008 r11: 0000000000000246
r12: 00007ffceb148c00 r13: 00007ffceb1494f0 r14: 00007ffceb149500 r15: 00007ffceb149508
di: 0000000000000002 si: 00007ffceb148160 bp: 00007ffceb1483b0 bx: 0000000000000006
dx: 0000000000000000 ax: 0000000000000000 cx: 00007ff5940a4ca0 sp: 00007ffceb148160
ip: 00007ff5940a4ca0 efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.
I am trying to generate a plotly chart and save it as a PNG in memory. I have my application packaged up and running in Docker locally. I am then using AWS Lambda to run this container, but fail when the save image command runs.
I get the error:
It seems weird to me that this would be working when I run it on Docker locally, but not on lambda. Do you have any ideas or guidance into what is happening with Kaleido so I can figure out why it is failing?