zalando / zalenium

A flexible and scalable container based Selenium Grid with video recording, live preview, basic auth & dashboard.
https://opensource.zalando.com/zalenium/

Chrome is crashing #977

Open micheletest opened 5 years ago

micheletest commented 5 years ago

🐛 Bug Report

selenium.common.exceptions.WebDriverException: Message: chrome not reachable

and

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally

For about a week now I've been having Chrome crashes in my Kubernetes setup. I have been unable to find the cause, and it hits different tests each time, so it's hard to reproduce. It is consistent, though: it happens every single time I run a build in CI.

For the chrome not reachable errors, the ChromeDriver log indicates that the X server went away. For the Chrome failed to start error, it seems to be this issue: https://github.com/zalando/zalenium/issues/861

I have tried --disable-dev-shm-usage to no effect. I've also tried --no-sandbox and --disable-setuid-sandbox, which didn't help. I tried building a custom elgalu/docker-selenium image to increase SHM_SIZE, but that didn't work because it needs to be privileged. The only thing that worked was setting my tests to headless. However, this means I get no videos, so I consider it a temporary solution.
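For anyone trying the same flags, here is a minimal sketch of how they travel to the grid. It only builds the W3C capabilities payload, using the real `goog:chromeOptions` key that ChromeDriver reads. The function name `chrome_capabilities` is made up for illustration. With the Python bindings, the same flags would normally be added through `ChromeOptions.add_argument`.

```python
import json


def chrome_capabilities(headless=False):
    """Build a capabilities dict carrying the Chrome flags discussed above.

    Hypothetical helper for illustration; it does not contact any grid.
    """
    args = [
        "--disable-dev-shm-usage",   # ask Chrome not to rely on /dev/shm
        "--no-sandbox",
        "--disable-setuid-sandbox",
    ]
    if headless:
        # Headless avoids the X display entirely, which also means no video.
        args.append("--headless")
    return {
        "browserName": "chrome",
        "goog:chromeOptions": {"args": args},
    }


if __name__ == "__main__":
    print(json.dumps(chrome_capabilities(headless=True), indent=2))
```

Printing the payload makes it easy to compare against what the Zalenium log reports for the requested session.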

All the research I've done seems to indicate this is related to /dev/shm. I see this is mounted automatically, but is it run in privileged mode? If not, can it be set to privileged for Kubernetes? Or is there a way to increase the SHM_SIZE on the Selenium nodes?
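One way to test the /dev/shm theory is to measure it from inside the browser container, since a container's /dev/shm is a separate mount from the host's (Docker's default is 64 MB unless overridden). The sketch below assumes Python is available in that container; `shm_report` is a hypothetical helper name, and `df -h /dev/shm` gives the same information from a shell.

```python
import shutil


def shm_report(path="/dev/shm"):
    """Return total/used/free space of the given mount in MiB."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return {
        "path": path,
        "total_mib": usage.total // mib,
        "used_mib": usage.used // mib,
        "free_mib": usage.free // mib,
    }


if __name__ == "__main__":
    # Run inside the browser pod, not on the host, to see what Chrome sees.
    print(shm_report())
```

If the total reported inside the pod is far below the host's figure, the container mount rather than the host is the limit. In Kubernetes, a larger /dev/shm is commonly provided by an emptyDir volume with medium: Memory mounted at that path; whether the automatic mount here already works that way is not confirmed in this thread.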

To Reproduce

I am using a Kubernetes setup with image: dosel/zalenium:3.141.59k and elgalu/selenium:3.14.0-p22. I am using these requests/limits:

            - name: ZALENIUM_EXTRA_JVM_PARAMS
              value: -d64 -Xms4G -Xmx8G
            - name: ZALENIUM_KUBERNETES_CPU_REQUEST
              value: 1000m
            - name: ZALENIUM_KUBERNETES_CPU_LIMIT
              value: 2000m
            - name: ZALENIUM_KUBERNETES_MEMORY_REQUEST
              value: 1Gi
            - name: ZALENIUM_KUBERNETES_MEMORY_LIMIT
              value: 2Gi

Expected behavior

Chrome doesn't crash.

Test script reproducing this issue (when applicable)

none yet

Environment

OS Image:                   Debian GNU/Linux 8 (jessie)
Operating System:           linux
Architecture:               amd64
Container Runtime Version:  docker://17.3.2
Kubelet Version:            v1.9.10
Kube-Proxy Version:         v1.9.10

micheletest commented 5 years ago

Also want to note that the host's /tmp averages about 16g and /dev/shm about 40g so there should be plenty of space on the host.

derom commented 5 years ago

Same here. We first noticed it on May 15.

diemol commented 5 years ago

@micheletest sorry for the late reply.

Would it be possible to see the logs from Zalenium and from the pod where the browser was running?

micheletest commented 5 years ago

Hi @diemol. See the attached gist. I've included all the pod logs that seemed relevant, plus an excerpt from the Zalenium log where it gets the session. I do have trouble correlating what's going on in the Zalenium log with what's happening in Kubernetes, so hopefully I grabbed the correct information. (If there is a common session ID or something, please let me know.) Also, I can grab more info if this isn't enough.

https://gist.github.com/micheletest/64d3665fd20f0d504e025d8f28630272

berezovskij commented 5 years ago

Is there any news? If the problem is not solved, I will have to give up on Zalenium!

diemol commented 5 years ago

@micheletest The key to the issue lies here: https://gist.github.com/micheletest/64d3665fd20f0d504e025d8f28630272#file-xmanager-stderr-log

Somehow elgalu/selenium cannot find a free display to start Xvfb. Could you please dig into the VM where this is running and see whether some displays are locked? This is always quite tricky, because I've seen some VMs where it happens often and others where it almost never happens. Most of the time, things work well with Ubuntu as the host OS, so perhaps try Ubuntu and let us know if it gets better.
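As a rough aid for that check: X servers, Xvfb included, normally mark a display as taken with a lock file named /tmp/.X<N>-lock. The sketch below lists those lock files and reports the first display number without one. The helper names and the 0 to 99 search range are assumptions for illustration, not something taken from elgalu/selenium. It should be run wherever Xvfb is actually started.

```python
import os
import re

# X servers conventionally create /tmp/.X<N>-lock for display :N.
LOCK_RE = re.compile(r"^\.X(\d+)-lock$")


def locked_displays(tmp_dir="/tmp"):
    """Return the sorted display numbers that have a lock file in tmp_dir."""
    displays = []
    for name in os.listdir(tmp_dir):
        match = LOCK_RE.match(name)
        if match:
            displays.append(int(match.group(1)))
    return sorted(displays)


def first_free_display(tmp_dir="/tmp", start=0, limit=100):
    """Return the lowest display number in [start, limit) with no lock file."""
    taken = set(locked_displays(tmp_dir))
    for number in range(start, limit):
        if number not in taken:
            return number
    return None  # every display in the range appears locked


if __name__ == "__main__":
    print("locked displays:", locked_displays())
    print("first free display:", first_free_display())
```

A lock file that outlives its X server process (for example after a crash) keeps the display looking busy, so a long list of locks with no matching Xvfb processes would be worth reporting back.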

@berezovskij it would be sad if you needed to give up on Zalenium. I invite you to take a different perspective and help us find a way to fix the issue instead of pointing out that it has not been solved. Like many other open source projects, we have limited time and resources to tackle existing issues, and we rely a lot on the community to help us get Zalenium to a better state.

micheletest commented 5 years ago

@diemol this is still happening to me. I'm still using zalenium headless, which mostly works, but I'd love to turn on video again.

Can you explain how elgalu/selenium interacts with the host X server? That is, does this Docker container connect to the host's X server? In our case, we run this Docker image on a Kubernetes cluster with Debian nodes on AWS EC2.

diemol commented 4 years ago

@elgalu can you please help with @micheletest's question?

elgalu commented 4 years ago

Hi, no, it doesn't interact with the host X server; we use xvfb-run.

elgalu commented 4 years ago

All, please consider sponsoring; we do this in our free time ;)

luisxiaomai commented 4 years ago

We hit the same page crash issue. How can we increase the shm_size of the Selenium container?

diemol commented 4 years ago

If some versions of Chrome work and others don't, it all points to a Chrome/ChromeDriver issue, right?