Open flavialetgo opened 5 years ago
@flavialetgo the only relevant change has been the release of Chrome 75; nothing else that I am aware of.
Please let us know when you find a clear way to reproduce this, so we can have a look. Thanks.
I've been experiencing the same issue lately.
For me, it seems that after a certain number of tests have run, the hub becomes unstable. Either node containers shut down in the middle of a test (no sleeps or extended waits used), in which case I get the connection refused error, or no more nodes are started. If I go to the grid console or the admin live preview dashboard, I can see that the hub is up, but no nodes are connected and none are created for new test requests.
This might be related, but I haven't figured out yet what causes the instability in the first place. Usually restarting Zalenium does the trick for a while, until it gets unstable again.
I think this is the same issue as https://github.com/zalando/zalenium/issues/560. I'm not sure whether it has been resolved on the Selenium side or at a deeper level.
I don't have a clear way to reproduce this yet. Under the same environment conditions the issue is not always reproducible, but there's a high chance of occurrence. I'll keep you posted.
@diemol, the following could be a way to reproduce:
1- Have a set of tests/feature files to be executed in parallel, for example, 10 tests/feature files.
2- Make some of them time out. In my case (webdriverIO + cucumber), the step definition timeout is set to X milliseconds. Force the test to time out at some point.
3- Execute the tests in parallel, one session per node. Have fewer nodes than tests so that some tests are queued. For example, if you have 10 tests, configure Zalenium to allow up to 5 nodes.
Please take into account that it might be necessary to repeat the run a few times until the java.net.ConnectException: Connection refused is seen.
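Step 3 above can be sketched as a start command that caps the number of nodes. This is only a sketch, not a verified reproduction: it reuses the `docker run` invocation posted elsewhere in this thread and assumes that `--maxDockerSeleniumContainers` is the Zalenium option limiting how many node containers run at the same time. The `zalenium_cmd` helper, the `MAX_NODES` variable, and the `--desiredContainers 2` value are illustrative choices. The helper only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: print a Zalenium start command that allows at most
# $1 node containers, so that additional test sessions have to queue.
# Assumption: --maxDockerSeleniumContainers caps concurrent node containers.
zalenium_cmd() {
  max_nodes="$1"
  # Backslash-newline inside double quotes joins the lines into one command.
  printf '%s\n' "docker run --rm -ti --name zalenium -p 4444:4444 \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp/videos:/home/seluser/videos \
--privileged dosel/zalenium start \
--desiredContainers 2 \
--maxDockerSeleniumContainers ${max_nodes}"
}

# 10 parallel tests against at most 5 nodes, as in the steps above.
MAX_NODES=5
zalenium_cmd "$MAX_NODES"
```

With the hub started this way and 10 tests launched in parallel, roughly half of the sessions should wait in the queue, which is the condition described in the steps.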
I had a CI job with ~60 tests, and that error happened once in a while. When I split that job into 6 jobs with 4-20 tests each, it happens in 3-4 of the 6, even in a job that has only 4 tests.
Hi @diemol, I am also facing this issue and am able to replicate it quite frequently. Steps to reproduce:
1) Have a test suite with multiple test methods spread across several classes.
2) Use multiple TestNG threads as well as multiple Zalenium nodes, say 5 TestNG threads and 5 Zalenium nodes.
3) Watch the live console: after the first test case executes, the session is killed, the node is killed as well, and a new session gets created.
So when TestNG then tries to execute further steps, it cannot find the original session and throws the exception.
Command to start the Zalenium hub:
docker run --rm -ti --name zalenium --hostname zalenium_hub -p 4444:4444 -e zalenium_no_proxy="localhost,127.0.0.1,172.17.0.*" -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/videos:/home/seluser/videos --privileged dosel/zalenium start --maxTestSessions 5 --desiredContainers 5
Hi @kumar1210, I have the same problem. Were you able to solve it?
I was not able to solve it, but I found a workaround. I'm not sure of the reasons yet. I am using multiple machines to run the grid, so before starting the hub machine, I start the nodes on the other machines and only then start the hub. When the hub starts, it registers all the nodes subscribed to it. Zalenium routes the test cases to the nodes in the same order in which the nodes were registered, so all my test cases are executed on those remote nodes and I am not seeing the exception anymore.
I'm not sure, but I am guessing it works because of the different way the nodes are started.
To start the hub:
docker run --rm -ti --name zalenium --hostname zalenium_hub -p 4444:4444 -e zalenium_no_proxy="localhost,127.0.0.1,172.17.0.*" -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/videos:/home/seluser/videos --privileged dosel/zalenium start --desiredContainers 5
To start the node on the other machine:
docker run -d --name hostname_node_0 -h hostname_node_0 -p 5550:5555 -e HUB_HOST=
@flavialetgo I am facing the same issue too, in the latest version as well as in version 3.14.0g. Do you know of any workaround for the k8s deployment?
🐛 Bug Report
I have a suite of web tests implemented with WebdriverIO 4, which I run on a Zalenium image. I am executing 14 tests in parallel in Chrome and I randomly get java.net.ConnectException: Connection refused. The same suite had been running for a while with no issues up to the end of May 2019. The hub logs the following exception:
Relevant Zalenium logs while experiencing the Connection refused (notice node xx.xx.xx.166 behavior):
I'm aware of an issue reported in the past and its fix (https://github.com/zalando/zalenium/issues/970). Is it possible that I am facing that one again?
To Reproduce
Kick off tests in parallel so that they are executed on different nodes. Unfortunately, I have not yet been able to pin down a specific scenario in which the error occurs, so I cannot give more details.
Steps to reproduce the behavior (including the docker command/docker-compose/Kubernetes manifests to start Zalenium):
Expected behavior
Tests should run with no errors
Actual behavior
Random java.net.ConnectException: Connection refused while running tests in parallel.
Environment
ubuntu:xenial-20181113
Zalenium Image Version(s): 3.141.59r
Hub config values: