xilution / xilution-selenium-grid

A Selenium Grid that Runs in AWS ECS Fargate.
https://www.xilution.com

Hub can't reach nodes #7

Closed theredfish closed 6 years ago

theredfish commented 6 years ago

Hi,

It seems the hub cannot reach the nodes. The issue may be related to port mapping and host information. A similar project, built on ECS with EC2 instances, uses a port-mapping script written in Python (with boto3, the AWS SDK).

When nodes connect to the hub, they report their location through remoteHost. We can use this setting to tell the hub where each node is. By default it is set to localhost.

Are you experiencing the same issue?

I'm trying to find a way to get the ephemeral port, but the ECS introspection doesn't seem to provide that information:

Containers: [{
         ....

          NetworkBindings: [],
          NetworkInterfaces: [{
              AttachmentId: "ba8fa862-3447-4689-a8b5-665c00b1176c",
              PrivateIpv4Address: "172.31.60.17"
            }],
          ...

        }]
tbrunia commented 6 years ago

Hey @theredfish, thanks for giving the repo a try. I'm not experiencing the same issue. I provision the stack on my AWS account when I need it and delete it when I'm done. It works perfectly fine for me. Did you modify the CloudFormation template or some other aspect of the repo?

If you see an opportunity to improve the project, PRs are welcome. Thanks!

theredfish commented 6 years ago

Hello, thank you for your answer. Yes, I removed the RecordSet part, but that's all. I'm experiencing the same issue with a personal stack.

You will find a screenshot of the issue attached. Furthermore, only one node seems to be connected. Some interesting hub logs:

13:02:58.415 INFO - Marking the node http://169.254.172.42:5555 as down: cannot reach the node for 2 tries
13:02:48.335 INFO - Registered a node http://169.254.172.42:5555
13:02:47.733 INFO - Unregistering the node http://169.254.172.42:5555 because it's been down for 60016 milliseconds
:02:47.733 WARN - Cleaning up stale test sessions on the unregistered node http://169.254.172.42:5555
[...]

As you can see, the hub can't reach the nodes, so they get unregistered. The Fargate platform version is 1.1.0. The nodes share the same port in their configuration, and also the same ID (id: http://169.254.172.42:5555). I think that's why only one node is connected: their IDs aren't unique.

@tbrunia Could you run the following CLI command and send me the value of networkBindings? Mine is empty; I suspect a bug at this point.

aws ecs describe-tasks --cluster selenium-grid --tasks $TASK_ID
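For anyone comparing results, here is a minimal Python sketch that pulls the relevant fields out of the JSON this command returns. It assumes the response was saved with `--output json` and uses the key names from the AWS CLI response (`tasks`, `containers`, `networkBindings`, `networkInterfaces`, `privateIpv4Address`). The function name and the container name `selenium-node` are made up for illustration; the IP and attachment ID are the ones from the introspection output earlier in this thread.

```python
from typing import Any

# Hypothetical sample shaped like `aws ecs describe-tasks ... --output json`.
# The container name is an assumption; the IP and attachment ID come from the
# introspection output quoted earlier in the thread.
SAMPLE_RESPONSE = {
    "tasks": [
        {
            "containers": [
                {
                    "name": "selenium-node",
                    "networkBindings": [],
                    "networkInterfaces": [
                        {
                            "attachmentId": "ba8fa862-3447-4689-a8b5-665c00b1176c",
                            "privateIpv4Address": "172.31.60.17",
                        }
                    ],
                }
            ]
        }
    ]
}


def summarize_task_networking(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per container with its port bindings and private IPs."""
    rows = []
    for task in response.get("tasks", []):
        for container in task.get("containers", []):
            rows.append(
                {
                    "container": container.get("name"),
                    "bindings": container.get("networkBindings", []),
                    "private_ips": [
                        eni["privateIpv4Address"]
                        for eni in container.get("networkInterfaces", [])
                        if "privateIpv4Address" in eni
                    ],
                }
            )
    return rows
```

To run it on real output, load the saved file with `json.load` and pass the result to `summarize_task_networking`. An empty `bindings` list next to a populated `private_ips` list is the situation described above.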

[Screenshot: connection_refused]

alexdruck commented 6 years ago

@theredfish I'm getting the same issue. Like you I only erased the RecordSet. Did you ever solve this? I'm wondering if this will work with just one load balancer...

I'm going to try and port some of https://github.com/RetailMeNotSandbox/ecs-selenium/blob/master/cloudformation/ecs-selenium.cfn.yml over and see if I can get it to work.

theredfish commented 6 years ago

Hey @alexdruck I didn't solve this. I think we can't get port mapping with Fargate, due to the awsvpc network mode. The describe-tasks command doesn't return any information about bindings.

In the end I went with ECS on EC2, which is more flexible and less expensive. Good luck, and let me know if you find a suitable solution :)

alexdruck commented 6 years ago

@theredfish I think I found a solution. We don't really need the port mapping; we just need the IP of each task, since there is no dynamic port mapping going on here like there is in normal ECS (at least none that we can see).

To get every node connected to the hub and actually reachable from it, I told the hub which IP each task/container was at. I got the general idea from this thread: https://groups.google.com/forum/#!topic/selenium-users/PYX1fYqNGqc

[Screenshots: capture1, capture]

You need to create your own Dockerfile that builds FROM whatever version of the Selenium node image you want (I'm using selenium/node-chrome:3.12.0-americium). In it, install curl and jq with apt-get (or use whatever other way you prefer to parse JSON).

Once those two tools are installed, COPY a script like this one into the image:

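#!/bin/sh

# Ask the ECS task metadata endpoint (v2) for this task's private IPv4 address.
# Note: .Containers[1] assumes the node container is the second entry in the response.
PRIVATE=$(curl -s http://169.254.170.2/v2/metadata | jq -r '.Containers[1].Networks[0].IPv4Addresses[0]')
# Tell the hub where to reach this node; port 5555 is hardcoded.
export REMOTE_HOST="http://$PRIVATE:5555"

# Hand over to the image's regular entry point.
/opt/bin/entry_point.sh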

This does hardcode port 5555, but I'm only going to run the nodes on that port, so I didn't think it was a big deal. The curl call goes to this endpoint: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint.html#task-metadata-response The REMOTE_HOST env var is a built-in Selenium Grid setting that lets you tell the hub where to find the node. Then just run the script as the CMD at the end of the Dockerfile. Upload the image to an ECR repo, reference it in your CloudFormation TaskDefinition, and you'll be good to go.
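The script above picks the container by position (`.Containers[1]`). As a hedged alternative, the sketch below does the same lookup in Python but selects the container by name, so it does not depend on ordering. The field names (`Containers`, `Name`, `Networks`, `IPv4Addresses`) follow the v2 task metadata response. The function names, the container name `selenium-node`, and the sample document are assumptions made for illustration, not something taken from this repo.

```python
import json
import urllib.request
from typing import Any

# Endpoint used by the shell script above.
METADATA_URL = "http://169.254.170.2/v2/metadata"

# Hypothetical, trimmed-down sample of a v2 metadata document.
SAMPLE_METADATA = {
    "Containers": [
        {
            "Name": "~internal~ecs~pause",
            "Networks": [{"NetworkMode": "awsvpc", "IPv4Addresses": ["172.31.60.17"]}],
        },
        {
            "Name": "selenium-node",  # assumed container name
            "Networks": [{"NetworkMode": "awsvpc", "IPv4Addresses": ["172.31.60.17"]}],
        },
    ]
}


def fetch_task_metadata(url: str = METADATA_URL, timeout: float = 2.0) -> dict[str, Any]:
    """Fetch the task metadata document; only works from inside an ECS task."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def remote_host(metadata: dict[str, Any], container_name: str, port: int = 5555) -> str:
    """Build the REMOTE_HOST value from the named container's first IPv4 address."""
    for container in metadata.get("Containers", []):
        if container.get("Name") != container_name:
            continue
        for network in container.get("Networks", []):
            addresses = network.get("IPv4Addresses", [])
            if addresses:
                return f"http://{addresses[0]}:{port}"
    raise LookupError(f"no IPv4 address found for container {container_name!r}")
```

Inside a task you would call `remote_host(fetch_task_metadata(), "selenium-node")` and export the result as REMOTE_HOST before starting the entry point. In awsvpc mode all containers of a task share the task's network interface, which is presumably why indexing by position also works in the script above.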

theredfish commented 6 years ago

Hi @alexdruck, good job 👍 And yes, you can set the same port on every node, since each one is unique by its IPv4 address. Nodes will have a unique ID based on UNIQUE_IP:SAME_EXPOSED_PORT.

You got it ;) So I guess I can close this issue now. Thanks!

julgon commented 6 years ago

Hi guys, here is a simpler solution for those who don't want to build a new Docker image and would rather use the base one.

----- FOR SELENIUM NODES ---------

You can just add this configuration into the ECS task definition's environment section:

EntryPoint: sh,c

Command: export REMOTE_HOST=http://`ip addr show eth0 | grep -oP "inet \K\S[0-9.]+"`:$NODE_PORT;printenv | grep REMOTE ; /opt/bin/entry_point.sh;

----- FOR SELENIUM HUB ---------

Command: export GRID_HUB_HOST=`networkctl status eth0 | grep -oP " Address: \K\S[0-9.]+"`; /opt/bin/entry_point.sh;
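To make the `grep -oP "inet \K..."` part of the node command easier to follow, here is a rough Python equivalent run against sample text. Python's `re` module has no `\K`, so a capture group stands in for it. The sample `ip addr show eth0` output and the function names are hypothetical, and the port is passed in rather than read from `$NODE_PORT`.

```python
import re
from typing import Optional

# Matches "inet <dotted quad>"; "inet6" lines do not match because of the space.
INET_RE = re.compile(r"inet (\d{1,3}(?:\.\d{1,3}){3})")

# Hypothetical output of `ip addr show eth0`, for illustration only.
SAMPLE_IP_ADDR = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.31.60.17/20 brd 172.31.63.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:2/64 scope link
       valid_lft forever preferred_lft forever
"""


def first_ipv4(ip_addr_output: str) -> Optional[str]:
    """Return the first IPv4 address in the text, without the /prefix length."""
    match = INET_RE.search(ip_addr_output)
    return match.group(1) if match else None


def node_remote_host(ip_addr_output: str, port: int) -> str:
    """Build the value the node command exports as REMOTE_HOST."""
    address = first_ipv4(ip_addr_output)
    if address is None:
        raise ValueError("no IPv4 address found in the given output")
    return f"http://{address}:{port}"
```

The point is only that the command extracts the interface's IPv4 address and glues it to the node port; the shell one-liner remains the thing you would actually put in the task definition.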

theredfish commented 6 years ago

@julgon thank you for the tip, very useful! May I suggest opening a PR to update the documentation / README? (cc @tbrunia )

gdegani commented 5 years ago

@julgon I'm having the same problem. Could you please share the modified template.yml file? I tried to use your solution but when I add the environment variables I receive the following error: Template error: IAM role selenium-grid-ECSTaskExecutionRole-1BMNXNVEIIHQY doesn't exist

Xin00163 commented 5 years ago

Has anyone tried adding the command provided by @julgon in Terraform? It seems Terraform doesn't like the string interpolation.

jdorn commented 5 years ago

A slight modification is required to get @julgon's solution working on Fargate. The original answer had a couple of typos and didn't install the ip command before using it.

In your ECS task definition, the Chrome node container should have the following settings:

Standard

Image: selenium/node-chrome:latest

Port Mappings: 5555 tcp

Environment

Entry Point: sh,-c

Command: sudo apt-get update -qqy;sudo apt-get -qqy --no-install-recommends install iproute2;export REMOTE_HOST=http://`ip addr show eth0 | grep -oP \"inet \\K\\S[0-9.]+\"`:5555;printenv | grep REMOTE; /opt/bin/entry_point.sh
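The backslashes in that command (`\"inet \\K\\S[0-9.]+\"`) look like JSON escaping of the plain shell command; that reading is an assumption on my part. Under that assumption, one way to avoid escaping by hand is to write the command once in plain form and let a JSON serializer produce the escaped version. The sketch below uses the ECS container definition keys `image`, `portMappings`, `entryPoint` and `command`; the container name and function names are made up for illustration.

```python
import json
from typing import Any

# The shell command from the comment above, written without any JSON escaping.
NODE_COMMAND = (
    "sudo apt-get update -qqy;"
    "sudo apt-get -qqy --no-install-recommends install iproute2;"
    r'export REMOTE_HOST=http://`ip addr show eth0 | grep -oP "inet \K\S[0-9.]+"`:5555;'
    "printenv | grep REMOTE; /opt/bin/entry_point.sh"
)


def chrome_node_container(name: str = "selenium-node-chrome") -> dict[str, Any]:
    """Container definition for the Chrome node; the name is hypothetical."""
    return {
        "name": name,
        "image": "selenium/node-chrome:latest",
        "portMappings": [{"containerPort": 5555, "protocol": "tcp"}],
        "entryPoint": ["sh", "-c"],
        "command": [NODE_COMMAND],
    }


def render(definition: dict[str, Any]) -> str:
    """Serialize the definition; json.dumps adds the backslash escapes."""
    return json.dumps(definition, indent=2)
```

Calling `print(render(chrome_node_container()))` gives a fragment that can be pasted into the `containerDefinitions` list of a task definition JSON file, with the quotes and `\K` escaped for you.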