Scaling hubs behind Gridrouter

dhorbach commented 8 years ago

Hi

When scaling Selenium infrastructure in AWS, we base the decision on the number of new sessions queued on the Hub. This way we can have 1 static Hub and 0-X dynamic Node servers. With Gridrouter, the selection logic returns an error when no hubs are available. We are planning 1 server to host Gridrouter and 0-X dynamic servers hosting Hub+Nodes.

Is there any potential workaround for this?

levsha932 commented 8 years ago

Need more details here

Do you want to replace the hub with Gridrouter, or put Gridrouter above the hubs? What problem do you want to solve with Gridrouter?
What kind of decision do you make based on the number of new sessions now? Do you add more nodes when the hub is almost full?

My guess is that you want to add the next hub dynamically. My initial idea is to put all your potential hubs in the quota, but start them only when required. When Gridrouter tries to talk to a hub that is not started, it gets a network error and immediately tries another hub, so there is no delay, only an error message in the log.

dhorbach commented 8 years ago

We are planning to put Gridrouter above the hubs. Our current setup is 1 Hub server and more than 20 Node servers (each Node server runs approximately 12 Selenium nodes with 1 browser each). Thus the number of Selenium nodes in the cluster can reach 300. With this setup the Selenium hub sometimes stops processing requests (I guess this is a known issue and one of the reasons why Gridrouter was developed). Instead, we want 1 Gridrouter and 0-20 Hub+Node instances, which should improve stability (each Hub will have 12-20 Nodes, as you recommend).
To save costs, we don't run instances with Selenium nodes when there is no test load. Only 1 Hub server is running. When new tests are submitted to the Hub, they wait on the Hub until nodes become available. Our decision logic calculates the number of instances to start based on the number of new sessions (which can be retrieved via the hub API). Let's say 10 tests - start 1 server, 20 tests - start 2 servers, etc. Without any session load, no additional servers are started (the only drawback is the 2-3 minute wait for the first server to appear after scaling).
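The "10 tests - 1 server, 20 tests - 2 servers" rule above can be sketched roughly as follows. This is only an illustration: the function name, the rounding up of partial batches, and the cap of 20 servers (taken from the planned 0-20 Hub+Node instances) are assumptions, not the actual decision logic.

```python
import math

# Assumed from the example above: one server per 10 queued tests.
SESSIONS_PER_SERVER = 10
# Assumed cap, taken from the planned 0-20 Hub+Node instances.
MAX_SERVERS = 20


def servers_to_start(queued_sessions: int, running_servers: int = 0) -> int:
    """Return how many additional servers to start for the queued sessions.

    Partial batches are rounded up (11 tests -> 2 servers), and the result
    never pushes the fleet past MAX_SERVERS.
    """
    if queued_sessions <= 0:
        return 0
    wanted = math.ceil(queued_sessions / SESSIONS_PER_SERVER)
    headroom = max(0, MAX_SERVERS - running_servers)
    return min(wanted, headroom)
```

With no queued sessions the function returns 0, which matches the "no load, no extra servers" behavior described above.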

We can work around quotas in AWS by using a pool of DNS records or ENIs - this is not a problem. The issue is that test sessions don't wait inside Gridrouter for a new Hub to appear; they return immediately with an error (no hub available). In this case Gridrouter behaves differently from the Hub.

The only solution I see is to implement some sort of queue inside Gridrouter, similar to the Hub's. The logic: iterate over the hubs and, if none is available, wait for some interval before the next try.
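A minimal sketch of that "iterate, then wait and retry" logic, written in Python only for illustration (it is not Gridrouter code). The names `wait_for_hub` and `try_hub`, the 5 second interval and the 180 second limit are all assumptions; `try_hub` stands for whatever call attempts to open a session on one hub and returns `None` on failure.

```python
import time
from typing import Callable, Optional, Sequence


def wait_for_hub(
    hubs: Sequence[str],
    try_hub: Callable[[str], Optional[str]],
    retry_interval: float = 5.0,   # assumed pause between rounds
    max_wait: float = 180.0,       # assumed limit, about the 2-3 min startup
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Try every hub in order; if none accepts, wait and start over.

    Returns the first successful result from try_hub, or None once
    another wait would exceed max_wait.
    """
    deadline = clock() + max_wait
    while True:
        for hub in hubs:
            session = try_hub(hub)
            if session is not None:
                return session
        # No hub was available in this round: give up or wait.
        if clock() + retry_interval > deadline:
            return None
        sleep(retry_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.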

API similar to Hub:

 GET /grid/api/hub  {"configuration":["newSessionRequestCount","slotCounts"]}
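For reference, the request above could be issued and read roughly like this. The sketch assumes the hub answers with a JSON object containing `newSessionRequestCount` and a `slotCounts` object with `free` and `total` fields; that response shape, the helper names and the example host are assumptions, so check them against your hub version.

```python
import json
import urllib.request


def build_hub_status_request(hub_url: str) -> urllib.request.Request:
    """Build the GET /grid/api/hub request with the JSON body shown above."""
    body = json.dumps(
        {"configuration": ["newSessionRequestCount", "slotCounts"]}
    ).encode("utf-8")
    return urllib.request.Request(
        hub_url.rstrip("/") + "/grid/api/hub",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def parse_hub_status(payload: str) -> dict:
    """Extract queue length and slot counts (assumed response shape)."""
    data = json.loads(payload)
    slots = data.get("slotCounts") or {}
    return {
        "queued": int(data.get("newSessionRequestCount", 0)),
        "free": int(slots.get("free", 0)),
        "total": int(slots.get("total", 0)),
    }


def fetch_hub_status(hub_url: str, timeout: float = 5.0) -> dict:
    """Query a running hub, e.g. fetch_hub_status("http://hub.example:4444")."""
    request = build_hub_status_request(hub_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_hub_status(response.read().decode("utf-8"))
```

The `queued` value is the number that would feed a scaling decision; an equivalent endpoint on Gridrouter would let the same logic work one level up.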

levsha932 commented 8 years ago

I see. You are right about the reason we made Gridrouter, and I understand your case now.

Gridrouter doesn't have a queue, but we plan to add one in the future. Specifically, we plan to add a limit on concurrent sessions per quota and put all other requests in a queue. A solution for your case might then be the ability to control that limit dynamically.

For now I see two possibilities:

1. Add retries on the test side.
2. Increase the timeout on the hub side, so that some requests wait and then get a session on that hub, while others wait long enough for another hub to start.

You will probably have to implement both, to cover the case where the hub that is actually available is the last one reached.
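A test-side retry could look roughly like the sketch below. It is an illustration under assumptions: `with_retries` and `open_session` are hypothetical names, `open_session` stands for whatever call in your tests opens a remote WebDriver session through Gridrouter, and 7 attempts with a 30 second pause are chosen only to cover roughly the 2-3 minute instance startup mentioned earlier.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    open_session: Callable[[], T],
    attempts: int = 7,      # assumed: 6 pauses of 30 s, about 3 minutes
    delay: float = 30.0,    # assumed pause between attempts, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call open_session until it succeeds or the attempts run out.

    Any exception counts as "no hub available yet"; the last one is
    re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return open_session()
        except Exception as err:  # broad on purpose for this sketch
            last_error = err
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

In a real suite it would be better to catch only the error raised when no hub is available, so that genuine test failures are not retried.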

seleniumkit / gridrouter

Scaling hubs behind Gridrouter #32