It would be awesome if SLK could programmatically spin up a container for both the selenium server and the scraper job. References: the AWS SDK for JavaScript on GitHub, and the AWS CDK for EC2 npm page. Spec for each container:
One thing we want to ask: do a VPC and subnet cost anything? If not, we can use tf to create them beforehand; otherwise, we may want to include them in the dynamic resource-creation logic in SLK.
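A minimal sketch of what the programmatic spin-up could look like, assuming the AWS SDK for JavaScript v3 (`@aws-sdk/client-ec2`); the AMI, instance type, and subnet ID below are placeholders, not anything from this project:

```typescript
// Hypothetical sketch: SLK launching an EC2 instance for the selenium server.
// All identifiers (AMI, subnet, instance type) are placeholders.
import { EC2Client, RunInstancesCommand } from '@aws-sdk/client-ec2';

const ec2 = new EC2Client({ region: 'us-west-2' });

async function launchSeleniumNode() {
  const result = await ec2.send(new RunInstancesCommand({
    ImageId: 'ami-xxxxxxxx',        // placeholder: a Docker-capable AMI
    InstanceType: 't3.medium',      // roughly 2 vCPU / 4G RAM, per the selenium spec above
    MinCount: 1,
    MaxCount: 1,
    SubnetId: 'subnet-xxxxxxxx',    // placeholder: pre-created via tf if subnets are free
    TagSpecifications: [{
      ResourceType: 'instance',
      Tags: [{ Key: 'purpose', Value: 'selenium-server' }],
    }],
  }));
  return result.Instances?.[0]?.InstanceId;
}
```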
Looks like the elastic approach is probably the most cost-efficient one. The idea is basically:
Looks like via the nodejs DO client, the node pool and its nodes are quite troubling. The node inside gets stuck at "provisioning". Why is that?
Another method is to use tf to create a separate node pool with a node count of 0. SLK will then just update "count" when it needs to scale up, and then poll to check node readiness.
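For the readiness polling, a rough sketch, assuming the pre-1.0 `@kubernetes/client-node` API (responses wrapped in `body`); the interval and timeout are arbitrary:

```typescript
// Rough sketch: after bumping the node pool count, poll until the new node reports Ready.
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const coreApi = kc.makeApiClient(k8s.CoreV1Api);

async function waitForReadyNodes(expectedCount: number, timeoutMs = 10 * 60 * 1000) {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const res = await coreApi.listNode();
    const readyNodes = res.body.items.filter(node =>
      (node.status?.conditions ?? []).some(c => c.type === 'Ready' && c.status === 'True'),
    );
    if (readyNodes.length >= expectedCount) return readyNodes;
    await new Promise(resolve => setTimeout(resolve, 15_000)); // poll every 15 seconds
  }
  throw new Error(`expected ${expectedCount} Ready nodes, timed out after ${timeoutMs}ms`);
}
```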
There is a `default` service account in place, so I guess this is not needed, at least in this case. We use `nodeSelector` when provisioning the K8s job.

Configuration: 2 vCPU, 4G RAM on selenium; SLK & the rest of the k8s infra on 1 vCPU, 2G RAM. Running 1 concurrent scraper job:
Scraper job CPU is negligible.
The scraper job consumes around 177 MB. SLK uses 29 MB, but we're not really testing SLK here because we're using the local dev SLK.
Only running selenium: 1 scraper uses around 0.8 CPU.
1 scraper uses up to 900MB.
Services on the primary node like grafana are getting a bit slow to respond, so perhaps we should create scraper jobs on the worker node as well.
Several things we want to try:
All k8s scraper jobs & selenium running on the worker node.
SLK: initial: 60MB. 4 sandbox processes: 175MB (+115MB, ~29MB/process). 10 sandbox processes: spike 740MB (+680MB, 68MB/process); steady 600MB (+540MB, 54MB/process).
Node memory usage: 2.1G/4G, around 50%. Estimated remaining capacity: safely +1G workload == at least 10 more sandboxes == 20 sandboxes total.
Worker node: Selenium, 4 sessions: 1G-3G (250-750MB/session), average 2.3G (575MB/session). Java scraper container, 4 jobs: 180-200MB per job, 720MB-800MB total. Actual (incl. overlapping time): 8 k8s jobs concurrently, 1.7G total.
Node memory usage: spike 5G/8G, steady 4.5G/8G. Estimated remaining capacity: safely +2G workload == 2-3 more k8s jobs == 6-7 k8s jobs total.
[x] Using the dedicated node pools, let's see if we can push more concurrency out of them. If not, we may want to fall back to standard shared droplets.
[ ] We decided to move on until new info surfaces. More issues emerge: the Travis job first cannot locate the review panel, then the redis connection becomes unstable but manages to reconnect. Then things seem to stop there, with no new progress reported. SLK then waits until the 10-minute timeout, cleans up, and the Travis job gets canceled. This happens again and again and is reproduced quite reliably. It only appears in Travis jobs. The lack of log levels means the available info is limited, too.
The failure seems to happen in the `locate()` block, not the `parse()` block, so there isn't much progress publishing there. -> We're trying a few things like adding more chrome options and doing additional redis progress publishing, but no guarantees. Let's do the benchmark again (4 k8s / 4 travis) and see the result. It definitely has something to do with concurrency, but we don't know why higher concurrency causes the selenium timeout and redis reconnection.
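For the "more chrome options" idea, a hedged sketch of the kind of flags meant here, written against the `selenium-webdriver` npm package; the actual scraper is a Java container, so this is only illustrative of the flags, not the real code, and the hub address is a placeholder:

```typescript
// Hypothetical sketch: extra Chrome flags that often help containerized Selenium sessions.
import { Builder, WebDriver } from 'selenium-webdriver';
import * as chrome from 'selenium-webdriver/chrome';

async function createDriver(): Promise<WebDriver> {
  const options = new chrome.Options().addArguments(
    '--no-sandbox',              // commonly required inside containers
    '--disable-dev-shm-usage',   // avoid /dev/shm exhaustion under higher concurrency
    '--disable-gpu',
    '--window-size=1920,1080',
  );

  return new Builder()
    .usingServer('http://selenium-hub:4444/wd/hub')  // placeholder hub address
    .forBrowser('chrome')
    .setChromeOptions(options)
    .build();
}
```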
There is also an issue with the `port` on the liveness / readiness probe. You may switch to the GoDaddy K8s client, dig into the issue https://github.com/kubernetes-client/javascript/issues/444, or just disable the probes. While there's a lot of room for improvement, we could set a final milestone here just to achieve two things:
Steps
[x] Test Hub-Node paradigm in production, also for benchmarking workload
[x] Configurable scale - currently we have 4 scrapers/k8 node; total of 8. We want to be able to arbitrarily set this number
Using `curl` to access the hub in a different namespace: 1) it can resolve the IP (one time it initially couldn't, but after that it always resolves); 2) it sometimes gets a response from `/wd/hub/status`, but a lot of the time gets `port 4444: Connection refused`, which must be the main cause of our unreachable error. In this case it's not a big issue with Cilium (perhaps only the initial could-not-resolve-host error was), but more something on the hub side. If you use port forward, you don't see this issue.
[ ] We may want to lock down to a working selenium hub / chrome node version.
[ ] Wrap up, merge PR, close this ticket
If the issue is related to `Cilium`, we may refer first to "Debugging and Monitoring DNS issues in Kubernetes" to debug the network issue with Cilium.

Things were going well until we faced the challenges here.
As you can see, there's a problem with the k8s node assignment algorithm. Before the whole cluster went nuts, besides the 2+2 job-switching overlap (which is dangerous too, and we can lower the memory request for that), there was an additional job assigned to this node, making it run 5 jobs concurrently at that moment.
Looks like we can't trust k8s's node assignment. We can lower the memory request so that a job doesn't claim so much memory at the beginning and only claims more when it needs it - we can do that. But we don't have control over node assignment.
Unless k8s has some additional parameter to configure this, we will need to implement this node distribution algorithm on our own.
Two ways:
- the `reset()` method.

The semaphore objects are possibly created across different node processes. We got two issues:

1. `release()` doesn't work - it says it has no identifier: `Error: semaphore k8NodeResourceLock-0b63934f-b75c-4110-9489-ef220d6050cf has no identifier`
2. nodeId is empty, i.e. the semaphore was not acquired successfully: `Error: Lost semaphore for key k8NodeResourceLock-8d44fefc-fcec-4822-a145-5a55f1907a14`
Some ideas why the previous work succeeded:

- If `acquire()` and `release()` are executed in the same (child) process, it's OK.
- Only one `acquire()` session should be executed at a time for a (child) process.
- When we `acquire()`, also store the local semaphore object. This will contain the (single) identifier.
- Let `release()` be executed in the same (child) process. Then it will release with the correct identifier!
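A minimal sketch of that pattern; the `Semaphore` interface and `createSemaphore` factory below are stand-ins for whatever redis-backed semaphore library SLK actually uses, not its real API:

```typescript
// Hypothetical sketch: keep acquire() and release() in the same (child) process by
// storing the local semaphore object, which carries the single identifier.
interface Semaphore {
  acquire(): Promise<void>;
  release(): Promise<void>;
}

// stand-in for the real library's constructor / factory
type SemaphoreFactory = (key: string) => Semaphore;

export class NodeResourceLock {
  private semaphore?: Semaphore; // the local object that holds the identifier

  constructor(
    private readonly createSemaphore: SemaphoreFactory,
    private readonly key: string, // e.g. `k8NodeResourceLock-<node uuid>`
  ) {}

  async withLock<T>(work: () => Promise<T>): Promise<T> {
    // only one acquire() session at a time per (child) process
    if (this.semaphore) {
      throw new Error(`semaphore ${this.key} already acquired in this process`);
    }
    this.semaphore = this.createSemaphore(this.key);
    await this.semaphore.acquire();
    try {
      return await work();
    } finally {
      // release() runs in the same process, so it releases with the correct identifier
      await this.semaphore.release();
      this.semaphore = undefined;
    }
  }
}
```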
During job switch, it looks like the k8s job does not immediately release memory.
`scraper-job-1589753321112` already completed, showing `scraper-job-1589753321112 1/1 2m2s 2m3s`, but its memory claim (pink) does not go down, even after a minute.
Ok, it finally goes down, after almost 5 minutes (pink).
Another example: at this time point, 748 (green) and 452 (yellow) have already completed, but they didn't release their memory immediately, while new jobs are already incoming (orange and blue on top). Eventually they went down after 4-5 minutes, but that already caused a memory spike approaching the node memory capacity of 4G.
Related SO question to this. Keywords: gc, garbage collector, completed job pod resources.

Some ideas to tackle this situation:
Shape of one node (chart), and another (chart). Doesn't look like the overlapping issue is solved, but it's slightly better than before. One guess is that the job "object" is deleted, but its pods remain there occupying resources.
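If leftover pods really are the culprit, one way to handle it (a hedged sketch, not necessarily what was done here) is to set `ttlSecondsAfterFinished` on the Job so Kubernetes garbage-collects finished pods; this assumes the pre-1.0 `@kubernetes/client-node` API and a cluster where the TTL-after-finished feature is available. Names, labels, and sizes are placeholders:

```typescript
// Hypothetical sketch: a scraper Job that gets cleaned up shortly after completion,
// so finished pods stop holding on to node memory.
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const batchApi = kc.makeApiClient(k8s.BatchV1Api);

const scraperJob: k8s.V1Job = {
  apiVersion: 'batch/v1',
  kind: 'Job',
  metadata: { name: `scraper-job-${Date.now()}` },
  spec: {
    ttlSecondsAfterFinished: 30,   // delete the Job and its pods 30s after completion
    template: {
      spec: {
        restartPolicy: 'Never',
        nodeSelector: { 'node-pool': 'scraper-worker' },       // placeholder label
        containers: [{
          name: 'scraper',
          image: 'registry.example.com/scraper:latest',         // placeholder image
          resources: {
            requests: { memory: '200Mi' },  // keep the initial claim low
            limits: { memory: '800Mi' },
          },
        }],
      },
    },
  },
};

async function createScraperJob(namespace = 'default') {
  await batchApi.createNamespacedJob(namespace, scraperJob);
}
```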
Looks like it works perfectly now!
And another node
Recap the workflow
The core node pool at 1v2G is too small and unstable. We got network and nginx crashes, even when the SLK nodes were fine. We then had to upgrade the core node pool droplet size. In the end it just loses the point - we could simply use a 4v8G droplet for both core and SLK, and that's a firm $40 per month. You can scale the entire k8s cluster down with terraform - right, it's not perfect in that sense though. I don't know, maybe we need a separate k8s cluster to "manage" the k8s cluster used for the scraper.
Also, we would like to populate many companies currently missing in our database, including:
Scaling up and down is quite stable right now. With further automation it would be really hard to maintain the same cost level of around $20-40 monthly, and it can't really save us money while keeping the ability to scale up.
The current max capacity is 60 jobs, using a core droplet size of 8G RAM - either 4v8G, or memory-optimized 1v8G. The basic monthly bill is $40 at this memory size. But we figured out a way to bypass letsencrypt's duplicate certificate limit, so we can always scale down the entire k8s cluster. Of course we still have to do this in the terminal. It would be ideal to have a meta-service running, at least one that triggers a travis job to provision / delete the k8s cluster. Perhaps heroku could be a good place for this thanks to its free plan.
Scaling up costs extra, using 2v4G machines.
This ticket also deals with the vision of this project. If we want to scale while paying a reasonable cloud provider bill, we need a more flexible way to run a kubernetes cluster, and using Kubernetes as a service is definitely limiting us on that path.
Ideally something like AWS Fargate would do best - if we can lower the cost when our cluster is idle, then we can afford more concurrency while scraper jobs fire up.
AWS route: more RAM w/ cost efficiency
Several requirements about provisioning on AWS
Elastic scale route: save cost
Approach 1: AWS Fargate, or any other container service
Approach 2: K8 auto-scaling, K8 API
Approach 3: manual slack command in K8s
This approach is supposed to be the most feasible and fastest to start; no need to look for another platform. It aims to create a low-cost node running SLK - SLK has to be up all the time in order to receive manual scale up / scale down commands. That is, this approach uses SLK as a platform to manually scale up and down. This should save us cost and prevent keeping an expensive node running without any scraper job present.
- We send `up`, then SLK triggers a travis build, which runs a terraform script to provision the resources for the selenium server on a dedicated node.
- Then we run `rrr` or `ccc`.
- When done, we send `down`, which triggers a travis build running the same terraform script but with `destroy`, tearing down the node along with all the selenium servers and scraper jobs.
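As a rough illustration, the `up` / `down` handlers could trigger the Travis build through the Travis API v3 trigger-build endpoint; the repo slug, token env var, branch, and the way the apply/destroy action is passed to the build are all assumptions here, not this project's actual setup:

```typescript
// Hypothetical sketch: SLK triggering a Travis CI build that runs the terraform script.
import fetch from 'node-fetch';

async function triggerTravisBuild(action: 'apply' | 'destroy') {
  const repoSlug = encodeURIComponent('owner/scraper-infrastructure'); // placeholder slug
  const response = await fetch(`https://api.travis-ci.com/repo/${repoSlug}/requests`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Travis-API-Version': '3',
      Authorization: `token ${process.env.TRAVIS_API_TOKEN}`, // placeholder env var
    },
    body: JSON.stringify({
      request: {
        branch: 'master',
        message: `SLK requested terraform ${action}`,
        // hypothetical way to pass the action; the real build may read it differently
        config: { env: { TF_ACTION: action } },
      },
    }),
  });
  if (!response.ok) {
    throw new Error(`Travis API responded with ${response.status}`);
  }
}

// `up` slack command -> triggerTravisBuild('apply'); `down` -> triggerTravisBuild('destroy')
```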