rivernews / slack-middleware-server

This server acts as a middleware for communicating with the Slack API.

Alternatives to Selenium #82

Closed: rivernews closed this issue 4 years ago

rivernews commented 4 years ago

Selenium with Chrome causes random page crashes, and we suspect they are related to executing JavaScript in the browser. Stack Overflow posts suggest the cause is /dev/shm, but even after increasing host memory or disabling /dev/shm usage, the crashes still happen, although relatively rarely: about 1 in every 10-20 jobs.

Selenium also consumes a lot of memory.

An alternative scraper needs to meet the requirements below:

rivernews commented 4 years ago

For the page crash issue, which happens mostly when locating/finding elements and especially when executing JavaScript in the browser, we simply set the job retry count to 2-3. This seems to be enough to get all jobs to complete without escalation!
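The retry workaround can be pictured with a small in-process sketch. The issue describes retrying whole jobs, so this is only an illustrative analogue, not the project's actual mechanism. It assumes a page crash surfaces as an unchecked exception (Selenium's `WebDriverException` does extend `RuntimeException`). The `Retry` class and its `withRetries` method are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical helper (not from the original project): re-runs a task that
// may crash, up to maxAttempts times in total.
final class Retry {
    private Retry() {}

    static <T> T withRetries(Supplier<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                // Assumption: a page crash surfaces as an unchecked exception.
                lastFailure = e;
            }
        }
        // Every attempt failed: rethrow the last error so the job still escalates.
        throw lastFailure;
    }
}
```

Passing `3` as `maxAttempts` mirrors the 2-3 retries mentioned above. Because the last failure is rethrown, a job that crashes on every attempt still escalates instead of failing silently.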


High memory usage is still an issue, and indeed little can be done to improve it in Java. That said, because of IP throttling on the Glassdoor side, up to 3 scrapers per node is appropriate; running more than that on the same node gets us blocked. So I think the current setup is actually the best balance.
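The per-node cap of 3 scrapers could be enforced with a counting semaphore. This is a sketch under assumptions: `ScraperGate` is a hypothetical class, and the limit of 3 comes from the throttling observation above rather than from any documented Glassdoor limit.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-node gate (not from the original project): lets at most
// maxScrapers scrape jobs run at the same time on this node.
final class ScraperGate {
    private final Semaphore permits;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    ScraperGate(int maxScrapers) {
        // Fair semaphore: waiting jobs acquire permits in arrival order.
        this.permits = new Semaphore(maxScrapers, true);
    }

    // Blocks until a slot is free, then runs the job on the calling thread.
    void run(Runnable scrapeJob) throws InterruptedException {
        permits.acquire();
        try {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
            scrapeJob.run();
            completed.incrementAndGet();
        } finally {
            running.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() { return peak.get(); }

    int completedJobs() { return completed.get(); }
}
```

A fair semaphore hands out permits in arrival order, so queued jobs are not starved. The `peakConcurrency()` counter exists only to check that the cap actually holds.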