Open rivernews opened 3 years ago
`\n` inside `class="..."`
`SendKeys()` failed. The site also runs a machine (bot) verification at random! Not sure if setting the user agent helps.
`OuterHTML` and `chromedp.Nodes` can find the element! Only `WaitVisible` and `SendKeys` did not work; they just hang.
Is the `input` not visible? Getting `cannot compute box model`.
`.document.querySelector...` seems to work, but once `chromedp` starts interacting with the element things error out, mostly with `cannot compute box model`.
How to look at the browser?
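One possible workaround for the `cannot compute box model` error is to skip the layout-dependent actions and set the field from JavaScript instead, since `document.querySelector` already finds the element. The sketch below only builds such an expression with the standard library; `jsString`, `setValueJS`, the selector and the value are all made up for illustration, and whether the target page accepts a value set this way is an assumption (framework-driven inputs may need more than `input`/`change` events).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsString renders s as a JavaScript string literal. JSON string syntax is
// valid JavaScript, so json.Marshal takes care of quotes and backslashes.
func jsString(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		// Not expected for a plain string; fall back to an empty literal.
		return `""`
	}
	return string(b)
}

// setValueJS builds an expression that fills the first element matching
// selector and fires the events a page usually listens for. The expression
// evaluates to false when no element matches, true otherwise.
func setValueJS(selector, value string) string {
	return fmt.Sprintf(`(() => {
  const el = document.querySelector(%s);
  if (!el) return false;
  el.value = %s;
  el.dispatchEvent(new Event("input", {bubbles: true}));
  el.dispatchEvent(new Event("change", {bubbles: true}));
  return true;
})()`, jsString(selector), jsString(value))
}

func main() {
	// Hypothetical selector and value, for illustration only.
	js := setValueJS(`input[name="username"]`, "me@example.com")
	fmt.Println(js)
	// With chromedp the string could be run as an action, for example:
	//   var ok bool
	//   err := chromedp.Run(ctx, chromedp.Evaluate(js, &ok))
}
```

As for looking at the browser: chromedp runs Chrome headless by default, and the usual way to get a visible window is to create the allocator with `chromedp.NewExecAllocator` and add `chromedp.Flag("headless", false)` to `chromedp.DefaultExecAllocatorOptions`. `chromedp.UserAgent(...)` can be added to the same option list, though it is unclear whether that is enough to avoid the verification step.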
LinkedIn does not allow scraping, at least within its private pages. If we try to collect employee info, we may be blocked by the bot verification check. Of course, caching and a best-effort mindset would help, but that means more work for less outcome, which affects our answer to the question: is it worth going down this route? After all, you can always just visit the site.
But of course, review data (numeric and qualitative) is still relatively easy to retrieve.
Maybe a research hub could be feasible and useful: it contains various sections, allows (and expects) some sections to be left empty (due to network issues, page structure changes, bot checks, etc.), and applies caching to minimize scraping. We imagine such a research hub should be:
Develop a micro service that can research the following:
Then we can have a cron job POST the data to appl-tracky and display it in the UI.
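The best-effort idea above could look roughly like the following sketch: every section is fetched independently, a failed section stays empty instead of failing the whole report, and successful sections are cached for a while to minimize scraping. All names here (`Hub`, `Section`, `Fetcher`, `postReport`, the `/research` path, the JSON shape) are hypothetical; the real appl-tracky API is not described in this issue, so a local test server stands in for it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

var errBlocked = errors.New("blocked by bot check")

// Section is one part of a report. Data stays empty when the fetch failed.
type Section struct {
	Data      string    `json:"data,omitempty"`
	Error     string    `json:"error,omitempty"`
	FetchedAt time.Time `json:"fetched_at"`
}

// Fetcher scrapes one section for a company (hypothetical signature).
type Fetcher func(company string) (string, error)

// Hub runs every fetcher best-effort and caches successful results.
type Hub struct {
	Fetchers map[string]Fetcher
	TTL      time.Duration
	cache    map[string]Section
}

// Research returns one Section per fetcher, reusing fresh cache entries.
func (h *Hub) Research(company string) map[string]Section {
	if h.cache == nil {
		h.cache = make(map[string]Section)
	}
	report := make(map[string]Section, len(h.Fetchers))
	for name, fetch := range h.Fetchers {
		key := company + "/" + name
		if s, ok := h.cache[key]; ok && time.Since(s.FetchedAt) < h.TTL {
			report[name] = s // fresh enough, skip scraping
			continue
		}
		s := Section{FetchedAt: time.Now()}
		data, err := fetch(company)
		if err != nil {
			s.Error = err.Error() // keep the section, just leave it empty
		} else {
			s.Data = data
			h.cache[key] = s // only successes are cached
		}
		report[name] = s
	}
	return report
}

// postReport sends the report as JSON; the payload shape is an assumption.
func postReport(url, company string, report map[string]Section) error {
	body, err := json.Marshal(map[string]interface{}{"company": company, "sections": report})
	if err != nil {
		return err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	// Local stand-in for appl-tracky; it accepts any POST.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusCreated)
	}))
	defer srv.Close()

	hub := &Hub{TTL: time.Hour, Fetchers: map[string]Fetcher{
		"reviews":   func(string) (string, error) { return "4.2 stars", nil },
		"employees": func(string) (string, error) { return "", errBlocked },
	}}
	report := hub.Research("ExampleCorp")
	if err := postReport(srv.URL+"/research", "ExampleCorp", report); err != nil {
		fmt.Println("post failed:", err)
		return
	}
	fmt.Printf("posted %d sections\n", len(report))
}
```

Failed sections are deliberately not cached here, so the next cron run retries them, while sections that already succeeded are served from the cache until the TTL expires.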
Scraping in Go
We need a Go scraper that can handle JavaScript-rendered pages; this blog sums up some great scrapers.