rivernews / appl-tracky-spa

An Application Tracking System to help job finders ease their out-of-control spreadsheet use tracking every job application record.
https://appl-tracky.shaungc.com

Automated org research tooling #100

Open rivernews opened 2 years ago

rivernews commented 2 years ago

Develop a microservice that can research the following:

Then a cron job can POST the collected data to appl-tracky, which displays it in the UI.

Scraping in Go

We need a Go scraper that can handle JavaScript-rendered pages; this blog sums up some great scrapers.

rivernews commented 2 years ago

Debugging scraper

How can we look at the browser while the scraper runs?

rivernews commented 2 years ago

Reconsider what to scrape

LinkedIn does not allow scraping, at least on its private pages. If we try to collect employee info, we may be blocked by a bot verification check. Caching and a best-effort mindset would help, but that means more work for less outcome, which raises the question: is it worth going down this route? After all, you can always just visit the site.

But of course, review data (numeric and qualitative) is still relatively easy to retrieve.

Maybe a research hub would be feasible and useful: it contains various sections, allows (and expects) some sections to be left empty (due to network issues, page structure changes, bot checks, etc.), and applies caching to minimize scraping. We imagine such a research hub should be: