An issue I ran into when webscraping is being blocked by the website we are trying to scrape. When scraping hundreds of jobs, the site blocks us temporarily (for a few minutes). To fix this, we need to implement proxy rotation. Another issue is querying speed: we can use the puppeteer-cluster library to scrape links in parallel (similar to multithreading).
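A minimal sketch of the proxy-rotation idea: round-robin through a pool of proxies and pass the current one to Puppeteer via Chromium's `--proxy-server` flag. The proxy URLs and the `nextProxy` helper are placeholders, not a real proxy service.

```javascript
// Placeholder proxy pool -- swap in real proxy endpoints.
const PROXIES = [
  'http://proxy-a.example.com:8080',
  'http://proxy-b.example.com:8080',
  'http://proxy-c.example.com:8080',
];
let proxyIndex = 0;

// Round-robin through the pool so consecutive scrapes use different IPs.
function nextProxy() {
  const proxy = PROXIES[proxyIndex % PROXIES.length];
  proxyIndex += 1;
  return proxy;
}

// Launch a browser routed through the next proxy in the pool.
async function launchWithProxy() {
  const puppeteer = require('puppeteer'); // lazy require; sketch only
  return puppeteer.launch({
    args: [`--proxy-server=${nextProxy()}`], // Chromium proxy flag
  });
}
```

Relaunching (or using a fresh proxy per batch of pages) spreads requests across IPs so no single one trips the site's rate limit.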
In addition to webscraping, we will use this API. Rather than scraping on every user search, we will scrape jobs weekly and store them in our database using a CRON job.
Each week, the CRON job will delete all the stored jobs in the database and repopulate it with updated listings.
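The weekly delete-and-repopulate step could look like the sketch below, assuming node-cron for scheduling; `scrapeAllJobs` and the `jobs` collection handle are hypothetical stand-ins for the real scraper entry point and MongoDB connection.

```javascript
// Cron expression: minute hour day-of-month month day-of-week
const WEEKLY_SCHEDULE = '0 3 * * 0'; // Sundays at 03:00

// Wipe last week's listings and repopulate with a fresh scrape.
async function refreshJobs(jobsCollection, scrapeAllJobs) {
  await jobsCollection.deleteMany({});   // delete all stored jobs
  const fresh = await scrapeAllJobs();   // re-scrape everything
  if (fresh.length > 0) {
    await jobsCollection.insertMany(fresh);
  }
  return fresh.length;
}

// Register the weekly job (sketch; assumes node-cron is installed).
function startWeeklyRefresh(jobsCollection, scrapeAllJobs) {
  const cron = require('node-cron');     // lazy require; sketch only
  return cron.schedule(WEEKLY_SCHEDULE, () =>
    refreshJobs(jobsCollection, scrapeAllJobs)
  );
}
```

Deleting before inserting keeps the collection free of stale or removed postings, at the cost of a brief window with no jobs; inserting into a temp collection and renaming would avoid that if it becomes a problem.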
Whenever a user searches a keyword and location on our website, we will check MongoDB to see if this has been searched before. If it hasn't, we will scrape the jobs, store them in MongoDB, and then display them (this path will still be slow). However, if the keyword/location pair has been searched before (and recently, within the past week), we will just return the stored results.
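The check-then-scrape flow can be sketched like this. An in-memory Map stands in for the MongoDB collection, and `getCachedOrScrape`, `ONE_WEEK_MS`, and the injected `scrape` function are all assumed names, not part of the real codebase.

```javascript
const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Stand-in for the MongoDB collection: key -> { jobs, scrapedAt }
const cache = new Map();

// Return cached jobs if scraped within the past week; otherwise
// scrape now (slow path), store the result, and return it.
async function getCachedOrScrape(keyword, location, scrape, now = Date.now()) {
  const key = `${keyword.toLowerCase()}|${location.toLowerCase()}`;
  const hit = cache.get(key);
  if (hit && now - hit.scrapedAt < ONE_WEEK_MS) {
    return hit.jobs;                            // fresh cache hit
  }
  const jobs = await scrape(keyword, location); // slow path
  cache.set(key, { jobs, scrapedAt: now });
  return jobs;
}
```

With MongoDB, the same logic becomes a `findOne` on the keyword/location pair plus a timestamp comparison, with an upsert on the slow path.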
The idea above is essentially 'caching', keyed on keyword and location.
Resources: 1) Puppeteer webscrape proxy, 2) Puppeteer parallel webscraping: https://github.com/thomasdondorf/puppeteer-cluster
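A sketch of the parallel-scraping idea using puppeteer-cluster: each queued URL gets its own browser context, and `maxConcurrency` pages run at once. The job-board URL format and the `.job-title` selector are assumptions for illustration.

```javascript
// Hypothetical job-board search URL -- adjust for the real site.
function buildSearchUrl(keyword, location) {
  const base = 'https://example-jobs.com/search';
  return `${base}?q=${encodeURIComponent(keyword)}&l=${encodeURIComponent(location)}`;
}

// Scrape a list of URLs in parallel with puppeteer-cluster.
async function scrapeJobLinks(urls, maxConcurrency = 4) {
  const { Cluster } = require('puppeteer-cluster'); // lazy require; sketch only
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT, // isolated context per task
    maxConcurrency,                           // pages scraped in parallel
  });

  const results = [];
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // '.job-title' is a placeholder selector for the listing titles.
    const titles = await page.$$eval('.job-title', els =>
      els.map(el => el.textContent.trim())
    );
    results.push({ url, titles });
  });

  urls.forEach(url => cluster.queue(url));
  await cluster.idle();  // wait for every queued page to finish
  await cluster.close();
  return results;
}
```

Compared with scraping links one at a time in a single page, this overlaps the network waits, which is where most of the querying time goes.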