Added scraping configs for 13 more websites, which make up roughly half of the entire scraping config
Since some of these websites have 200-300 internships, the web scraper will likely collect data on more than 1,000 internships when it is run start to finish
I have not run it start to finish, but I tested each company individually to make sure that the data is correct
Updated the Internship model based on the data found on most websites (a rough sketch of the model's shape follows)
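For reference, a minimal sketch of what the updated Internship model might look like as a SQLAlchemy model; the field names below are assumptions for illustration, not the exact columns in this repo:

```python
# Hypothetical sketch of the Internship model (field names are assumptions,
# not the exact columns used in this repo).
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Internship(Base):
    __tablename__ = "internships"

    id = Column(Integer, primary_key=True)
    company = Column(String, nullable=False)   # company the posting belongs to
    title = Column(String, nullable=False)     # posting title
    location = Column(String)                  # city / remote / hybrid
    description = Column(Text)                 # full posting text
    url = Column(String, unique=True)          # link to the original posting
```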
To-Do Post Merge
Create a command-line argument to pick which companies to scrape, since scraping everything will likely take 2-3 hours now (see the CLI sketch after this list)
For now, uncomment the testing code under the TODO in the scrape function in scraper.py to choose which company/companies to scrape
Create a command-line argument to limit the number of internships scraped from each website
For now, uncomment the testing code under the TODO in the scrape function in scrape_actions.py to choose how many internship links are visited
Right now, the code breaks if you modify the Internship model; we need to add a migration system, possibly based on Alembic (a migration sketch follows this list)
Right now, if we scrape properties that are not in the Internship model, that data is lost because it cannot be put into the database
We should add a column for additional data that does not fit into the model (included in the migration sketch below)
Certain uncaught errors still crash the scraper; we need to make sure that we don't lose any data that we've already scraped (see the error-handling sketch below)
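One possible shape for the two command-line arguments mentioned above; the --companies and --limit flags and the scrape(companies=..., limit=...) call are assumptions for illustration, not the current interface of scraper.py:

```python
# Hypothetical CLI sketch using argparse; flag names and the scrape() signature
# are assumptions, not the current interface of scraper.py.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Run the internship scraper")
    parser.add_argument(
        "--companies",
        nargs="*",
        default=None,
        help="Company names to scrape; omit to scrape every configured site",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="Maximum number of internship links to visit per website",
    )
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # scrape(companies=args.companies, limit=args.limit)  # wire into scraper.py
```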
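If we go with Alembic, a migration adding a catch-all JSON column for properties that don't fit the model could look roughly like this; the table name internships and column name extra_data are assumptions:

```python
# Hypothetical Alembic migration adding a JSON column for scraped properties
# that do not map onto the Internship model; table and column names are assumptions.
from alembic import op
import sqlalchemy as sa

revision = "0001_add_extra_data"
down_revision = None

def upgrade():
    op.add_column("internships", sa.Column("extra_data", sa.JSON(), nullable=True))

def downgrade():
    op.drop_column("internships", "extra_data")
```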
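Until the error handling is tightened up, a pattern like the following could keep one failing site from wiping out everything already scraped; the scrape_company and commit_results callables are hypothetical stand-ins for whatever the scraper actually uses:

```python
# Hypothetical sketch: isolate failures per company and persist results as we go,
# so an uncaught error on one site does not lose previously scraped data.
import logging

logger = logging.getLogger(__name__)

def scrape_all(companies, scrape_company, commit_results):
    """Scrape each company independently so one failure does not lose the rest."""
    for company in companies:
        try:
            internships = scrape_company(company)  # per-site scrape (callable passed in)
            commit_results(internships)            # persist immediately so data survives a later crash
        except Exception:
            # Log the failure and move on instead of letting one site crash the whole run.
            logger.exception("Scraping %s failed; continuing with the next site", company)
```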