kaylinah / scrape-alerts

Get notified when scraped data from a web page has changed

Make HTTP request (1) #3

Open sarah-tully opened 6 years ago

sarah-tully commented 6 years ago
wickkidd commented 6 years ago

Not a bad read on scraping libraries: https://elitedatascience.com/python-web-scraping-libraries

sarah-tully commented 6 years ago

Overall project goal:

scrape https://www.foreign.senate.gov/hearings/ for documents uploaded as “related files” in business meetings.

Assignment objective:

choose a library/framework (https://www.quora.com/What-is-the-difference-between-a-library-framework-and-a-language) that would “best” serve our goal

Define “best”:

  1. Uses python
  2. Is appropriate for building a focused crawler
  3. Can handle redirects
  4. Is appropriate for beginners
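To make criterion 3 concrete, here is a rough sketch of how redirects show up when using Requests. Requests follows redirects on GET by default and records each intermediate hop in `response.history`. The helper names `fetch` and `redirect_chain` are made up for illustration and are not part of Requests or of this project.

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    # response.history holds the intermediate redirect responses, oldest first.
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fetch(url, timeout=10):
    """GET a page; Requests follows redirects automatically for GET."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    return response
```

Calling `redirect_chain(fetch("https://www.foreign.senate.gov/hearings/"))` would list any redirects the hearings page goes through before the final response; whether that page redirects at all is something we would need to check.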

Findings:

Not gonna work:

Will work:

Requests documentation

Selenium documentation

Scrapy documentation

Examples:

Overview/use-case example for Scrapy

Scraping with Requests and Beautiful Soup

Other resources:

FINAL THOUGHTS:

There are lots and lots of ways to do this. Based on our goals, I think either Requests + Beautiful Soup or Scrapy would be best suited to our needs. Scrapy seems to be the more bougie option: more built out and less bare-bones than Requests + BS4. For a beginner project like this, I think a less built-out approach than Scrapy would be more useful, so that we write our own code in response to problems rather than reaching for the pre-made answers Scrapy might hand us.
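For a sense of what the Requests + Beautiful Soup route might look like, here is a minimal sketch. It assumes the “related files” appear as ordinary `<a>` links whose URLs end in a document extension; I have not confirmed that against the real page markup, so the extension list and the function names (`find_document_links`, `scrape_document_links`) are placeholders.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Assumption: related files are linked directly and end in one of these extensions.
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".docx")


def find_document_links(html, base_url):
    """Return absolute URLs of links that look like documents, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"].strip()
        if href.lower().endswith(DOCUMENT_EXTENSIONS):
            # Resolve relative hrefs against the page they were found on.
            absolute = urljoin(base_url, href)
            if absolute not in links:
                links.append(absolute)
    return links


def scrape_document_links(url, timeout=10):
    """Fetch one page with Requests and extract its document links."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    # Use the final URL so relative links still resolve after any redirect.
    return find_document_links(response.text, response.url)
```

Keeping the parsing in its own function means we can test it on saved HTML without hitting the site. Links with query strings after the extension would be missed by this check, which is the kind of problem we would end up solving ourselves with this approach.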