Open sarah-tully opened 6 years ago
Not a bad read on scraping libraries: https://elitedatascience.com/python-web-scraping-libraries
scrape https://www.foreign.senate.gov/hearings/ for documents uploaded as “related files” in business meetings.
choose a library/framework (https://www.quora.com/What-is-the-difference-between-a-library-framework-and-a-language) that would “best” serve our goal
Not gonna work:
Will work:
Requests documentation, Selenium documentation, Scrapy documentation
Overview/use-case example for Scrapy; scraping with Requests and Beautiful Soup
There are lots and lots of ways to do this. Based on our goals, I think either Requests + Beautiful Soup or Scrapy would be best suited to our needs. Scrapy seems to be the more bougie option: less bare-bones than Requests + BS4. For a beginner project like this, I think a less built-out method than Scrapy would be more useful, so we write our own code in response to problems rather than reaching for a pre-made answer that Scrapy might hand us.
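For a rough idea of what the Requests + Beautiful Soup route could look like, here is a minimal sketch. It is not tested against the live site. The function names, the `DOCUMENT_EXTENSIONS` filter, and the guess that related files show up as plain `<a href>` links ending in a document extension are all my assumptions; the real page markup may well need a different selector.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

HEARINGS_URL = "https://www.foreign.senate.gov/hearings/"

# Assumption: related files are ordinary links ending in one of these
# extensions. Adjust after inspecting the actual page source.
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".docx")


def extract_document_links(html, base_url):
    """Return absolute URLs for every link in `html` that looks like a document."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        # Case-insensitive extension check; str.endswith accepts a tuple.
        if href.lower().endswith(DOCUMENT_EXTENSIONS):
            # Resolve relative paths against the page they were found on.
            links.append(urljoin(base_url, href))
    return links


def fetch_document_links(url=HEARINGS_URL):
    """Download one page and return the document links found on it."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_document_links(response.text, url)
```

Keeping the parsing in `extract_document_links` separate from the download means we can try it on a saved copy of the page without hitting the site each time. Calling `fetch_document_links()` would then be the one step that touches the network.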