Open judgelord opened 8 months ago
Hi @judgelord, thanks for developing this package! It's great! I'm trying to use it to scrape the congressional records and do some text mining for an academic article. Unfortunately, I ran into this same issue and have no idea how to fix it. I would appreciate any updates! Thanks again for developing this!
If you want to help, you could test out alternative web scraping packages in R. I can replace the rvest method if another method works.
For sure! I’ll spend a few more hours on this next week. If I find another method, I’ll let you know!
Update: it seems that congress.gov is no longer blocking us
I wrote a Python script to replace the scraper.
@Nuohai-muxi could you post a link to a repo?
FWIW, rvest::read_html("https://www.congress.gov") works. If this package's functions are returning 403 errors, it may be due to backslashes at the end of URLs, which seem to make congress.gov return a 403. I will investigate.
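For anyone who wants to test the trailing-slash theory, here is a minimal Python sketch, not this package's code, that strips trailing slashes or backslashes from a URL before it is requested. The helper name strip_trailing_slashes is hypothetical, and whether this actually avoids the 403 is an unverified assumption.

```python
def strip_trailing_slashes(url: str) -> str:
    """Hypothetical helper: drop any run of '/' or '\\' at the end of a URL."""
    # rstrip removes every trailing character found in the given set,
    # so "…gov//" and "…gov\" both reduce to "…gov".
    return url.rstrip("/\\")


if __name__ == "__main__":
    # Illustration only: no request is sent here.
    for candidate in (
        "https://www.congress.gov/",
        "https://www.congress.gov\\",
        "https://www.congress.gov/congressional-record",
    ):
        print(candidate, "->", strip_trailing_slashes(candidate))
```

The same cleanup could be done in R with sub() before passing the URL to rvest::read_html.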
It looks like congress.gov is blocking whatever protocol rvest uses. I'm not sure what to do about this and don't have time to dig in right now, but I will try to figure it out.