jiapenghe1996 opened this issue 11 months ago
Unfortunately, I do not have any knowledge of how that tool works. Do you know if it is properly sending the API key for each request?
I would suggest reaching out to the maintainer of the tool.
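One thing worth checking is whether the key is attached to every request, including follow-up URLs that the API hands back (for example `nextPage` links). The sketch below is a minimal helper, assuming the key is passed as the `api_key` query parameter (the form used in the GovInfo API examples; api.data.gov also accepts an `X-Api-Key` header). `with_api_key` is a hypothetical name, not part of gpo_tools.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url: str, api_key: str) -> str:
    """Return `url` with `api_key` set as a query parameter.

    Hypothetical helper: any existing api_key value is replaced and the
    other query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    # Drop a stale api_key (if present) but keep every other parameter.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "api_key"
    ]
    query.append(("api_key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Running every URL through a helper like this right before the request is made (rather than only when building the first URL) guards against links the API returns without the key.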
From a quick glance, it looks like the tool uses the API to gather a list of hearings and then, in `extract_nav`, tries to brute-force scrape content from the GovInfo website using an older link pattern (possibly from the FDsys days, when the site was hosted under https://www.gpo.gov/fdsys).
My recommendation would be to use the GovInfo API for everything: get the list of hearings from it, then follow the package links to download content and metadata.
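A minimal sketch of listing every package in a collection through the API alone, under a few assumptions worth checking against the GovInfo API docs: the collections endpoint is `/collections/{collection}/{lastModifiedStartDate}` with `offsetMark=*` and `pageSize` for paging, each page lists results under `packages`, and a `nextPage` URL is present until the last page. Stopping after the first page would cap results at `pageSize` (100 here). `make_fetch_json` and `iter_collection_packages` are hypothetical names; the fetcher sends the key as an `X-Api-Key` header so that `nextPage` links are authenticated too.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, Optional

API_BASE = "https://api.govinfo.gov"


def make_fetch_json(api_key: str) -> Callable[[str], dict]:
    """Return a fetcher that sends the API key on every request (header-based)."""
    def fetch_json(url: str) -> dict:
        req = urllib.request.Request(
            url, headers={"X-Api-Key": api_key, "Accept": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_json


def iter_collection_packages(
    collection: str,
    last_modified_start: str,
    fetch_json: Callable[[str], dict],
    page_size: int = 100,
) -> Iterator[Dict]:
    """Yield every package in a collection, following nextPage links to the end."""
    url: Optional[str] = (
        f"{API_BASE}/collections/{collection}/{last_modified_start}"
        f"?offsetMark=*&pageSize={page_size}"
    )
    while url:
        page = fetch_json(url)
        yield from page.get("packages", [])
        # Assumed: nextPage is missing or null on the last page.
        url = page.get("nextPage")
```

For example, `iter_collection_packages("CHRG", "2018-01-01T00:00:00Z", make_fetch_json(key))` should walk all CHRG packages modified since that date. Each package is expected to carry a `packageLink` to its summary, which is where the download and granule links live. Passing `fetch_json` in as a parameter also makes the paging logic easy to test with canned responses.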
Note that for the CHRG collection, package results will not contain content; you will need to follow the `granulesLink` to get to the individual parts of a hearing.
I am using https://github.com/rbshaffer/gpo_tools to scrape congressional hearing transcripts via the GovInfo API. However, for each session of Congress I am only able to scrape the first 100 hearings. After 100 hearings, I get the error "HTTPError: HTTP Error 401: Unauthorized."
Do you know how to resolve this issue? Thank you!