usgpo / api

services to access govinfo content and metadata
https://api.govinfo.gov
Other

Bulk data API back to 103rd Congress #142

Closed: eltrompetero closed this issue 7 months ago

eltrompetero commented 8 months ago

Are there any plans to add the older congressional sessions with digital text to the Bulk Data API?

Thanks!

jonquandt commented 8 months ago

Good morning,

Is there a specific collection that you are interested in?

Are you referring to the Bulk Data Repository or the API itself?

eltrompetero commented 8 months ago

Sorry, I meant the Bulk Data Repo.

eltrompetero commented 8 months ago

To be more specific, I mean the 103rd and later congressional sessions, which are available on congress.gov. It would help to have them all together in the Bulk Data Repository, even if in principle one can also get them separately.

There is a much larger question of whether the sessions before the 103rd could also be standardized. I saw that these are only available as PDFs. Formatting choices made when converting them could matter for large-scale textual analysis, so it would be ideal to have a standardized conversion to XML.

eltrompetero commented 7 months ago

Pinging this once more now that I've written code that uses the congress.gov API to download the relevant files. Unfortunately, it is very slow given the limit of 1,000 requests per hour, and the server is sometimes unresponsive.
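For anyone hitting the same limits, here is a minimal sketch of the two usual workarounds: pacing requests evenly under an hourly quota (1,000 per hour works out to one request every 3.6 seconds) and retrying with exponential backoff when the server does not respond. `Throttle` and `fetch_with_retry` are hypothetical helper names, and the retry count and delays are arbitrary assumptions, not documented congress.gov behavior.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def min_interval(requests_per_hour: int) -> float:
    """Seconds between requests needed to stay under an hourly quota."""
    return 3600.0 / requests_per_hour


class Throttle:
    """Spaces calls evenly so that at most `requests_per_hour` happen per hour."""

    def __init__(
        self,
        requests_per_hour: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = min_interval(requests_per_hour)
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least `interval` seconds have passed since the last call."""
        if self.last is not None:
            remaining = self.interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


def fetch_with_retry(
    fetch: Callable[[str], T],
    url: str,
    retries: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), retrying with exponential backoff on network errors.

    OSError covers urllib.error.URLError, urllib.error.HTTPError and timeouts.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)  # with the defaults: 2 s, 4 s, 8 s, ...
    raise AssertionError("unreachable when retries >= 0")
```

In a download loop, one would call `throttle.wait()` before each `fetch_with_retry(...)`. Even spacing is simpler than a token bucket, at the cost of never allowing short bursts.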

I think including the older sessions in the Bulk Data Repository would be very helpful.

Thanks.

jonquandt commented 7 months ago

@eltrompetero - Are you referring to earlier bill texts or some other kind of document? When you say congressional sessions, I'm wondering whether you mean copies of the Congressional Record.

I am not currently aware of any plans to perform conversion to XML for sessions of Congress earlier than what is already available.

As an alternative to the congress.gov API, you may be interested in downloading congressional content from the GovInfo API, which has a rate limit higher than 1,000 requests per hour. Congress.gov itself receives its content from GovInfo.

https://api.govinfo.gov/collections/BILLS/2023-01-01T00:00:00Z?offsetMark=*&pageSize=100&api_key=DEMO_KEY
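The collections URL above returns one page of results at a time. Below is a minimal sketch of walking every page with Python's standard library. It assumes that the JSON response carries a `packages` list and a `nextPage` URL (empty or null on the last page), that the date segment acts as a last-modified start date, and that `nextPage` may omit the API key. `collection_url`, `with_api_key`, and `iter_packages` are hypothetical helper names. `DEMO_KEY` is meant for trying the API, so real bulk downloads would presumably need a personal key.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

BASE_URL = "https://api.govinfo.gov"


def collection_url(collection: str, start: str, api_key: str, page_size: int = 100) -> str:
    """First-page URL for /collections/{collection}/{start}, e.g. BILLS since 2023-01-01."""
    query = urllib.parse.urlencode(
        {"offsetMark": "*", "pageSize": page_size, "api_key": api_key},
        safe="*",  # keep the initial offsetMark as a literal '*'
    )
    return f"{BASE_URL}/collections/{collection}/{start}?{query}"


def with_api_key(url: str, api_key: str) -> str:
    """Append api_key to a URL's query string unless it is already present."""
    parts = urllib.parse.urlsplit(url)
    if "api_key" in urllib.parse.parse_qs(parts.query):
        return url
    extra = urllib.parse.urlencode({"api_key": api_key})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urllib.parse.urlunsplit(parts._replace(query=query))


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_packages(
    collection: str,
    start: str,
    api_key: str,
    page_size: int = 100,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
) -> Iterator[Dict[str, Any]]:
    """Yield package entries page by page, following nextPage until it is empty."""
    url: Optional[str] = collection_url(collection, start, api_key, page_size)
    while url:
        page = fetch(url)
        yield from page.get("packages", [])
        next_page = page.get("nextPage")
        url = with_api_key(next_page, api_key) if next_page else None
```

For example, `for pkg in iter_packages("BILLS", "2023-01-01T00:00:00Z", "DEMO_KEY"): print(pkg["packageId"])` would print one identifier per bill package, assuming each entry has a `packageId` field. The `fetch` parameter exists so a rate-limited or retrying fetcher can be swapped in without changing the pagination logic.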

eltrompetero commented 7 months ago

Ah, I mean congressional bills. But thanks for the info.

Ok, great. I'm finding api.govinfo much more amenable.

Thanks.