usgpo / api

services to access govinfo content and metadata
https://api.govinfo.gov

Why are related bills being removed and added back so frequently? #147

Open prajwal19988 opened 8 months ago

prajwal19988 commented 8 months ago

Hi Admin, I hope you are doing well. I am from Quorum. We use the API to work with bills and their related data, and it has been an excellent and extensive resource. Lately, however, we are observing in the xlsx data dump that the related bills section is updated very frequently: many new bills are being added and many previous entries are being removed.

Can you please let us know why we are seeing this behaviour and whether there is a way forward? Thank you!

jonquandt commented 8 months ago

Can you clarify what services and resources you are querying? Examples of requests will help us investigate. Thanks.

prajwal19988 commented 8 months ago

Definitely. We download the bills' ZIP file, https://www.govinfo.gov/bulkdata/BILLSTATUS/118/hr/BILLSTATUS-118-hr.zip, and check the individual XML files. We are finding differences far too often, on a daily basis, even though the bill actions are quite old. Example: BILLSUM-118hr2670.xml
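Since the report is that files differ from one daily download to the next, one way to pin down exactly which member files changed is to compare two saved copies of the ZIP. This is a minimal sketch, assuming both downloads were kept locally; it hashes every member with SHA-256 and reports added, removed and modified file names. The function names are hypothetical and not part of any govinfo tooling.

```python
import hashlib
import zipfile
from typing import BinaryIO, Dict, List, Union

# A path on disk or an open binary file object both work with zipfile.ZipFile.
ZipSource = Union[str, BinaryIO]


def member_digests(source: ZipSource) -> Dict[str, str]:
    """Map each file name inside the ZIP archive to the SHA-256 of its decompressed bytes."""
    digests: Dict[str, str] = {}
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            digests[info.filename] = hashlib.sha256(archive.read(info)).hexdigest()
    return digests


def changed_members(old: ZipSource, new: ZipSource) -> Dict[str, List[str]]:
    """Report which member files were added, removed, or modified between two snapshots."""
    before = member_digests(old)
    after = member_digests(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(
            name for name in before.keys() & after.keys() if before[name] != after[name]
        ),
    }


# Example (hypothetical local copies of two daily downloads of the same bulkdata ZIP):
# report = changed_members("BILLSTATUS-118-hr-day1.zip", "BILLSTATUS-118-hr-day2.zip")
# print(report["modified"])
```

Hashing the decompressed member bytes rather than the whole archive avoids flagging a change when only ZIP-level metadata, such as entry timestamps, differs between downloads.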

jonquandt commented 8 months ago

Thanks for this. Are you using the API to pull the BILLSTATUS files, e.g. by calling https://api.govinfo.gov/collections/2024-03-21T00:00:00Z?offsetMark=*&pageSize=100&api_key=DEMO_KEY

or are you using the XML or JSON endpoints for bulkdata?

https://www.govinfo.gov/bulkdata/xml
https://www.govinfo.gov/bulkdata/json

The link you provided suggests the latter or at least that you are pulling the ZIP files for Congress/bill types on a recurring basis from the bulkdata repository.
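The API route mentioned above asks govinfo for packages modified since a given timestamp and pages through results with `offsetMark`, whereas the bulkdata route re-downloads whole ZIP files. A minimal sketch of building such a request URL follows. It assumes a `/collections/{collection}/{start timestamp}` path shape with a `BILLSTATUS` collection segment; the example URL in the comment above does not include a collection segment, so treat that part as an assumption to check against the govinfo API documentation. The function name is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Host taken from the thread; the path shape below is an assumption.
API_BASE = "https://api.govinfo.gov"


def collections_url(collection: str, modified_since: datetime, api_key: str,
                    page_size: int = 100, offset_mark: str = "*") -> str:
    """Build a URL listing packages in `collection` modified since `modified_since`.

    `modified_since` should be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    # Normalize to UTC and format like the thread's example: 2024-03-21T00:00:00Z
    stamp = modified_since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode(
        {"offsetMark": offset_mark, "pageSize": page_size, "api_key": api_key},
        safe="*",  # keep the initial offsetMark=* readable instead of %2A
    )
    return f"{API_BASE}/collections/{collection}/{stamp}?{query}"


# Example with the public DEMO_KEY used in the thread:
# collections_url("BILLSTATUS", datetime(2024, 3, 21, tzinfo=timezone.utc), "DEMO_KEY")
```

Pinning `modified_since` to the time of the previous run would make each poll list only packages whose last-modified date moved, which could show directly whether a given BILLSTATUS package is being republished daily.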

The example xml file you referenced is a BILLSUM file, not a BILLSTATUS xml file. I want to better understand the specific issue you are seeing so we can troubleshoot.

In BILLSTATUS-118hr2670.xml, there is a relatedBills tag that lists a large number of relatedBills items. Are you saying that you are seeing items be removed from this list?

My initial guess is that there are changes on the upstream congress.gov API (the source for our BILLSTATUS and BILLSUM xml) that may be causing changes in the resulting xml.
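To check whether items really disappear from the relatedBills list, one could extract that list from each saved copy of a BILLSTATUS file. This is a minimal sketch, assuming each related bill is a direct `<item>` child of `<relatedBills>` with `<congress>`, `<type>` and `<number>` children; those element names are an assumption about the BILLSTATUS schema, not something stated in this thread. Nested `<item>` elements (for example under a relationship-details element) are deliberately skipped. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

# (congress, bill type, bill number), e.g. ("118", "HR", "6056")
BillKey = Tuple[str, str, str]


def related_bill_keys(xml_text: str) -> Set[BillKey]:
    """Collect the bills listed under <relatedBills> in a BILLSTATUS document."""
    root = ET.fromstring(xml_text)
    keys: Set[BillKey] = set()
    for related in root.iter("relatedBills"):
        # findall("item") matches direct children only, so nested <item>
        # elements inside each related bill are not picked up here.
        for item in related.findall("item"):
            # findtext also looks at direct children only.
            key = tuple(
                (item.findtext(tag) or "").strip().upper()
                for tag in ("congress", "type", "number")
            )
            if all(key):  # skip items missing any identifying field
                keys.add(key)
    return keys
```

Normalizing to upper case and stripping whitespace keeps cosmetic differences between snapshots from showing up as added or removed bills.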

prajwal19988 commented 8 months ago

Yes, we are finding a large number of related-bills changes in the BILLSUM file: many items in the related bills are being removed and a few others added on a daily basis.

prajwal19988 commented 8 months ago

Hi @jonquandt, the crux of the issue is that on Jan 25, H.R. 2670 gained a new entry in its related bills: H.R. 6056. The peculiar part is that neither bill had any recent updates. It was introduced in October of last year (2023).

Another example:
H.R. 3746: the newer update no longer contains the related bills listed in the older version.

Can you please take a look and let us know whether this is the expected behaviour and, if not, whether it can be rectified in the future?
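Both examples (H.R. 2670 gaining H.R. 6056, and H.R. 3746 losing entries) come down to comparing one bill's related-bills list across two snapshots. A set difference isolates what was added and what was removed. This is a minimal sketch, assuming related bills are identified by a hypothetical (congress, type, number) tuple; the names here are illustrative only.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Hypothetical identifier for a bill: (congress, bill type, bill number)
BillKey = Tuple[str, str, str]


class RelatedBillsDiff(NamedTuple):
    added: List[BillKey]    # present in the newer snapshot only
    removed: List[BillKey]  # present in the older snapshot only


def diff_related_bills(old: Iterable[BillKey], new: Iterable[BillKey]) -> RelatedBillsDiff:
    """Compare one bill's related-bills list across two snapshots."""
    before, after = set(old), set(new)
    # Sorting keeps the report stable from run to run, so daily logs stay comparable.
    return RelatedBillsDiff(added=sorted(after - before), removed=sorted(before - after))
```

Logging the `added` and `removed` lists per bill per day would give concrete, dated cases to attach when reporting churn upstream.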