usgpo / bulk-data

User Guides for XML on the govinfo Bulk Data Repository. For information about Bill Status XML Bulk Data, see https://github.com/usgpo/bill-status.
https://www.govinfo.gov/bulkdata

sitemapindex empty for bulkdata #37

Closed mooncalfskb closed 5 years ago

mooncalfskb commented 5 years ago

Hello. We rely on this index page to determine which files to download, but it is empty today. Do you know what's up? https://www.govinfo.gov/sitemap/bulkdata/BILLSTATUS/sitemapindex.xml

The sub-pages referenced from the sitemap index still appear to be working. Example: https://www.govinfo.gov/sitemap/bulkdata/BILLSTATUS/115s/sitemap.xml

Thanks, Sherrod
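
For anyone crawling these sitemaps, here is a minimal sketch of how a client might detect the empty-index condition described above before deciding what to download. It assumes the index follows the standard sitemaps.org schema and namespace; the function name `child_sitemaps` is hypothetical, and fetching the XML is left to the caller.

```python
import xml.etree.ElementTree as ET

# Assumption: the index uses the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the <loc> URL of every <sitemap> entry in a sitemap index.

    An empty list means the index is unusable: either the body was blank
    or malformed, or it parsed but listed no child sitemaps.
    """
    try:
        root = ET.fromstring(index_xml)
    except ET.ParseError:
        # A zero-byte or truncated response ends up here.
        return []

    locs = []
    for entry in root.findall("sm:sitemap", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if loc:
            locs.append(loc)
    return locs
```

A crawler could treat an empty result as "index unavailable, keep the previous file list" rather than "nothing to download", so that an outage like this one does not silently drop files.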

jonquandt commented 5 years ago

Thanks for reporting this issue. I'll look into it and let you know when it's resolved.

mooncalfskb commented 5 years ago

thanks!

jonquandt commented 5 years ago

We've restored an older copy of the sitemaps. I will also republish a couple of billstatus packages for the most recently updated bill types in the 115th and 116th Congresses. That should at least update the last-modified date on the index and signal that you should crawl them. The regular billstatus job will also trigger updates.
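
As an illustration of how the last-modified date on the index can drive a crawl, here is a minimal sketch that lists the child sitemaps changed since the previous crawl. It assumes the standard sitemaps.org schema, a `<lastmod>` value in ISO 8601 form, and that a timestamp without a time zone is UTC; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: the index uses the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text):
    """Parse an ISO 8601 <lastmod> value into a timezone-aware datetime."""
    text = text.strip()
    if text.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        text = text[:-1] + "+00:00"
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        # Assumption: date-only or zone-less values are UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value


def sitemaps_to_recrawl(index_xml, last_crawl):
    """Return child sitemap URLs whose lastmod is newer than last_crawl.

    last_crawl must be timezone-aware. Entries without a lastmod are
    included, since there is no way to tell whether they changed.
    """
    root = ET.fromstring(index_xml)
    changed = []
    for entry in root.findall("sm:sitemap", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue
        if not lastmod or parse_lastmod(lastmod) > last_crawl:
            changed.append(loc)
    return changed
```

With this approach, republishing a package bumps the `<lastmod>` of its sitemap, which is enough for a client to pick it up on the next run.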

Finally, coming soon: access to BILLSTATUS and ECFR bulkdata from our API. See https://github.com/usgpo/api/issues/4

mooncalfskb commented 5 years ago

Great!

JoshData commented 5 years ago

Any update here? https://www.govinfo.gov/sitemap/bulkdata/BILLSTATUS/115hr/sitemap.xml has been blank for a few days.

jonquandt commented 5 years ago

Let me look into that. I must have missed that one, though the others all had data in them. I’ll get back to you this morning.

jonquandt commented 5 years ago

@JoshData - that sitemap is now restored. We're looking into how to prevent that from happening in the future. We set the update time to be this morning for all the billstatus files to ensure that no updates are missed. This does mean that there are now about 7300 billstatus packages reporting as new/updated.

JoshData commented 5 years ago

Thanks! I love that there are 7,300 billstatus packages to update. A few years ago there were zero! :)

jonquandt commented 5 years ago

Reopening this issue to resolve the missing "www." in the sitemap URLs.

jonquandt commented 5 years ago

@JoshData - We updated the sitemap for 115hr to include www. in the loc element.
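
For clients that key on the URL in the loc element, a defensive sketch like the following could make a missing www. harmless by normalizing the host before comparing or storing URLs. The function name is hypothetical, and whether a given crawler actually needs this is an assumption, not something confirmed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_govinfo_url(url):
    """Rewrite a bare govinfo.gov host to www.govinfo.gov; leave other URLs as-is."""
    parts = urlsplit(url)
    if parts.netloc.lower() == "govinfo.gov":
        # SplitResult is a named tuple, so _replace returns a modified copy.
        parts = parts._replace(netloc="www.govinfo.gov")
    return urlunsplit(parts)
```

Applying this to every loc value before a lookup means URLs with and without the prefix resolve to the same stored entry.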

jonquandt commented 5 years ago

Closed based on the comments at https://github.com/unitedstates/congress/issues/239

JoshData commented 5 years ago

Thanks!

mooncalfskb commented 5 years ago

Thanks so much!