usgpo / bulk-data

User Guides for XML on the govinfo Bulk Data Repository. For information about Bill Status XML Bulk Data, see https://github.com/usgpo/bill-status.
https://www.govinfo.gov/bulkdata

Where can I find the images mentioned in the appendix of New Title for 17 CFR Part 43? #62

Closed anuraggulati closed 4 years ago

anuraggulati commented 4 years ago

The Federal Register package FR-2020-04-17 has the text of the proposed part 43 rules (real-time reporting). But the appendix only contains placeholders: [Please see PDF for image: EP17AP20.000] [Please see PDF for image: EP17AP20.001] etc.

Where are these images please?

jonquandt commented 4 years ago

The relevant PDF is located here: https://api.govinfo.gov/packages/FR-2020-04-17/granules/2020-04405/pdf?api_key=DEMO_KEY

Direct link to page: https://api.govinfo.gov/packages/FR-2020-04-17/granules/2020-04405/pdf?api_key=DEMO_KEY#page=51

One possible way to grab the images would be to look at federalregister.gov, which takes govinfo FR XML and provides some additional enhancements. Here's the relevant page that shows the image: https://www.federalregister.gov/d/2020-04405/page-21566

And the image file itself: https://s3.amazonaws.com/images.federalregister.gov/EP17AP20.000/original.png
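To connect the placeholders in the FR text to image files, here is a minimal Python sketch. It pulls the image identifiers out of the "[Please see PDF for image: ...]" markers and builds a URL for each one. The URL pattern is an assumption generalized from the single EP17AP20.000 example above, not a documented scheme, and the names `image_ids` and `image_url` are made up for illustration.

```python
import re

# Placeholder format as it appears in the FR text quoted in this thread.
PLACEHOLDER = re.compile(r"\[Please see PDF for image: ([^\]\s]+)\]")

# Assumption: pattern generalized from the one example URL in this thread.
IMAGE_URL_TEMPLATE = (
    "https://s3.amazonaws.com/images.federalregister.gov/{image_id}/original.png"
)


def image_ids(text):
    """Return image identifiers in order of first appearance, without duplicates."""
    seen = []
    for image_id in PLACEHOLDER.findall(text):
        if image_id not in seen:
            seen.append(image_id)
    return seen


def image_url(image_id):
    """Build the (assumed) URL of the original PNG for one identifier."""
    return IMAGE_URL_TEMPLATE.format(image_id=image_id)


if __name__ == "__main__":
    sample = (
        "[Please see PDF for image: EP17AP20.000] "
        "[Please see PDF for image: EP17AP20.001]"
    )
    for found in image_ids(sample):
        print(image_url(found))
```

Whether every identifier resolves this way is untested here; the API call below is the more reliable way to list a document's images.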

You may want to play with the federalregister.gov API (documentation: https://www.federalregister.gov/developers/documentation/api/v1). Here's a call that provides a list of images for that document: https://www.federalregister.gov/api/v1/documents/2020-04405.json?fields%5B%5D=images

Version with some additional fields: https://www.federalregister.gov/api/v1/documents/2020-04405.json?fields%5B%5D=abstract&fields%5B%5D=document_number&fields%5B%5D=images&fields%5B%5D=title
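The two calls above differ only in their repeated `fields[]` parameters, so they can be generated from a list of field names. The following is a rough Python sketch using only the standard library. The helper names `document_url` and `fetch_document` are hypothetical, and the shape of the returned JSON is not assumed beyond it being a JSON document.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the example calls in this thread.
API_BASE = "https://www.federalregister.gov/api/v1/documents"


def document_url(document_number, fields):
    """Build the documents endpoint URL with one fields[] parameter per field."""
    # urlencode percent-encodes the brackets, giving fields%5B%5D=... as above.
    query = urlencode([("fields[]", field) for field in fields])
    return f"{API_BASE}/{document_number}.json?{query}"


def fetch_document(document_number, fields):
    """Fetch and decode the JSON response (performs a network request)."""
    with urlopen(document_url(document_number, fields), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_document(...) to hit the API.
    print(document_url("2020-04405", ["images"]))
```

Calling `fetch_document("2020-04405", ["images"])` should return the same JSON as the first link; inspect the result before relying on any particular key layout.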

anuraggulati commented 4 years ago

Thank you kindly! Believe the first link gives me the entire pdf.

I didn’t know federal register had an API of its own - that’s neat. I will give that a go. Many thanks Jon!

anuraggulati commented 4 years ago

PS: Any chance the image (appendix C) is available in an xml table or an html table or structured text perhaps and not an image?

You know, just like the final version of part 43 (so not the proposed rule, but the rule that's live, published in the FR on 30 July 2013), which has the appendix data in a nice XML table.

Thanks!

jonquandt commented 4 years ago

Not that I am aware of.

anuraggulati commented 4 years ago

Thank you Jon. Appreciate it. I will try OCR'ing it using Amazon Textract, which is part of AWS. I see the federalregister.gov link opened the .png on AWS, so that could be a good and relatively straightforward way of exposing the data.

anuraggulati commented 4 years ago

Just tried OCR (Amazon Textract) on this image: https://s3.amazonaws.com/images.federalregister.gov/EP17AP20.000/original.png?1586977513

The first three columns were great, but it didn't work for the other columns (the ones with the vertical text and "tick marks"). Anyhow, this was helpful, Jon. I am happy for you to close the issue.

Many thanks!
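For anyone wanting to repeat the OCR attempt, here is a minimal sketch of what the Textract call might look like from Python. It assumes the third-party `boto3` package is installed and AWS credentials are configured; the region and the helper names `ocr_image` and `text_lines` are placeholders. It only retrieves plain text lines and does nothing to solve the vertical-text and tick-mark problem described above.

```python
def text_lines(response):
    """Collect the text of every LINE block from a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_image(png_bytes, region_name="us-east-1"):
    """Send image bytes to Textract and return the raw response.

    Assumption: boto3 is installed and credentials are available;
    the region name is only an example.
    """
    import boto3  # third-party; imported here so text_lines works without it

    client = boto3.client("textract", region_name=region_name)
    return client.detect_document_text(Document={"Bytes": png_bytes})


if __name__ == "__main__":
    # A hand-made response fragment, just to show what text_lines extracts.
    fake_response = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Appendix C"},
            {"BlockType": "WORD", "Text": "Appendix"},
        ]
    }
    print(text_lines(fake_response))
```

To run it for real, download the PNG first and pass its bytes to `ocr_image`, then feed the result to `text_lines`.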

jonquandt commented 4 years ago

If I recall correctly, the time between the proposed and final rule allows for the more complex typesetting needed to make those types of tables available (at least in some cases).