Closed anuraggulati closed 4 years ago
The relevant PDF is located here: https://api.govinfo.gov/packages/FR-2020-04-17/granules/2020-04405/pdf?api_key=DEMO_KEY Direct link to the page: https://api.govinfo.gov/packages/FR-2020-04-17/granules/2020-04405/pdf?api_key=DEMO_KEY#page=51
One possible way to grab the images would be to look at federalregister.gov, which takes the govinfo FR XML and provides some additional enhancements. Here's the relevant page that shows the image: https://www.federalregister.gov/d/2020-04405/page-21566 The image file itself: https://s3.amazonaws.com/images.federalregister.gov/EP17AP20.000/original.png
You may want to play with the federalregister.gov API (https://www.federalregister.gov/developers/documentation/api/v1). Here's a call that provides a list of images for that document: https://www.federalregister.gov/api/v1/documents/2020-04405.json?fields%5B%5D=images
Version with some additional fields: https://www.federalregister.gov/api/v1/documents/2020-04405.json?fields%5B%5D=abstract&fields%5B%5D=document_number&fields%5B%5D=images&fields%5B%5D=title
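For anyone scripting this, here is a minimal Python sketch of the two calls above. It only builds the same URLs shown in this thread and pulls URL-looking strings out of whatever JSON comes back. The exact shape of the `images` field is not described in this thread, so `collect_urls` deliberately makes no assumption about it; the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://www.federalregister.gov/api/v1/documents"


def build_document_url(document_number, fields):
    """Build a documents endpoint URL that requests only the given fields."""
    # urlencode percent-encodes the brackets, giving fields%5B%5D=... as above.
    query = urlencode([("fields[]", name) for name in fields])
    return f"{API_BASE}/{document_number}.json?{query}"


def collect_urls(value):
    """Recursively gather every http(s) string found in a decoded JSON value."""
    if isinstance(value, str):
        return [value] if value.startswith(("http://", "https://")) else []
    if isinstance(value, dict):
        return [url for item in value.values() for url in collect_urls(item)]
    if isinstance(value, list):
        return [url for item in value for url in collect_urls(item)]
    return []


def fetch_image_urls(document_number):
    """Call the API (needs network access) and return the image URLs it reports."""
    url = build_document_url(document_number, ["images"])
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Assumption: the response is a JSON object with an "images" key.
    return collect_urls(payload.get("images", {}))
```

Calling `fetch_image_urls("2020-04405")` should then return the image links for this document, provided the response really is a JSON object with an `images` key.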
Thank you kindly! I believe the first link gives me the entire PDF.
I didn’t know federal register had an API of its own - that’s neat. I will give that a go. Many thanks Jon!
On Tue, 11 Aug 2020 at 11:34 pm, Jon Quandt notifications@github.com wrote:
The relevant PDF is located here:
https://api.govinfo.gov/packages/FR-2020-04-17/granules/2020-04405/pdf?api_key=DEMO_KEY Direct link to page https://api.govinfo.gov/packages/FR-2020-04-17/granules/2020-04405/pdf?api_key=DEMO_KEY#page=51
One possible way to grab the images would be to look at Federalregister.gov, which takes govinfo FR XML and provides some additional enhancements. here's the relevant page that shows the image. https://www.federalregister.gov/d/2020-04405/page-21566
https://s3.amazonaws.com/images.federalregister.gov/EP17AP20.000/original.png
You may want to play with the federalregister.gov API https://www.federalregister.gov/developers/documentation/api/v1. Here's a call that provides a list of images for that document:
https://www.federalregister.gov/api/v1/documents/2020-04405.json?fields%5B%5D=images
Version with some additional fields:
PS: Any chance the image (appendix C) is available in an xml table or an html table or structured text perhaps and not an image?
You know, just like the final version of part 43 (so not the proposed rule, but the rule that's live, published in the FR on 30 July 2013), which has the appendix data in a nice XML table.
Thanks!
Not that I am aware of.
Thank you Jon. Appreciate it. I will try OCR'ing it using Amazon Textract, which is part of AWS. I see the federalregister.gov link opened the .png on AWS, so that could be a good and relatively straightforward way of exposing the data.
Just tried OCR (Amazon Textract) on https://s3.amazonaws.com/images.federalregister.gov/EP17AP20.000/original.png?1586977513 The first three columns were great, but it didn't work for the other columns (the ones with the vertical text and "tick marks"). Anyhow, this was helpful, Jon. I am happy for you to close the issue.
Many thanks!
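In case it helps anyone repeating the Textract experiment, below is a rough sketch of turning a Textract table response into rows. It assumes the `boto3` package and configured AWS credentials for the actual call, and it assumes the image contains a single table. Both function names are made up here. It will not fix the vertical-text columns described above; it only reads out the cells Textract did detect.

```python
def analyze_table_image(image_bytes):
    """Send image bytes to Textract; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so cells_to_rows works without it

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"Bytes": image_bytes}, FeatureTypes=["TABLES"]
    )
    return response["Blocks"]


def cells_to_rows(blocks):
    """Arrange CELL blocks into a list of rows (assumes one table per image)."""
    by_id = {block["Id"]: block for block in blocks}
    cells = {}
    for block in blocks:
        if block["BlockType"] != "CELL":
            continue
        parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    parts.append(child["Text"])
                elif (child["BlockType"] == "SELECTION_ELEMENT"
                      and child["SelectionStatus"] == "SELECTED"):
                    parts.append("X")  # mark a detected, selected checkbox
        cells[(block["RowIndex"], block["ColumnIndex"])] = " ".join(parts)
    if not cells:
        return []
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    # Textract row and column indexes start at 1; missing cells become "".
    return [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)]
```

Keeping the parsing separate from the API call means the grid logic can be checked against a saved response without calling AWS again.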
If I recall correctly, the time between the proposed and final rule allows for the more complex typesetting needed to make those types of tables available (at least in some cases).
The Federal Register package FR-2020-04-17 has the text of the proposed part 43 rules (real-time reporting), but the appendix only states: [Please see PDF for image: EP17AP20.000] [Please see PDF for image: EP17AP20.001] etc.
Where are these images please?
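As a rough illustration of mapping those placeholders to files, here is a small Python sketch that extracts the image identifiers from the text and builds a URL for each. The URL pattern is an assumption: it is copied from the single S3 link for EP17AP20.000 given in this thread and may not hold for every image. The function names are made up for illustration.

```python
import re

# Matches placeholders such as "[Please see PDF for image: EP17AP20.000]".
PLACEHOLDER = re.compile(r"\[Please see PDF for image: ([^\]\s]+)\]")

# Assumption: other images follow the same pattern as the one known link.
IMAGE_BASE = "https://s3.amazonaws.com/images.federalregister.gov"


def image_ids(text):
    """Return the image identifiers in order of first appearance, no repeats."""
    seen = []
    for match in PLACEHOLDER.finditer(text):
        identifier = match.group(1)
        if identifier not in seen:
            seen.append(identifier)
    return seen


def image_url(image_id):
    """Build the guessed URL of the original-size PNG for one identifier."""
    return f"{IMAGE_BASE}/{image_id}/original.png"
```

For example, `[image_url(i) for i in image_ids(appendix_text)]` gives one candidate link per placeholder, where `appendix_text` stands for the appendix text pulled from the package. Each link still needs to be checked, since only the first one is confirmed here.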