usgpo / api

services to access govinfo content and metadata
https://api.govinfo.gov

Getting Access Denied for images mentioned in cfr xml #67

Closed anuraggulati closed 3 years ago

anuraggulati commented 4 years ago

Hello,

https://s3.amazonaws.com/images.federalregister.gov/EP17AP20.000/original.png works (it is one of the images in the part 43 proposals for title 17 of the CFR), but https://s3.amazonaws.com/images.federalregister.gov/ER09JA12.005/original.png returns the error below.

What is a robust, consistent way to access the images mentioned in the Federal Register XML and the Code of Federal Regulations XML?

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>9R1JBQATDZ1ZES2Y</RequestId>
<HostId>kP/nQ54YPizQ91B0i8I00Y6w4V3ct3zhyiHX7XPpep5Jdnle0goy+Tk9I/JXCpj5CkAkbTFp1Gw=</HostId>
</Error>
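
For anyone fetching these images in bulk, a response like the one above can be told apart from real image bytes by parsing the body. The following is only a minimal sketch: `parse_s3_error` is a hypothetical helper name, and it assumes the error body is a small XML document with an `Error` root element, as shown above.

```python
import xml.etree.ElementTree as ET


def parse_s3_error(body):
    """Return (code, message) if body looks like an <Error> document, else None.

    Assumption: the error body has an <Error> root with <Code> and <Message>
    children, as in the response quoted above. Anything that does not parse
    as XML (for example PNG or GIF bytes) is treated as "not an error".
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    if root.tag != "Error":
        return None
    return root.findtext("Code"), root.findtext("Message")


if __name__ == "__main__":
    sample = (
        "<Error><Code>AccessDenied</Code>"
        "<Message>Access Denied</Message></Error>"
    )
    print(parse_s3_error(sample))
```

Checking the HTTP status code is usually enough on its own; parsing the body just makes the reason available for logging.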
anuraggulati commented 4 years ago

Hi there @jonquandt - do you reckon this issue can be solved? I know it's only been a day, but any indication of how big/small this issue is would help... Is it just a case of updating permissions?

I have seen similar issues in the bulkdata repository....

Thanks, Anurag

jonquandt commented 4 years ago

@anuraggulati -- can you provide the links to the CFR and FR that are showing these image link references?

Based on what I'm seeing, it appears that these are coming from the federalregister.gov API or from ecfr.federalregister.gov. Govinfo doesn't use the s3.amazonaws.com domain.

Federalregister.gov and ecfr.federalregister.gov have used those links in the past, but I believe they are transitioning to use images.federalregister.gov instead.

ECFR image references on the govinfo bulkdata repository look like this (example from the latest ECFR-title43):

</HEAD>
<img src="http://www.ecfr.gov/graphics/er18de97.008.gif"/>
<img src="http://www.ecfr.gov/graphics/er18de97.009.gif"/>
</DIV9>

And graphics are available for each title within the ECFR-titleXX-graphics.zip file in the same directory.
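
As a rough illustration of pairing those `img` references with the graphics archive, here is a minimal sketch. It assumes that each file in the zip is named after the last path segment of the `src` URL and that matching should ignore case; neither is confirmed here, and `graphic_names` and `match_zip_members` are hypothetical helper names.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def graphic_names(xml_text):
    """Collect the file name (last URL path segment) of every <img src=...>."""
    root = ET.fromstring(xml_text)
    names = []
    for img in root.iter("img"):
        src = img.get("src")
        if src:
            names.append(posixpath.basename(urlparse(src).path))
    return names


def match_zip_members(names, zip_members):
    """Map each graphic name to a zip member path, or None if it is absent.

    Assumption: members are matched on their base name, case-insensitively.
    """
    by_base = {posixpath.basename(m).lower(): m for m in zip_members}
    return {name: by_base.get(name.lower()) for name in names}


if __name__ == "__main__":
    sample = (
        '<DIV9><HEAD>Example</HEAD>'
        '<img src="http://www.ecfr.gov/graphics/er18de97.008.gif"/>'
        '<img src="http://www.ecfr.gov/graphics/er18de97.009.gif"/>'
        '</DIV9>'
    )
    print(graphic_names(sample))
```

In practice the member list would come from `zipfile.ZipFile(path).namelist()` on the downloaded graphics zip. A `None` value means the graphic was not found in the archive and has to be obtained some other way.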

Similarly, the image reference you have above in govinfo only appears in the following CFR XML (and its bulkdata equivalent) https://www.govinfo.gov/content/pkg/CFR-2020-title17-vol2/xml/CFR-2020-title17-vol2-part43-appA.xml

Here, the xml just instructs you to refer to the PDF.

If you are using the federalregister.gov api or ecfr.federalregister.gov to access these images, you may want to put in some feedback to those sites directly, as the govinfo team doesn't have direct knowledge of how those sites provide access to images, etc.

FederalRegister.gov: https://federalregister.tenderapp.com/discussion/new?discussion

ecfr.federalregister.gov: click on the help button on the bottom right of the page.

https://github.com/usnationalarchives/federalregister-api-core/issues

anuraggulati commented 4 years ago

Thanks @jonquandt. I apologise, I should have clarified that the XML doesn't refer to the amazonaws links; it refers to the PDF. However, based on your answer on https://github.com/usgpo/bulk-data/issues/62, I thought that amazonaws would always provide the image.

I don't use the eCFR: I get what is currently in force from the CFR, and if there are proposals or amendments to a CFR title/part/section that I am interested in, I get them from the FR. The reason is that the eCFR has a disclaimer which, I believe, states that the official sources are the FR and CFR (from the eCFR site: "those relying on it for legal research should verify their results against an official edition of the daily Federal Register").

A few things I would like to clear up:

  1. If the Federal Register has published an amendment which will be reflected as an updated/final rule next year in the CFR, will the eCFR give me an early preview of what the updated/final rule will look like?
  2. Am I correct in my understanding that the API/bulk XML of the FR and CFR is 'more reliable' than the eCFR (per the disclaimer above)?
  3. Because I use XML, I want a way to convert the PDF image references to image links. Am I understanding correctly that if I construct an eCFR URL, I should be able to get an image?

Thanks again Jon, you've been very helpful

anuraggulati commented 4 years ago

In addition, if you go here: https://ecfr.federalregister.gov/current/title-17/chapter-I/part-43

And then scroll to the appendix, right click on the image to get the address, you will get this image: https://s3.amazonaws.com/images.federalregister.gov/ER09JA12.008/original.gif .....

anuraggulati commented 4 years ago

Also, when https://www.govinfo.gov/content/pkg/CFR-2020-title17-vol2/xml/CFR-2020-title17-vol2-part43-appA.xml instructs to see the PDF, the PDF is not associated just with that one image but entire appendix A, correct?

I would have thought that if CFR bulk data/xml is referring to PDFs for images, then those would also be included in a graphics.zip type construct?

jonquandt commented 4 years ago

> Thanks @jonquandt. I apologise, I should have clarified that the XML doesn't refer to the amazonaws links; it refers to the PDF. However, based on your answer on usgpo/bulk-data#62, I thought that amazonaws would always provide the image.

Sorry for the confusion about that -- federalregister.gov uses the govinfo bulkdata, but performs additional enhancements that may make it easier to use for developer purposes. The federalregister.gov site is a joint project between NARA's Office of the Federal Register and GPO, so there is some level of integration between the two, with govinfo serving as the source. The suggestion to use the FR2.0 API was just an alternative to help address your question. For support on using that API, going directly through the federalregister.gov site is likely to get you more detailed assistance.

> I don't use the eCFR: I get what is currently in force from the CFR, and if there are proposals or amendments to a CFR title/part/section that I am interested in, I get them from the FR. The reason is that the eCFR has a disclaimer which, I believe, states that the official sources are the FR and CFR (from the eCFR site: "those relying on it for legal research should verify their results against an official edition of the daily Federal Register").

> A few things I would like to clear up:

> 1. If the Federal Register has published an amendment which will be reflected as an updated/final rule next year in the CFR, will the eCFR give me an early preview of what the updated/final rule will look like?

I am not an expert in the eCFR, but yes, that is the intent. It should incorporate FR final rule updates to show what is currently in force. The eCFR is an editorial compilation that is intended to provide a snapshot of current rules and regulations in their location within the CFR. CFR volumes are published annually (with a subset of the titles published each quarter) as a snapshot of regulations currently in place.

> 2. Am I correct in my understanding that the API/bulk XML of the FR and CFR is 'more reliable' than the eCFR (per the disclaimer above)?

They have different purposes. For more information, see this helpful document on the ecfr.federalregister.gov site.

> 3. Because I use XML, I want a way to convert the PDF image references to image links. Am I understanding correctly that if I construct an eCFR URL, I should be able to get an image?

I'm not sure which eCFR URL you're constructing in the above question.

I would recommend reading more on the ecfr.federalregister.gov site to see if they have any additional information. If you want to use the govinfo bulkdata repository XML directly for images, I would refer you to section 2.5 of the ECFR XML user guide for more on how to use the graphics.zip files. Based on that, it appears that graphics are not always included.

You may wish to put in a request with the ecfr.federalregister.gov site team to help make those image references more easily available via their API. The site is currently in beta, so they are definitely interested in feedback.

> Thanks again Jon, you've been very helpful

You're welcome.