IN: Research if we can drop document proxy

showerst commented 3 years ago

Future note for when sessions die down a little:

I was looking at an Indiana bill and at least bill versions seem to be in a predictable format now --

http://iga.in.gov/legislative/2021/bills/senate/311#document-8d3b8e3f

The PDF url is http://iga.in.gov/static-documents/8/d/3/b/8d3b8e3f/SB0311.01.INTR.pdf

In the page source, even without JS, 8d3b8e3f shows up a few times in attributes. One element also has data-docId="SB0311.01."

If we can hardcode the various INTR/ENR/etc codes, maybe we can construct a PDF url and do away with the proxy.
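A rough sketch of that idea in Python, going only off the single example above. The function name is made up, and the rule that the first four characters of the hash become path segments is inferred from one URL, so it would need checking against more bills before anyone relies on it:

```python
# Hypothetical helper: rebuilds the static PDF URL from values seen in the page source.
# Assumption: the directory prefix is always the first four characters of the
# document hash, one per path segment (inferred from one example only).
BASE_URL = "http://iga.in.gov/static-documents"


def build_pdf_url(doc_hash: str, doc_id: str, version_code: str) -> str:
    """Return a guessed PDF URL for one bill version.

    doc_hash     -- e.g. "8d3b8e3f", from the #document-... fragment
    doc_id       -- e.g. "SB0311.01.", from the data-docId attribute
    version_code -- e.g. "INTR" or "ENR" (the hardcoded codes)
    """
    if len(doc_hash) < 4:
        raise ValueError("document hash is shorter than expected")
    prefix = "/".join(doc_hash[:4])  # "8d3b" -> "8/d/3/b"
    # data-docId appears to end with a dot; add one if it is missing.
    if not doc_id.endswith("."):
        doc_id += "."
    return f"{BASE_URL}/{prefix}/{doc_hash}/{doc_id}{version_code}.pdf"


print(build_pdf_url("8d3b8e3f", "SB0311.01.", "INTR"))
```

For the bill above this reproduces the PDF url exactly; whether it holds for other sessions and document types is untested.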

jamesturk commented 3 years ago

related to #291

jamesturk commented 3 years ago

Current status: we now have a code path that doesn't rely on the proxy, but Cloudflare is blocking it. (It can be re-enabled with the env var INDIANA_DOCS_ENABLED.) If we ever get a reliable Cloudflare workaround or exception, we can return to this.
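For anyone unfamiliar with this kind of toggle, a minimal sketch of gating a code path on that variable. Only the name INDIANA_DOCS_ENABLED comes from this thread; the helper name and the set of accepted values are assumptions, and the scraper's actual check may differ:

```python
import os

# Assumption: any of these strings (case-insensitive) counts as "enabled".
_TRUTHY = {"1", "true", "yes", "on"}


def indiana_docs_enabled(environ=None) -> bool:
    """Hypothetical check for whether the proxy-free document path is turned on."""
    env = os.environ if environ is None else environ
    return env.get("INDIANA_DOCS_ENABLED", "").strip().lower() in _TRUTHY


if indiana_docs_enabled():
    print("using direct document URLs")
else:
    print("direct document path disabled")
```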

johnseekins commented 1 year ago

We had to migrate the doc-proxy away from Heroku (as their free tier expired), but we also rely on that proxy for some CA processing. I don't think we want to shut the proxy down any time soon.

Side note: most IN scraping now uses their API instead of the direct site, so Cloudflare is less of an issue.

Any reason we can't close this issue?

openstates/issues #266