Open showerst opened 3 years ago
related to #291
Current status: we now have a path that doesn't rely on the proxy, but Cloudflare blocks it. (It can be re-enabled with the env var INDIANA_DOCS_ENABLED.) If we ever get a reliable Cloudflare workaround or exception, we can return to this.
We had to migrate the doc-proxy away from Heroku (as their free tier expired), but we also rely on that proxy for some CA processing. I don't think we want to shut the proxy down any time soon.
Side note: most IN scraping now uses their API instead of the direct site, so Cloudflare is less of an issue.
Any reason we can't close this issue?
Future note for when sessions die down a little:
I was looking at an Indiana bill and at least bill versions seem to be in a predictable format now --
http://iga.in.gov/legislative/2021/bills/senate/311#document-8d3b8e3f
The PDF url is http://iga.in.gov/static-documents/8/d/3/b/8d3b8e3f/SB0311.01.INTR.pdf
In the page source, even without JS, 8d3b8e3f shows up a few times in attributes. An element also has data-docId="SB0311.01."
If we can hardcode the various INTR/ENR/etc codes, maybe we can construct a PDF url and do away with the proxy.
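A minimal sketch of that URL construction, assuming the pattern in the single example above holds generally: the first four characters of the document hash become one-character path segments, followed by the full hash and then the doc ID plus a stage code. The function name `build_pdf_url` and its parameters are hypothetical, and INTR and ENR are the only stage codes mentioned here; the full list would still need to be collected.

```python
# Assumption: base path taken from the one example PDF URL in this issue.
BASE_URL = "http://iga.in.gov/static-documents"


def build_pdf_url(doc_hash: str, doc_id: str, stage_code: str) -> str:
    """Build a direct PDF URL from values found in the bill page source.

    doc_hash:   e.g. "8d3b8e3f" (appears in the page's attributes)
    doc_id:     e.g. "SB0311.01." (the data-docId value, trailing dot optional)
    stage_code: e.g. "INTR" or "ENR" (would have to be hardcoded or mapped)
    """
    # The first four hash characters are used as nested directories: 8/d/3/b
    shard_path = "/".join(doc_hash[:4])
    # Normalize so the doc ID works with or without its trailing dot.
    filename = f"{doc_id.rstrip('.')}.{stage_code}.pdf"
    return f"{BASE_URL}/{shard_path}/{doc_hash}/{filename}"


if __name__ == "__main__":
    print(build_pdf_url("8d3b8e3f", "SB0311.01.", "INTR"))
```

This only reproduces the one URL seen so far; whether other bills, chambers, or sessions follow the same layout would need to be checked against more documents before relying on it.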