leanprover / elan

The Lean version manager
Apache License 2.0
292 stars 34 forks source link

how about azure CDN instead of scraping HTML assets #84

Open lovettchris opened 1 year ago

lovettchris commented 1 year ago

I don't like the maintainability problem of scraping HTML and its ensuing interruptions to our pipelines (and end users) so I had a chat with github support folks about why their gh release list is rate limited at all and they said they cannot remove the rate limit from that one api, and suggested we do the following instead:

  1. during the release process we capture these json files:
    gh api --header "Accept: application/vnd.github.v3+json" --method GET /repos/leanprover/lean4/releases > lean4.json
    gh api --header "Accept: application/vnd.github.v3+json" --method GET /repos/leanprover/lean4-nightly/releases > lean4-nightly.json
  2. and push them to an azure blob store I've setup at https://leanassets.blob.core.windows.net/...
  3. Then change elan to get the json from these CDN URL's:
    https://leanprover-assets.azureedge.net/releases/lean4.json
    https://leanprover-assets.azureedge.net/releases/lean4-nightly.json

This is in an Azure content delivery system which has no rate limits and should have high availability around the planet - similar to the availability of the github web pages we are scraping, with the added advantage that the json format will not change. The ping time to leanprover-assets.azureedge.net is 5ms!

I would be happy to do the work, but just wanted to propose the idea first. Setting up the above CDN edge took about 2 seconds since I already had the blob store for sharing links to my videos.

The capturing of the json could also be done in a github triggered Azure Function if you don't want to include it in our build CI release scripts.