heroku / roadmap

This is the public roadmap for Salesforce Heroku services.

Releases: health-checks, auto-rollback, gradual rollout and a/b releases #139

Open friism opened 1 year ago

friism commented 1 year ago

Required Terms

What service(s) is this request for?

runtime

Tell us about what you're trying to solve. What challenges are you facing?

We should improve how code changes (releases) are rolled out with Heroku. We should consider adding health checks, auto-rollback, gradual rollout, and A/B releases.

For non-web dynos, we should also establish a health-check convention and support rolling deploys (currently, non-web dynos don't support any form of gradual rollout).

trevorturk commented 1 year ago

I'm excited to see this on the roadmap!

I'm especially interested in gradual rollout and canary deploys. This sort of thing has been on my Heroku wishlist for a long time.

I'd also like to suggest considering an "adaptive preboot", for example using Rails' recent addition: https://github.com/rails/rails/pull/46936

If an app had a standard endpoint that could return 200 OK when everything is booted, we could make the zero downtime deploy via preboot much quicker, instead of waiting for a static 3 minutes. This could also be leveraged for auto-rollback, as in, don't switch over to the new code unless the health/heartbeat/up endpoint responds 200 OK.
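To make the idea concrete, here is a minimal sketch of such a readiness wait, assuming the platform can poll the app and that the current 3-minute wait becomes an upper bound rather than a fixed delay. The names `wait_until_ready` and `probe` are hypothetical and not part of any Heroku API; `probe` stands in for a GET against the app's health endpoint and returns the HTTP status code.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], int],
    timeout_s: float = 180.0,   # the static 3 minutes, used here only as a ceiling
    interval_s: float = 1.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True as soon as probe() reports 200, False once the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if probe() == 200:
            return True          # app is booted: safe to switch traffic now
        if clock() >= deadline:
            return False         # never became healthy: do not switch over
        sleep(interval_s)
```

A False result is the signal that could drive auto-rollback: the old release simply keeps serving traffic.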

Also worth mentioning: I'd like to see an option added to rollback that bypasses the preboot delay, for emergency use.

Thanks!

stevenharman commented 1 year ago

re: Gradual Rollout.

I'd be happy just to see preboot get the boot, and instead see Common Runtime have a rolling restart like Private Spaces does. A cherry on top would be the ability to configure the percentage of the roll - it's hard-coded to 25% on Dogwood, IIRC. But in a large enough formation, it'd be nice to tune that down even further.
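As a rough illustration of what a configurable roll percentage would mean, the sketch below splits a formation into restart batches. It is an assumption about how such batching might work, not a description of how Private Spaces actually implements it; `rolling_batches` and the dyno names are hypothetical.

```python
import math
from typing import List, Sequence


def rolling_batches(dynos: Sequence[str], percent: float) -> List[List[str]]:
    """Split dynos into batches of at most `percent` of the formation each."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Always restart at least one dyno per batch, even in tiny formations.
    size = max(1, math.ceil(len(dynos) * percent / 100))
    return [list(dynos[i:i + size]) for i in range(0, len(dynos), size)]


formation = [f"web.{n}" for n in range(1, 41)]
print(len(rolling_batches(formation, 25)))  # 25% roll of 40 dynos
print(len(rolling_batches(formation, 5)))   # a gentler 5% roll
```

With 40 dynos, a 25% roll takes 10 dynos out at a time, while 5% takes only 2, which is the kind of tuning a large formation would benefit from.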

locofocos commented 1 year ago

I'd be excited to start with a limited version of this: a healthcheck endpoint + auto-rollback. There are cases where we have pushed code changes that caused our Rails application to fail to boot. A simple GET to a healthcheck endpoint would have returned a 500. I would love it if Heroku made such a request to our new dynos during preboot, then halted the rest of the deploy if it couldn't get a 200 response.
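A minimal sketch of that gate, under the assumption that the platform can address each new dyno individually during preboot. `gate_release` and `check` are hypothetical names; `check` stands in for the GET to the healthcheck endpoint and returns the status code for one dyno.

```python
from typing import Callable, Iterable, List, Tuple


def gate_release(
    new_dynos: Iterable[str],
    check: Callable[[str], int],
) -> Tuple[bool, List[str]]:
    """Return (proceed, failed_dynos); proceed only if every dyno answers 200."""
    failed = [dyno for dyno in new_dynos if check(dyno) != 200]
    return (not failed, failed)


# Example: one new dyno fails to boot and answers 500.
observed = {"web.1": 200, "web.2": 500}
proceed, failed = gate_release(observed, lambda dyno: observed[dyno])
print(proceed, failed)  # False ['web.2']
```

When `proceed` is False, the deploy would halt and the previous release would keep serving requests.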

nightpool commented 1 year ago

For canary deployments, having something gradual that would be controlled by the error rate of each release would be great.
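One way this could look, sketched under assumed names and thresholds (`ReleaseStats`, `next_canary_weight`, the 10-point step, the 1-point tolerance, and the 100-request minimum are all illustrative, not Heroku behavior): compare the canary's error rate with the current release's, then widen, hold, or abort the rollout.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def next_canary_weight(
    current_weight: int,
    baseline: ReleaseStats,
    canary: ReleaseStats,
    step: int = 10,            # assumed: percentage points added per healthy step
    tolerance: float = 0.01,   # assumed: allowed excess over the baseline error rate
    min_requests: int = 100,   # assumed: sample size needed before deciding
) -> int:
    """Return the next traffic percentage for the canary release (0 means roll back)."""
    if canary.requests < min_requests:
        return current_weight                  # not enough data yet: hold
    if canary.error_rate > baseline.error_rate + tolerance:
        return 0                               # canary is worse: roll back
    return min(100, current_weight + step)     # canary looks fine: widen rollout
```

The hold branch matters in practice: without a minimum sample size, a single early error could abort an otherwise healthy release.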