jazzband / django-oauth-toolkit

OAuth2 goodies for the Djangonauts!
https://django-oauth-toolkit.readthedocs.io

Non-backwards-compatible upgrade with token_checksum. #1495

Closed matejsp closed 1 month ago

matejsp commented 1 month ago

Describe the bug We have a couple of servers behind a load balancer running 24/7, doing rolling upgrades almost every day without affecting the running application, like any modern system. After upgrading to the latest django-oauth-toolkit this is no longer possible.

There was an upgrade that added token_checksum, but its value is required for token validation to work correctly.

When doing a rolling upgrade you first migrate the database, which generates the checksums for existing rows. But because old instances are still running, new tokens keep being inserted with null checksums. Once you upgrade the first instance, it only accepts migrated access tokens and starts rejecting the ones created after the migration but before the first instance upgrade, since those have null checksum values.
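Roughly, the failure mode looks like this (a sketch of my understanding, not the library's exact code): the new version hashes the presented token and looks it up by checksum, so rows inserted by a not-yet-upgraded instance, whose token_checksum is still NULL, can never match.

import hashlib

from oauth2_provider.models import get_access_token_model


def lookup_after_upgrade(raw_token):
    AccessToken = get_access_token_model()
    checksum = hashlib.sha256(raw_token.encode("utf-8")).hexdigest()
    # Tokens written by an old instance still have token_checksum = NULL,
    # so this lookup raises AccessToken.DoesNotExist for them.
    return AccessToken.objects.get(token_checksum=checksum)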

To Reproduce Have a running Django app with an old django-oauth-toolkit version. Run the migration. Generate a new token with the old version. Upgrade to the latest version. Try using the token.

Problematic PR: https://github.com/jazzband/django-oauth-toolkit/pull/1447/files

Expected behavior I would expect a modern library to be mindful of the upgrade path and stay backwards compatible, in line with modern systems architecture and practices. For pet projects it's fine because you can take the whole application offline. Lots of production apps nowadays need to run 24/7, with planned downtime scheduled months in advance.

Options:

  1. fall back in _load_access_token to query by both checksum and token value: select * from access_token where token_checksum='hash' or (token_checksum is NULL and token = 'token_val'), or two selects so we don't send the JWT token to the DB over the wire if not necessary (see the sketch after this list)
  2. a configuration setting to switch between the old select-by-token and the new checksum style
  3. a database trigger to calculate the checksum: CREATE TRIGGER oauth2_provider_accesstoken_checksum BEFORE INSERT ON oauth2_provider_accesstoken FOR EACH ROW SET NEW.token_checksum = SHA2(NEW.token, 256);
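A rough sketch of option 1 (the names here are mine, not the toolkit's actual internals): try the new checksum lookup first and only fall back to the legacy token column for rows whose checksum was never populated.

import hashlib

from oauth2_provider.models import get_access_token_model


def load_access_token_with_fallback(raw_token):
    AccessToken = get_access_token_model()
    checksum = hashlib.sha256(raw_token.encode("utf-8")).hexdigest()
    qs = AccessToken.objects.select_related("application", "user")
    try:
        # New-style lookup by SHA-256 checksum.
        return qs.get(token_checksum=checksum)
    except AccessToken.DoesNotExist:
        # Fallback for tokens written by not-yet-upgraded instances
        # (token_checksum is still NULL). Kept as a second query so the raw
        # token only goes over the wire when the checksum lookup misses.
        return qs.get(token_checksum__isnull=True, token=raw_token)

Option 2 would simply put this fallback behind a setting, so it can be turned off once every instance runs the new version.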

Version 2.3.0 -> 3.0.1

Additional context

n2ygk commented 1 month ago

Feel free to submit a PR. However, your use case of simultaneously using two different package versions against one database sounds strange. Perhaps I'm not understanding.

matejsp commented 1 month ago

Basically you have one application with a database. And because you want to achieve high availability you have at least two servers with a load balancer in front. If you want to avoid downtime you do a rolling upgrade.

Steps:

  • upgrade db
  • upgrade server 1/3 (*)
  • upgrade server 2/3 (*)
  • upgrade server 3/3

Now, during the rolling upgrade, at the steps marked with (*) you are running two different versions of the application, including different package versions, side by side. And without backwards compatibility in the data model you cannot do rolling upgrades with zero downtime.

Basically you need the procedure from the previous decade: stop all servers, upgrade the db, start all servers. And be offline for a couple of minutes, or even hours if you have a large db.

The same issue applies to canary and blue/green deployments (with nice videos explaining how you end up with two versions deployed at once): https://octopus.com/devops/software-deployments/blue-green-deployment/ https://octopus.com/devops/software-deployments/canary-deployment/ https://octopus.com/devops/software-deployments/rolling-deployment/

n2ygk commented 1 month ago

Well, that is why it's labeled a major-version breaking upgrade…

matejsp commented 1 month ago

Yes, I am aware of that. Breaking changes are not the problem; it's how they are delivered. I adore the Django way of maintaining compatibility even across major versions when removing or breaking features.

I was thinking about how to patch the latest version. But I prepared an alternative upgrade plan for us (in case anybody has the same issue and needs zero downtime in an HA environment):

Before the upgrade, copy the code from the new version:

import hashlib

from django.db import models
from oauth2_provider import models as oauth2_provider_models


class TokenChecksumField(models.CharField):
    # Copied from the new version: stores the SHA-256 hex digest of the token.
    def pre_save(self, model_instance, add):
        token = getattr(model_instance, "token")
        checksum = hashlib.sha256(token.encode("utf-8")).hexdigest()
        setattr(model_instance, self.attname, checksum)
        return super().pre_save(model_instance, add)


class BtsAccessToken(oauth2_provider_models.AbstractAccessToken):
    class Meta:
        swappable = 'OAUTH2_PROVIDER_ACCESS_TOKEN_MODEL'

    # temporary field, mirrors the column the new version will expect
    token_checksum = TokenChecksumField(
        max_length=64,
        blank=False,
        unique=True,
        db_index=True,
    )

While still on the old version, introduce the new access token model (in Django settings).

OAUTH2_PROVIDER_ACCESS_TOKEN_MODEL = 'oauth2.BtsAccessToken'

  1. make migrations (just adding the field and its index)
  2. we won't run a data migration to populate token_checksum (we have 24 million records)
  3. deploy our app with the patched old version (sanity check sketched below)
  4. wait 24 hours so all the access tokens issued before the patch have expired
  5. deploy our app with the latest oauth version
  6. remove the temporary hacks in the code
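For anybody following the same plan, here is the sanity check mentioned in step 3, a minimal sketch under assumptions (app label oauth2 as in the setting above, the stock nullable user/application fields). It shows that tokens issued while still on the old library version already carry the checksum the new version will look up:

import hashlib
from datetime import timedelta

from django.utils import timezone

from oauth2.models import BtsAccessToken  # app label assumed from the setting above

token = BtsAccessToken.objects.create(
    token="sample-token",
    expires=timezone.now() + timedelta(hours=1),
)
# pre_save on TokenChecksumField fills the column automatically on save.
assert token.token_checksum == hashlib.sha256(b"sample-token").hexdigest()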