wdonne / traefikoidc

BSD 2-Clause "Simplified" License

runtime data corruption #9

Open KampfCaspar opened 1 month ago

KampfCaspar commented 1 month ago

Hi!

With the current version 1.2.1 I reproducibly have the situation where a login works well immediately after (re)start, but after some hours, it only results in:

token signature is invalid: crypto/rsa: verification error

My config is rather minimal:

    plugin:
        oidcclient:
          encryptionSecretFile: /etc/traefik/encryption_secret.json
          idps:
            - name: default
              providerUrl: "https://XXXX/"
              postLogoutUrl: "https://XXXX/"
              clientSecretFile: /etc/traefik/oidc_secrets.json
              scopes:
                - openid
                - portal

wdonne commented 1 month ago

Hi @KampfCaspar ,

When the validation of the token fails, you should get a 401 if the call was an Ajax call. Otherwise, it should redirect to the IDP. Would you have some Traefik logs from around the moment this happens?

KampfCaspar commented 1 month ago

traefik.log

The traefik log seems quite normal... Note I get redirected to the zitadel login and the error appears upon the 'callback'.

For testing, I deleted all cookies and retried with the same result. The log for that: traefik2.log

wdonne commented 1 month ago

Then I don't understand the difference between the first login and the subsequent ones. The plugin is not aware of this. The only state it has is the tracking cookie, but that doesn't appear in the callback. I can add more logging to try to make sense of it.

KampfCaspar commented 1 month ago

That's why I suspected some sort of data corruption. If the signature verification fails, might something be corrupting one half of the key pair?

wdonne commented 4 weeks ago

I have released v1.2.2, which has more error logging. It also logs the tokens that fail validation; tokens that validate correctly are not logged.

KampfCaspar commented 3 weeks ago

I installed 1.2.2 and got the result: traefik.log

wdonne commented 3 weeks ago

For one token I see the verification error, but when I paste it into the debugger on jwt.io it verifies fine. It is not even expired yet at the time of writing. I have no explanation for this.

KampfCaspar commented 3 weeks ago

That's not surprising, as the error literally occurs right after I am redirected back from zitadel. At that moment, I have just received a new token.

Might the error message give a hint? It seems the verification fails within an RSA check. That's why I suspected the key material might have been corrupted (in RAM?).

wdonne commented 3 weeks ago

The token that follows the error message in the log is the one that caused the error. If there has been any corruption, then it should be visible in that string in the log. But since it works, I would conclude the token hasn’t been changed.

KampfCaspar commented 3 weeks ago

jwt.io also correctly verifies the signature with the key offered by my zitadel instance. Is there any way I can compare the key the plugin uses at that moment with the key from zitadel?

wdonne commented 3 weeks ago

I will have to add that to the log.
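One way to make keys comparable in a log is to print a short fingerprint instead of the key itself. Below is a minimal sketch that computes the RFC 7638 JWK thumbprint, once from the `n` and `e` strings as the IDP publishes them and once from a key held in memory. It is not the plugin's code: the function names are hypothetical, and it assumes the plugin holds its keys as `*rsa.PublicKey`.

```go
package main

import (
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"math/big"
)

// thumbprintFromJWK computes the RFC 7638 thumbprint of an RSA JWK from its
// base64url-encoded "n" and "e" members, as they appear in the JWKS document.
func thumbprintFromJWK(n, e string) string {
	// RFC 7638: required members only, in lexicographic order, no whitespace.
	canonical := fmt.Sprintf(`{"e":"%s","kty":"RSA","n":"%s"}`, e, n)
	sum := sha256.Sum256([]byte(canonical))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// thumbprintFromKey computes the same value for a key that is already in
// memory (assumption: the verifier stores keys as *rsa.PublicKey).
func thumbprintFromKey(pub *rsa.PublicKey) string {
	b64 := base64.RawURLEncoding
	n := b64.EncodeToString(pub.N.Bytes())
	e := b64.EncodeToString(big.NewInt(int64(pub.E)).Bytes())
	return thumbprintFromJWK(n, e)
}

func main() {
	// Toy key for illustration only; a real modulus has 2048 bits or more.
	pub := &rsa.PublicKey{N: big.NewInt(3233), E: 65537}
	fmt.Println("in-memory key:", thumbprintFromKey(pub))

	// "DKE" and "AQAB" are the base64url forms of 3233 and 65537.
	fmt.Println("published JWK:", thumbprintFromJWK("DKE", "AQAB"))
}
```

If the two printed values are equal, the key in memory is the key the IDP publishes. Different values would point to a changed key rather than a changed token.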

wdonne commented 3 weeks ago

Logging the public keys in a way you can compare is less trivial. In fact, when there is an error message during validation, it is always about the last key in the array the discovery URL returned. The keys are loaded once and never touched afterwards. They are tried in the order they come out of the discovery.
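To illustrate why the logged error always belongs to the last key, here is a minimal sketch of "try the keys in order". It is not the plugin's code: `verifyWithAny` and `demo` are hypothetical names, and it assumes RS256 tokens (RSA PKCS #1 v1.5 with SHA-256), which fits the `crypto/rsa` error reported above.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
	"fmt"
)

// verifyWithAny tries every key in order and returns nil on the first match.
// If no key matches, the caller only sees the error from the last key tried.
func verifyWithAny(keys []*rsa.PublicKey, signingInput, sig []byte) error {
	digest := sha256.Sum256(signingInput)
	err := errors.New("no keys loaded")
	for _, k := range keys {
		if err = rsa.VerifyPKCS1v15(k, crypto.SHA256, digest[:], sig); err == nil {
			return nil
		}
	}
	return err
}

// demo signs a message with a key that is missing from the first key list
// and present in the second, then verifies against both lists.
func demo() (missingErr, presentErr error) {
	loaded, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return err, err
	}
	signer, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return err, err
	}
	input := []byte("header.payload") // stands in for the JWT signing input
	digest := sha256.Sum256(input)
	sig, err := rsa.SignPKCS1v15(rand.Reader, signer, crypto.SHA256, digest[:])
	if err != nil {
		return err, err
	}
	missingErr = verifyWithAny([]*rsa.PublicKey{&loaded.PublicKey}, input, sig)
	presentErr = verifyWithAny(
		[]*rsa.PublicKey{&loaded.PublicKey, &signer.PublicKey}, input, sig)
	return missingErr, presentErr
}

func main() {
	missing, present := demo()
	fmt.Println("signing key not in list:", missing)
	fmt.Println("signing key in list:    ", present)
}
```

When the signing key is not in the list, the result is the same `crypto/rsa: verification error` as in the log, even though the token itself is intact.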

KampfCaspar commented 3 weeks ago

There is only one key in the array; that key did not change.

The interesting issue is: immediately after a restart of traefik, everything works fine. It works multiple times, too. However, after several hours, it fails.

I therefore wanted to check whether the plugin (for whatever reason) uses different values immediately after the restart vs. several hours later. Might the loaded key somehow get corrupted?

This weekend we meet at the national hackspace reunion. I will try to learn Go at least to the extent that I can answer that question...

wdonne commented 2 weeks ago

I also had this issue with one of my IDPs. The cause is key rotation by the IDP: the plugin loaded the keys only once, so a token signed with a newer key could not be verified. In release 1.2.3 the keys are reloaded if a token offers a key ID that is not in the list.
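The idea of reloading on an unknown key ID can be sketched as follows. This is not the code from release 1.2.3: `keySet`, `keyCache`, `lookup` and the `fetch` callback are hypothetical names, and the strings in `main` are placeholders for real public keys.

```go
package main

import (
	"crypto"
	"fmt"
	"sync"
)

// keySet maps a key ID (the "kid" of a JWK) to a public key.
type keySet map[string]crypto.PublicKey

// keyCache keeps the keys in memory and asks fetch for a fresh set when a
// token refers to a key ID it does not know.
type keyCache struct {
	mu    sync.Mutex
	keys  keySet
	fetch func() (keySet, error) // assumption: downloads the IDP's JWKS
}

// lookup returns the key for kid, reloading the key set once on a miss.
func (c *keyCache) lookup(kid string) (crypto.PublicKey, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if k, ok := c.keys[kid]; ok {
		return k, nil
	}
	fresh, err := c.fetch()
	if err != nil {
		return nil, fmt.Errorf("reload keys: %w", err)
	}
	c.keys = fresh
	if k, ok := c.keys[kid]; ok {
		return k, nil
	}
	return nil, fmt.Errorf("unknown key ID %q", kid)
}

func main() {
	// Placeholder values; a real set would hold *rsa.PublicKey and friends.
	published := keySet{"key-1": "first public key"}
	cache := &keyCache{fetch: func() (keySet, error) { return published, nil }}

	k, err := cache.lookup("key-1")
	fmt.Println(k, err)

	// The IDP rotates its keys; the next token carries a new key ID.
	published = keySet{"key-2": "second public key"}
	k, err = cache.lookup("key-2")
	fmt.Println(k, err)
}
```

Because every unknown key ID triggers a download, a production version would probably also limit how often it reloads, so that tokens with made-up key IDs cannot force a request to the IDP each time.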