kilork / openid

OpenID Connect Rust Library
The Unlicense
61 stars 22 forks source link

Receiving Decode(MissingKey(...)) error from Google after many hours of running #28

Open Restioson opened 2 years ago

Restioson commented 2 years ago

Hi, I've been using openid in my project for a while now (code for it is here) - thanks for the effort you've put into it! I have however run into one small issue, which is that after leaving the server running for a long time (4 days to a week of uptime, perhaps), Google begins returning Decode(MissingKey(...)) errors, so nobody can log in. Upon restarting the server, though, it immediately begins working again. Thus, I'm not really sure which part of the stack is at fault here - my code, Google, or the library. My first thought was maybe it could be this library (something to do with reusing/resuming connections perhaps?) If not, I wonder if you know where I could start investigating this problem or where it's likely to be?

kilork commented 2 years ago

I think I know this behaviour, I also have not investigated deep. It seems like metadata from provider has lifetime. Most simple solution would be just refresh client once per day for example.

Restioson commented 2 years ago

This metadata from the provider - is it tied to the session (client) that is being used to make requests?

kilork commented 2 years ago

Yes, it is in client. We do not fetch metadata each time, developers are free to write particular logic to handle this. I recently had a case, there I need to fetch some db description for provider in case of particular exception for particular provider.

But definitely I think it can be done better, because large providers like google are expected to work out of the box without future modifications.

Restioson commented 2 years ago

For reference, your solution worked! I made it refresh the client once every 24 hours, and I haven't heard any users say that the login isn't working, and nor have I experience it myself. It's a little less pretty, but it works alright :)

andrewbaxter commented 1 year ago

It would be great if the example showed doing this, assuming it's best practice. I usually rely on examples for idiomatic usage including things like this.

kilork commented 1 year ago

@andrewbaxter I do not know, it would be idiomatic only for google. Current example in top level readme is already quite complex. We probably should keep it simple. But more nice examples and building blocks means for me more simple usage for users. Maybe there is a need in separate library or feature to this library, which would have higher level building blocks for typical use-cases. This is why this bug is open :)

andrewbaxter commented 1 year ago

Are you saying that based on the above report or is there some documentation saying this only applies to google? There are many versioned endpoints for instance and configuration parameters (supported algorithms, etc). Okta docs for instance say that the .well-known endpoint output may be slow to update, implying that the values there can and do change over time.

The response has http caching headers. From Okta the duration is roughly 24h from the time of the request. I think it's unwise in the general case to assume the values are correct longer than that, even if they don't change often in practice.

If there are plans for the client to automatically update that would be great, but otherwise I do feel the example sets a dangerous precedent - the risk is a fairly critical outage, and the error is hard to diagnose without seeing this thread.

brianmay commented 1 year ago

I am seeing similar errors from Dex also.

brianmay commented 1 year ago

Would appreciate any help in solving this on Axum. I have no idea how to refresh an Extension value on a regular time based interval.

brianmay commented 1 year ago

OK, got a solution. Axum does not allow for mutable state values, so I used ArcSwap to create a value with interior mutability. https://github.com/brianmay/robotica-rust/blob/1a2a3e380a6cc419f3e335574729358f6f824f4f/robotica-backend/src/services/http/mod.rs#L122-L143, and I regularly replace the value once an hour.