nomad-coe / nomad

NOMAD lets you manage and share your materials science data in a way that makes it truly useful to you, your group, and the community.
https://nomad-lab.eu
Apache License 2.0

Error: "cannot perform tokenauth" #88

Closed brands-d closed 11 months ago

brands-d commented 11 months ago

We started hosting a NOMAD Oasis server (https://physikoasis.uni-graz.at/nomad-oasis/) using the provided central user management. However, one of our users receives a "Could not validate credentials. The given token is invalid. (401)" error message as a small red pop-up at the bottom left, and the server prints the text shown under "Error message" below when it receives the request.

This does not occur for other users, and it does not even occur consistently for this particular user (myself: dominik.brandstetter@uni-graz.at). Opening the "Publish->Dataset" entry is possible, but opening an entry there usually is not. Sometimes clicking around the UI for a while allows the user to open other pages, but we have found no approach that works consistently. Uploading data using the UI or curl did not work flawlessly for this user either; it succeeded only after 2-3 attempts.

Error message:

nomad_oasis_app       | INFO:     127.0.0.1:35640 - "GET /-/health HTTP/1.1" 200 OK
nomad_oasis_app       | ERROR    nomad.infrastructure 2023-10-16T09:19:11 cannot perform tokenauth
nomad_oasis_app       |   - exception: Traceback (most recent call last):
nomad_oasis_app       |       File "/usr/local/lib/python3.9/site-packages/nomad/infrastructure.py", line 188, in decode_access_token
nomad_oasis_app       |         return jwt.decode(
nomad_oasis_app       |       File "/usr/local/lib/python3.9/site-packages/jwt/api_jwt.py", line 168, in decode
nomad_oasis_app       |         decoded = self.decode_complete(
nomad_oasis_app       |       File "/usr/local/lib/python3.9/site-packages/jwt/api_jwt.py", line 136, in decode_complete
nomad_oasis_app       |         self._validate_claims(
nomad_oasis_app       |       File "/usr/local/lib/python3.9/site-packages/jwt/api_jwt.py", line 193, in _validate_claims
nomad_oasis_app       |         self._validate_iat(payload, now, leeway)
nomad_oasis_app       |       File "/usr/local/lib/python3.9/site-packages/jwt/api_jwt.py", line 219, in _validate_iat
nomad_oasis_app       |         raise ImmatureSignatureError("The token is not yet valid (iat)")
nomad_oasis_app       |     jwt.exceptions.ImmatureSignatureError: The token is not yet valid (iat)
nomad_oasis_app       |     
nomad_oasis_app       |     During handling of the above exception, another exception occurred:
nomad_oasis_app       |     
nomad_oasis_app       |     Traceback (most recent call last):
nomad_oasis_app       |       File "/usr/local/lib/python3.9/site-packages/nomad/infrastructure.py", line 202, in tokenauth
nomad_oasis_app       |         payload = self.decode_access_token(access_token)
nomad_oasis_app       |       File "/usr/local/lib/python3.9/site-packages/nomad/infrastructure.py", line 192, in decode_access_token
nomad_oasis_app       |         raise KeycloakError('Could not validate credentials. The given token is invalid.')
nomad_oasis_app       |     nomad.infrastructure.KeycloakError: Could not validate credentials. The given token is invalid.
nomad_oasis_app       |   - exception_hash: fNj79vGPthgofJgC4hansQ5yiGl4
nomad_oasis_app       |   - nomad.commit: 
nomad_oasis_app       |   - nomad.deployment: Peters electronic structure theory group
nomad_oasis_app       |   - nomad.service: app
nomad_oasis_app       |   - nomad.version: 1.2.1
nomad_oasis_app       | INFO:     172.18.0.9:54638 - "POST /entries/query HTTP/1.0" 401 Unauthorized
nomad_oasis_proxy     | 143.50.77.109 - - [16/Oct/2023:09:19:11 +0000] "POST /nomad-oasis/api/v1/entries/query HTTP/1.1" 401 72 "http://physikoasis.uni-graz.at/nomad-oasis/gui/search/entries" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36" "-"
nomad_oasis_app       | INFO:     127.0.0.1:42560 - "GET /-/health HTTP/1.1" 200 OK

lauri-codes commented 11 months ago

Hi @brands-d!

Looking at your error, it seems like some sort of an issue with checking the time after which the token is valid.

After authentication, you receive a signed piece of information (a JWT) that contains some details about the signed-in user. It also contains a timestamp called "iat" ("issued at"), which indicates the point in time at which the token was issued. You are getting an error because the server validating the token thinks that the token has not been issued yet.
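
To see the "iat" value of a token while debugging, the payload segment of a JWT can be decoded without verifying the signature. The following is a minimal sketch using only the Python standard library; the helper names and the demo token are made up for illustration and are not part of NOMAD or PyJWT, and the timestamps are arbitrary values chosen to be 20 seconds apart.

```python
import base64
import json


def read_unverified_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying it (debugging only)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_issued(token: str, now: float) -> float:
    """Positive result: from this clock's point of view, iat is in the future."""
    return read_unverified_claims(token)["iat"] - now


def _b64url(data: dict) -> str:
    """Hypothetical helper that builds an unsigned demo segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Unsigned demo token whose iat is 20 s ahead of the validating clock.
demo_token = ".".join([_b64url({"alg": "none"}), _b64url({"iat": 1697447971}), ""])
print(seconds_until_issued(demo_token, now=1697447951))  # prints 20
```

A positive result on the validating machine is the situation that produces "The token is not yet valid (iat)" in the traceback above.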

The underlying reason might be a small time difference between the two different servers (the one creating the JWT, and the one that is validating it, these might very well be different physical machines) since it looks like you are getting errors quite randomly. I think the usual solution is to allow for a small 'leeway' in the checking of these timestamps that allows for a small time difference.
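
The leeway idea can be sketched as follows. This is not NOMAD's or PyJWT's code, only a standard-library illustration of the comparison that the `_validate_iat` frame in the traceback performs; the exception class and function name are hypothetical. PyJWT's `jwt.decode` does accept a `leeway` argument, but whether a NOMAD Oasis exposes a setting for it is not something this thread establishes.

```python
class TokenNotYetValid(Exception):
    """Hypothetical stand-in for an 'iat is in the future' error."""


def check_iat(iat: float, now: float, leeway: float = 0.0) -> None:
    """Reject a token only if it was issued more than `leeway` seconds
    in the future according to the validating machine's clock."""
    if iat > now + leeway:
        raise TokenNotYetValid("The token is not yet valid (iat)")


issued_at = 1020.0      # timestamp written by the issuing server
validator_now = 1000.0  # validating server's clock, 20 s behind

try:
    check_iat(issued_at, validator_now)  # no leeway: rejected
except TokenNotYetValid as error:
    print("rejected:", error)

check_iat(issued_at, validator_now, leeway=30.0)  # 30 s leeway: accepted
print("accepted with leeway")
```

A leeway only masks small offsets; a clock that keeps drifting will eventually exceed any fixed tolerance, so synchronizing the clocks remains the actual fix.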

Let us investigate a bit if there is something we can do on our end to solve this.

brands-d commented 11 months ago

Thank you for the quick response!

Actually, that is not necessary; your analysis was correct. A port on our server was closed, which prevented the host from synchronizing its time properly, and we ran about 20 s behind. Fixing this also resolved the issue. Thanks for your efforts, and sorry, as this was entirely on our end!
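
For anyone who wants a rough check for this kind of drift, one option is to compare the local clock with the `Date` header of an HTTP response from a server whose clock is trusted. The sketch below shows only the comparison, using the Python standard library; the function name and the example values are assumptions for illustration, and fetching the header is left out. The header has one-second resolution and includes network delay, so this detects offsets of many seconds, not small ones.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset_seconds(date_header: str, local_now: datetime) -> float:
    """Return remote time minus local time in seconds.

    A positive value means the local clock is behind the remote one.
    """
    remote = parsedate_to_datetime(date_header)
    return (remote - local_now).total_seconds()


# Example values: the local clock is 20 s behind the remote server.
local = datetime(2023, 10, 16, 9, 19, 11, tzinfo=timezone.utc)
print(clock_offset_seconds("Mon, 16 Oct 2023 09:19:31 GMT", local))  # prints 20.0
```

In practice, `local_now` would be `datetime.now(timezone.utc)` taken right after the response arrives.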

lauri-codes commented 11 months ago

Ok, great, thanks for the update!

ericpre commented 6 months ago

I had the same issue: after a while, the time on the host seems to have gotten out of sync.

Would it be worth adding to the NOMAD Oasis setup instructions that time synchronisation needs to be enabled on the server?

lauri-codes commented 6 months ago

Yes, we should add this to our documentation. What do you think would be the most logical place to put this in our current documentation structure? I would propose some kind of a note here: Provide and connect your own user management

ericpre commented 6 months ago

Since this would affect servers using either the centralised or their own user management, with or without Docker, I would say that it would fit best in a "prerequisites" or maybe a troubleshooting section, depending on whether this can be considered a rare occurrence or not. I don't know how often this issue occurs or whether it is specific to my case: the host is a Rocky Linux virtual machine that was set up by my IT department.

If people miss it when setting up the server, they should be able to find it quickly by searching the documentation for the relevant keywords of the error message.

lauri-codes commented 6 months ago

Thanks for the suggestion. I have now added a new "Troubleshooting" section to the OASIS install docs. After we update our deployments it will become part of the online docs.