rackslab / Slurm-web

Open source web dashboard for Slurm HPC clusters
https://slurm-web.com
GNU General Public License v3.0

KeyError: 'getpwuid(): uid not found: xxxx' #203

Closed mneilly closed 5 years ago

mneilly commented 5 years ago

I have Slurm-web with Slurm 18.08.4 up and running in a Docker container, but I am running into the following error:

[Mon Feb 11 19:04:02.068163 2019] [wsgi:error] [pid 37:tid 139741657466624] ERROR:slurmrestapi:Exception on /jobs [POST]
[Mon Feb 11 19:04:02.068187 2019] [wsgi:error] [pid 37:tid 139741657466624] Traceback (most recent call last):
[Mon Feb 11 19:04:02.068189 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
[Mon Feb 11 19:04:02.068191 2019] [wsgi:error] [pid 37:tid 139741657466624]     response = self.full_dispatch_request()
[Mon Feb 11 19:04:02.068193 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
[Mon Feb 11 19:04:02.068195 2019] [wsgi:error] [pid 37:tid 139741657466624]     rv = self.handle_user_exception(e)
[Mon Feb 11 19:04:02.068197 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
[Mon Feb 11 19:04:02.068199 2019] [wsgi:error] [pid 37:tid 139741657466624]     reraise(exc_type, exc_value, tb)
[Mon Feb 11 19:04:02.068200 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
[Mon Feb 11 19:04:02.068202 2019] [wsgi:error] [pid 37:tid 139741657466624]     rv = self.dispatch_request()
[Mon Feb 11 19:04:02.068204 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
[Mon Feb 11 19:04:02.068206 2019] [wsgi:error] [pid 37:tid 139741657466624]     return self.view_functions[rule.endpoint](**req.view_args)
[Mon Feb 11 19:04:02.068208 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/share/slurm-web/restapi/cors.py", line 53, in wrapped_function
[Mon Feb 11 19:04:02.068210 2019] [wsgi:error] [pid 37:tid 139741657466624]     resp = make_response(f(*args, **kwargs))
[Mon Feb 11 19:04:02.068211 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/share/slurm-web/restapi/auth.py", line 207, in inner
[Mon Feb 11 19:04:02.068213 2019] [wsgi:error] [pid 37:tid 139741657466624]     resp = f(*args, **kwargs)
[Mon Feb 11 19:04:02.068215 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/share/slurm-web/restapi/cache.py", line 107, in inner
[Mon Feb 11 19:04:02.068217 2019] [wsgi:error] [pid 37:tid 139741657466624]     resp = f(*args, **kwargs)
[Mon Feb 11 19:04:02.068226 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/share/slurm-web/restapi/slurmrestapi.py", line 120, in get_jobs
[Mon Feb 11 19:04:02.068228 2019] [wsgi:error] [pid 37:tid 139741657466624]     fill_job_user(job)
[Mon Feb 11 19:04:02.068230 2019] [wsgi:error] [pid 37:tid 139741657466624]   File "/usr/share/slurm-web/restapi/slurmrestapi.py", line 372, in fill_job_user
[Mon Feb 11 19:04:02.068231 2019] [wsgi:error] [pid 37:tid 139741657466624]     pw = pwd.getpwuid(uid)
[Mon Feb 11 19:04:02.068233 2019] [wsgi:error] [pid 37:tid 139741657466624] KeyError: 'getpwuid(): uid not found: 5059'

I'm assuming this is LDAP-related, since it is complaining about not knowing various uids for running jobs. I have the following in restapi.conf:

[ldap]
uri = ldap://ldap-server.xxxx.com:389
base = dc=xxxx,dc=com
ugroup = cn=users

mehdid commented 5 years ago

It does indeed look related to LDAP. Slurm-Web does something very basic at this stage. Have you tried testing the LDAP credentials from within the Docker container, independently of Slurm-Web? I suspect that doesn't work either.
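
One way to run that kind of check without Slurm-Web is to call the same standard-library function that fails in the traceback, `pwd.getpwuid`, from a Python shell inside the container. The sketch below is only an illustration: `resolve_uid` and its `lookup` parameter are hypothetical names, not part of Slurm-Web, and it only tells you whether the container's user database can resolve a uid, not why it cannot.

```python
import pwd


def resolve_uid(uid, lookup=pwd.getpwuid):
    """Return the login name for uid, or None if it cannot be resolved.

    `lookup` defaults to pwd.getpwuid, the call that raises KeyError in
    the traceback. It is a parameter only so the helper can be exercised
    without a real user database (hypothetical helper, not Slurm-Web code).
    """
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when no entry exists for the uid.
        return None
```

Running `print(resolve_uid(5059))` inside the container with a uid taken from the error log would print `None` if the container cannot resolve that user, which would point at the container's user lookup setup rather than at Slurm-Web.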

mneilly commented 5 years ago

Thanks. You are correct. LDAP isn't working for the container in general so I need to start there. You can close this report as user error. :)
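
A possible first step when LDAP does not work from a container at all is to confirm that the container can open a TCP connection to the LDAP server. The following is a minimal sketch, assuming Python 3 is available in the container; `can_connect` is a hypothetical helper, and the host and port would come from the `uri` line in restapi.conf. A successful connection only shows that the port is reachable, not that LDAP lookups are configured correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for a quick reachability check. DNS failures,
    refused connections and timeouts are all OSError subclasses in
    Python 3, so they are all reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("ldap-server.xxxx.com", 389)` uses the placeholder host and the port from the configuration quoted above.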

mehdid commented 5 years ago

Thanks for your feedback. I am closing this issue now.