jupyterhub / systemdspawner

Spawn JupyterHub single-user notebook servers with systemd
BSD 3-Clause "New" or "Revised" License
92 stars 45 forks source link

Don't kill the whole service when a single proces OOMs #101

Closed dragz closed 1 year ago

dragz commented 1 year ago

This fixes a bug in systemd which terminate all processes in the scope if a process hits the mem_limit.

It seems to be fixed upstream, but might take a while until the distros implements it. https://github.com/systemd/systemd/issues/25376

welcome[bot] commented 1 year ago

Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly. welcome You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

behrmann commented 1 year ago

I've only glanced the linked issue, but on first look that was about the user session. systemdspawner units don't run inside a user session and I think it is by design, that the service unit is taken down if one of its children (i.e. a part of the unit) goes over the configured memory limit.

dragz commented 1 year ago

I'm not a systemd expert at all, but the submitted patch fix the problem of oom killing the entire session. It might be that the fix should go elsewhere though.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Jörg Behrmann @.> Sent: Friday, December 23, 2022 1:19:22 PM To: jupyterhub/systemdspawner @.> Cc: Roy Dragseth @.>; Author @.> Subject: Re: [jupyterhub/systemdspawner] Prevent oom-killer to take down the whole singleuser-server (PR #101)

I've only glanced the linked issue, but on first look that was about the user session. systemdspawner units don't run inside a user session and I think it is by design, that the service unit is taken down if one of its children (i.e. a part of the unit) goes over the configured memory limit.

— Reply to this email directly, view it on GitHubhttps://github.com/jupyterhub/systemdspawner/pull/101#issuecomment-1363908199, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAUYNV6TIXSYM6RFQTRAFA3WOWKEVANCNFSM6AAAAAATHTXIAY. You are receiving this because you authored the thread.Message ID: @.***>

behrmann commented 1 year ago

Ah, I think I see what you mean now. The idea would be, that since the singleuser-server is unlikely to go OOM, but it will generally be the kernels started by it, killing just the offending kernel, which Jupyter is able to handle, and leaving the singleuser-server running, should work. Quite a neat idea. I will have to test this out on my users after the Christmas break. :)

dragz commented 1 year ago

Yes, that is the idea. The PR message could have been clearer. Anyway, happy holidays.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Jörg Behrmann @.> Sent: Friday, December 23, 2022 4:51:05 PM To: jupyterhub/systemdspawner @.> Cc: Roy Dragseth @.>; Author @.> Subject: Re: [jupyterhub/systemdspawner] Prevent oom-killer to take down the whole singleuser-server (PR #101)

Ah, I think I see what you mean now. The idea would be, that since the singleuser-server is unlikely to go OOM, but it will generally be the kernels started by it, killing just the offending kernel, which Jupyter is able to handle, and leaving the singleuser-server running, should work. Quite a neat idea. I will have to test this out on my users after the Christmas break. :)

— Reply to this email directly, view it on GitHubhttps://github.com/jupyterhub/systemdspawner/pull/101#issuecomment-1364058854, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAUYNV5VTQXAHWU63U2RESDWOXC6TANCNFSM6AAAAAATHTXIAY. You are receiving this because you authored the thread.Message ID: @.***>

yuvipanda commented 1 year ago

This is perfect, I didn't even know this was not the default behavior! I think it might be the default behavior in some distros? Regardless, explicitly stating this is good, and it looks like the property might be available for a while anyway.

yuvipanda commented 1 year ago

Can you make this into a properties.setdefault call, like in https://github.com/jupyterhub/systemdspawner/pull/101/files#diff-bd23127b48ede7fc341829a7fbcf5687517ceb2ad7e595308a013813bfc28c64R99, along with an associated comment on why we do this? This allows admins to override this if they so choose. Happy to merge after that.

dragz commented 1 year ago

Sure, see update to the PR.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Yuvi Panda @.> Sent: Tuesday, December 27, 2022 11:30:11 PM To: jupyterhub/systemdspawner @.> Cc: Roy Dragseth @.>; Author @.> Subject: Re: [jupyterhub/systemdspawner] Don't kill the whole service when a single proces OOMs (PR #101)

Can you make this into a properties.setdefault call, like in https://github.com/jupyterhub/systemdspawner/pull/101/files#diff-bd23127b48ede7fc341829a7fbcf5687517ceb2ad7e595308a013813bfc28c64R99, along with an associated comment on why we do this? This allows admins to override this if they so choose. Happy to merge after that.

— Reply to this email directly, view it on GitHubhttps://github.com/jupyterhub/systemdspawner/pull/101#issuecomment-1366221664, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAUYNV77V2BV74FVLI2AU2LWPNUXHANCNFSM6AAAAAATHTXIAY. You are receiving this because you authored the thread.Message ID: @.***>

welcome[bot] commented 1 year ago

Congrats on your first merged pull request in this project! :tada: congrats Thank you for contributing, we are very proud of you! :heart:

yuvipanda commented 1 year ago

Thanks, @dragz!

yuvipanda commented 1 year ago

@dragz this was released in v0.17 today