Closed rcthomas closed 3 years ago
Actually, I think it does need the environment settings. So these have to be able to be captured in the database in some way.
Ah, it's interesting that you bring this up now, as I've just been fighting with a similar thing. In our (recently revamped) setup, various spawner properties need to get set with data that arrives from the Authenticator
class. There's a whole separate issue there because the child spawner doesn't exist soon enough for the relevant hooks to fire. But once we worked around that, we found that our setup works unless you have to restart the Hub and restore the spawners from the DB, exactly because parts of the state weren't being persisted. Our solution does use callable environment
values, so possibly that's one reason why.
Good to know. The signs I see are the null
settings in the ~ORM~ DB, a 500 internal server error for /hub/api/users
if one of the servers is active, and an error in the logs about "function" not being JSON-serializable. Looked at it for a few hours today and I just see the simple solution of the fully realized environment needs to be cached on Spawner, then you have a hope of fixing it up in get_state()
in wrapspawner. But it's probably a bigger issue than that (with other possibly callable attributes)
I managed to provoke the behavior I alluded to before in my update. I started a server, shut down the hub, then kill -9
'ed my server, brought back up the hub, and then clicked "restart" on JupyterLab to try to revive my lost session. I was prompted to auth, then redirected back to my restarted server. That server now has wrong environment settings (namely "None") for things that are supposed to be determined via callable.
I think (note I am not 100% sure yet) that in the above scenario, get_env()
is being called by start()
as normal, but because the database has concretized/realized values for those callable-configured environment
settings, this part of get_env()
just copies them out of the database.
This may not always be the case, but in at least my case, I don't actually care if the callable()
gets the same value as was stored in the database. So in wrapspawner, get_state()
could just omit environment
and that would get me out of my pickle, but raises other questions and may not be an OK answer for everyone.
If instead we don't want get_env()
to come up with possibly different answers on restoration for environment
then storing the originally realized values has to be explored, and I am not sure that can be done without modifying get_env()
in JupyterHub to actually store the realized environment values. Or somehow be able to inject something into the child spawner get_env()
to do it for wrapspawner?
Hold up, now I'm confused about something.
I think (note I am not 100% sure yet) that in the above scenario, get_env() is being called by start() as normal, but because the database has concretized/realized values for those callable-configured environment settings, this part of get_env() just copies them out of the database.
I am not aware that environment values are actually persisted in the DB at all. I don't see those values show up when I inspect it in sqlite. And looking through user.py
and spawner.py
I don't see anywhere that the environment gets written out. Have you maybe customized something such that your spawner's environment is getting copied into the state object?
Yeah, my config has spawner-dependent environment settings. If you don't do that it's not a problem.
Note that the hub actually has a warning about this but it's hard coded into a context of user_options
[W 2021-01-27 09:37:52.846 JupyterHub orm:69](B Non-jsonable data in user_options: <class 'function'>; will persist None.
https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/orm.py#L62
I think this is a non-issue (at least as filed). The reason is that while environment
and other things invite you to go ahead and define things as callable, you're still responsible for making things right in load_state
for those non-jsonables. Moving the callable in question into construct_child
(all my wrapped spawner classes need the same setting, so I can just attach it there) makes everything OK again (so far).
A spawner
environment
dictionary value is allowed to be either a string or a callable that returns a string. But if a wrapspawnerchild_config
contains anenvironment
dictionary where a value is a callable, thenget_state()
can't return a properly JSON-serializable dictionary. When the state is saved to the database, it seems that anull
is recorded. API calls that end up callingget_state()
and return spawner state, i.e./hub/api/users
crash because they can't turn a function into a string. It does not record the "realized" value, which makes sense; that is only "realized" whenget_env()
is called, and not recorded after that.If
environment
is allowed to be a dictionary with one or more callable values, then it just cannot be considered properly JSON-serializable, and would break the contract thatget_state()
return only JSON-serializable dictionaries. That is one problem. Another question though is whetherenvironment
qualifies as state that the Hub actually needs to maintain.If we decided
environment
could be dropped fromchild_conf
entries and API responses, that's probably not the end of it, since wrapped spawners can have all kinds of arguments that aren't serializable.