rwth-i6 / sisyphus

A Workflow Manager in Python
Mozilla Public License 2.0
45 stars 24 forks source link

Possibility to set specific env vars for job #178

Closed albertz closed 8 months ago

albertz commented 8 months ago

I need a possibility to set specific env vars for one specific job.

If you use CLEANUP_ENVIRONMENT = True, then you can do sth like:

job._sis_environment.set(key, value)

However:

so this is not a good solution.

(But I still have seen exactly such code in the wild, e.g. in i6_experiments in compile_sctk. Also see https://github.com/rwth-i6/i6_experiments/issues/192.)

I want a generic solution with public API, which also work when CLEANUP_ENVIRONMENT = False, and also which does not use string.Template.substitute (or only optionally, via an explicit flag).

Any suggestions how to implement this?

One suggestion:

Ok? Or other suggestions?

Note, those env vars would not influence the hash. This is also the behavior I want for my current use case.

albertz commented 8 months ago

Another aspect:

Sometimes I want to edit those env vars locally in a job. E.g. a typical example is an env var like PYTORCH_CUDA_ALLOC_CONF where I sometimes need to play around with after it crashed with OOM, and then resubmit the job with a slightly changed setting.

This should be simple. Currently, as I understand, this is stored inside the pickled Job object, in job.save, so this is not simple. (Can I actually delete the job.save and it would simply recreate it?)

So, maybe this means, using job._sis_environment for this is not a good idea, and we should have a separate pure text file which is easily editable?

@michelwi @curufinwe @critias comments, suggestions?

critias commented 8 months ago

Sorry, that I missed to respond to this. job.save is updated if the job is submitted and the manager was restarted. So using Job._sis_environment should work. Adding something like putenv is a good idea, but given the other methods are named like set_rqmt, set_keep_value, or set_defaults I think set_env would be a better name.

critias commented 8 months ago

I forgot to respond about the os.environ[k] = string.Template(v).substitute(orig_env) issue. Side note for everyone that doesn't know what this line does: It basically emulates the bash $ variable replacement syntax. Meaning something like {"PATH": "foobar:$PATH"} works like export PATH=foobar:$PATH" in bash.

I get your point that this might be unexpected, but breaking this now could cause setups to fail. I doubt it's worth changing it, but documenting it better would be a good idea.

albertz commented 8 months ago

One reason I went with the startup hook is that it is more flexible and allows maybe to have other logic in there, e.g. easily to do sth like string.Template.substitute, but maybe other other things, and the user can easily edit it, without the need to restart the Sisyphus manager.

But maybe that flexibility and the easy possibility to edit it are not needed here, so better use the more simple and specialized solution? I updated the PR accordingly.

breaking [string.Template(v).substitute usage] now could cause setups to fail

I'm not breaking it. My current solution is completely orthogonal to the existing (optional) _sis_environment, where you still have that string.Template.substitute.

I did not fully went with the solution that I outlined here (as there was also no feedback) because I thought it would make the _sis_environment code more complex to cover what I actually want. The code in the PR is much simpler than what I initially suggested here.

critias commented 8 months ago

Is there anything you can't do due to string.Template.substitute? You should be able to escape $ with $$. I think that introducing a second way to update the environment which behave differently would causes even more confusion.

albertz commented 8 months ago

I just don't want the potential source for errors because I maybe forgot to escape a $. I also will definitely have forgotten that we even have such logic when I edit such file again in 6 months. I sometimes also programatically get some env vars from other external sources, so then I would need extra complexity to escape $ automatically, making it way too complicated.

critias commented 8 months ago

How about adding a flag to set_env and automatically escapes the given string if it's set?

albertz commented 8 months ago

Yea that would be another option, but I think all this is still more complex than my current solution (added 11 lines of code), and maybe still more prone to errors (you have extra logic for escaping there, can you be 100% sure that it's always correct?), and also maybe confusing (once I forget again about this substitute logic, or other people who don't know about this, will wonder why you need to escape anything here).

But I can live with that, if you prefer it this way.

critias commented 8 months ago

I really don't like having two structures which do basically the same thing. I can't remembering using the string.Template.substitute feature (@curufinwe implemented it if I recall correctly), but I think changing it now would cause more harm than good.

In the end I don't mind if you use set_vars_verbatim or value.replace('$', '$$'). I just think the later requires less code changes. Overall this logic is pretty much like you proposed in your first post.

And yes, the doc string should mention why it can be necessary to escape the input (or set it to verbatim) to avoid that people wonder why they might want to do it.

albertz commented 8 months ago

I went with the set_vars_verbatim because I want to avoid any potential further source for errors (the escaping logic here).

I added a new separate function set_verbatim to set it. This is the method used by Job.set_env, because I really think when this is used programmatically by the recipes, any user would expect that the env var is set verbatim to the string which is passed to set_env.

michelwi commented 8 months ago

I really don't like having two structures which do basically the same thing.

I mildly agree. How about

    def set(self, var, value=None, verbatim=False):

and then if verbatim is set, the necessary substitutions are done automatically? Or the other way round: we get rid of the template, but if verbatim=False there is some other code to recover the behavior.

albertz commented 8 months ago

How about...

What about the current implementation in the PR? Or you say you prefer a single method? The thing is, I really don't want to have verbatim=False as default, as I think this is unexpected. However, I also don't want to break anything here.

michelwi commented 8 months ago

What about the current implementation in the PR? Or you say you prefer a single method?

I have a slight preference for single method, but I can also sleep well with the current implementation.