multiscale / muscle3

The third major version of the MUltiScale Coupling Library and Environment
Apache License 2.0

Would it be possible to get environment variables in the YMMSL file? #246

Open DavidPCoster opened 1 year ago

DavidPCoster commented 1 year ago

Would it be possible to get environment variables in the YMMSL file?

It would be good to be able to have something like work_dir: ${HOME}/GIT/ets_paf/UQ in the YMMSL file.

LourensVeen commented 1 year ago

Ah, work dir. You're not the only one who wants to run in a work dir that's not inside the run dir. It's currently not really supported because it's problematic from a provenance and reproducibility perspective: you end up overwriting previous results, or possibly depending on them.

You can hack your way around it by using cd ${HOME}/GIT/ets_paf/UQ ; <command> as the command, because the programs are started inside a shell. That also allows you to use environment variables.

There may be some changes in the future, however, to exactly how things are started. Currently it's a login shell, but that's not ideal. I think it may be best to make it configurable, but I need some support from QCG-PJ for that. There's #201 for this.

DavidPCoster commented 1 year ago

Going into the work dirs would be fine -- however I seem to be in my home directory instead ...

LourensVeen commented 1 year ago

Ah, that's interesting, because you're the second person to say that in a few days.

QCG-PilotJob starts the models in a login shell (bash -l), which means that it loads /etc/profile and ~/.profile and various other configuration files, usually including ~/.bashrc. A login shell doesn't necessarily seem to start in your home directory, though: if I run bash -l -c pwd on my laptop, it prints whichever directory I'm in.

Perhaps you could try that, and check if there's a cd command somewhere in a start-up file?
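The check suggested above can also be scripted. What follows is a minimal Python sketch, not part of MUSCLE3 or QCG-PilotJob; the function name shell_keeps_cwd is made up for illustration. It runs a shell command in a fresh temporary directory and reports whether pwd still prints that directory. With the default command (bash -l -c pwd), a False result hints at a cd somewhere in the start-up files.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def shell_keeps_cwd(command: Sequence[str] = ("bash", "-l", "-c", "pwd")) -> bool:
    """Return True if `command`, run in a fresh directory, prints that directory.

    Hypothetical helper for illustration; not part of MUSCLE3 or QCG-PilotJob.
    """
    with tempfile.TemporaryDirectory() as tmp:
        result = subprocess.run(
            list(command), cwd=tmp, capture_output=True, text=True, check=True
        )
        # Start-up files may print extra text, so only look at the last line.
        lines = result.stdout.strip().splitlines()
        reported = lines[-1] if lines else ""
        # Resolve symlinks on both sides before comparing.
        return os.path.realpath(reported) == os.path.realpath(tmp)
```

Calling shell_keeps_cwd() with no arguments tests the login shell. The sketch only detects that the directory changed; it does not say which start-up file is responsible.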

Some more thoughts on this topic are in https://github.com/multiscale/muscle3/issues/201

DavidPCoster commented 1 year ago

OK -- I found a cd lurking in a system file sourced by the local .bash_profile

After removing that, the workdir functionality works.

I would still like to be able to include environment variables in the YMMSL file, though ...

LourensVeen commented 1 year ago

Well, that should be possible. From which environment should they be taken though? The one the muscle_manager is started from? A clean (login) shell on the compute node? The laptop from which you ran ssh -c sbatch ...? And if the environment variable is used in the settings, how does that work with dynamically generated settings and muscle_settings_in? Or should this be limited to the implementations?

I'm trying to figure out the usage scenarios that people may have, because I think it would be easy to make something that doesn't do what at least some users expect. Having an executable location relative to ${HOME} makes sense, especially on clusters where the home dir is often not in /home/user or may be mounted in a different place on the head node and a compute node. Are there other things you would like to pull from the environment?

DavidPCoster commented 1 year ago

I suggest that the environment variables be those seen by muscle_manager ...
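To make that suggestion concrete, here is a minimal Python sketch of expanding ${VAR} references from the environment of the current process, which would be the manager's environment if the manager did the expansion. This is an assumption about how such a feature could work, not how MUSCLE3 behaves; expand_env is a hypothetical name.

```python
import os
from string import Template
from typing import Mapping, Optional


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand $VAR and ${VAR} in `value` using `env` (default: os.environ).

    Hypothetical helper for illustration; not part of MUSCLE3 or yMMSL.
    Raises KeyError if a referenced variable is not defined.
    """
    if env is None:
        env = os.environ
    # substitute() is strict: an undefined variable raises KeyError
    # instead of being silently left in place.
    return Template(value).substitute(env)
```

For example, expand_env("${HOME}/GIT/ets_paf/UQ", {"HOME": "/home/alice"}) gives "/home/alice/GIT/ets_paf/UQ". Failing on undefined variables is a design choice here; note that with string.Template a literal dollar sign has to be written as $$.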

AreWeDreaming commented 1 year ago

I second this and would also like to be able to use environment variables, for example for the paths to the applications. Hardcoding those is quite nasty.

AreWeDreaming commented 11 months ago

Maybe expanding on this a little: for OMFIT, which is a tool to connect different codes for workflow design, we initially started out using hardcoded path names for the various codes. However, we are now trying to move towards using environment variables that are specified by [LMOD](https://lmod.readthedocs.io/en/latest/) modules. The benefit is that it moves the cluster-specific components into the modules instead of having them duplicated in OMFIT.

Another argument is that setting up relative paths in the configuration file seems unnecessarily annoying. Having to iteratively try stacking "../" until the path specifications are finally correct is not ideal and can be avoided entirely if the best practice of defining environment variables for code executables is introduced.
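As an illustration of that practice, the sketch below builds an executable path from a base directory stored in an environment variable (as a module might set it) and checks that the result is really an executable file. The function name executable_from_env and the variable CODE_HOME are hypothetical; neither comes from MUSCLE3, OMFIT, or LMOD.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def executable_from_env(
    variable: str, relative_path: str, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Join the directory stored in `variable` with `relative_path`.

    Hypothetical helper for illustration only.
    """
    if env is None:
        env = os.environ
    if variable not in env:
        raise KeyError(f"environment variable {variable} is not set")
    path = Path(env[variable]) / relative_path
    # Fail early with a clear message instead of at model start-up.
    if not (path.is_file() and os.access(path, os.X_OK)):
        raise FileNotFoundError(f"{path} is not an executable file")
    return path
```

With CODE_HOME pointing at an installation directory, executable_from_env("CODE_HOME", "bin/model") returns the absolute path without any "../" counting, and a wrong or missing setting is reported immediately.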