dcl10 commented 1 year ago

Background

My team would like to use WfExS in a Trusted Research Environment (TRE) whose data sources can't be exposed to the outside world. We anticipate that the environment will contain variables which must be kept secret (i.e. not in the output RO-Crate). In some cases, some inputs may also be sensitive, and we would like them not to be included in the output RO-Crate either.

Proposed Feature

For secret environment variables, would it be possible to add a section to the local config YAML file where we could put the variables as key-value pairs, and then have WfExS load these into the local environment at runtime? Then, during the creation of the RO-Crate, WfExS could check for the secrets and exclude them from the crate and its metadata.

For secret inputs, would it be possible to add a boolean flag to the definition of an input to tell WfExS whether that input is secret or not? Then, similarly to the above, have it excluded from the crate and its metadata.
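To make the flag idea concrete, here is a minimal Python sketch of the filtering step. It is only an illustration under assumptions: the `secret` and `value` keys and the `public_inputs` helper are hypothetical names, not part of the current WfExS input definition or API.

```python
from typing import Any, Dict


def public_inputs(inputs: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return only the inputs that are not flagged as secret.

    Assumption: each input definition is a mapping which may carry a
    hypothetical boolean key "secret"; inputs without it are public.
    """
    return {
        name: spec
        for name, spec in inputs.items()
        if not spec.get("secret", False)
    }


if __name__ == "__main__":
    declared = {
        "reference": {"value": "genome.fa"},
        "patient_table": {"value": "patients.csv", "secret": True},
    }
    # Only the non-secret inputs would be described in the crate metadata.
    print(sorted(public_inputs(declared)))
```

The same filter could be applied just before the RO-Crate metadata is written, so secret inputs are still available to the workflow but never described in the crate.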

jmfernandez commented 1 year ago

Hi! First of all, sorry for the delay. This message is a partial answer to your open issue, as the issue itself can be subdivided into more than one.

I have been analyzing the current scenario. Right now, WfExS-backend passes to the workflow engines the environment variables inherited from the process which spawned the WfExS-backend instance. This can be unfortunate, because any local environment variable with special meaning for either the workflow engines or the workflows themselves is sent along, which is only desirable when that variable is needed by WfExS to properly set up the execution scenario with the workflow engine matched to the kind of workflow.

So, one of the new features you need is the ability to declare a custom block of environment variables in the workflow staging recipe YAML file, which might or might not have meaning to the workflow. These environment variables should not affect core variables such as PATH, TMPDIR, LD_LIBRARY_PATH, etc. They should be blended with the inherited parent environment variables.
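As a rough illustration of the blending rule described above, a minimal Python sketch could look like this. It is not the WfExS implementation: `blend_environment` and `PROTECTED_VARS` are hypothetical names, and the protected set only contains the three core variables named as examples.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: an illustrative set of core variables, taken from the examples above.
PROTECTED_VARS = frozenset({"PATH", "TMPDIR", "LD_LIBRARY_PATH"})


def blend_environment(
    declared: Mapping[str, str],
    inherited: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Overlay the declared variables on top of the inherited environment.

    Raises ValueError if the declared block tries to set a core variable.
    """
    # Default to the environment of the current process.
    base = dict(os.environ if inherited is None else inherited)
    clashes = sorted(PROTECTED_VARS.intersection(declared))
    if clashes:
        raise ValueError(
            "refusing to override core variables: " + ", ".join(clashes)
        )
    base.update(declared)
    return base


if __name__ == "__main__":
    parent = {"PATH": "/usr/bin", "HOME": "/home/user"}
    env = blend_environment({"DB_TOKEN": "not-a-real-token"}, parent)
    print(sorted(env))
```

Rejecting a clash outright is one possible design choice; silently dropping the offending variable would be another, at the cost of harder debugging.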

Right now, the environment variables in use are not included in the RO-Crate, and there is no provision to include them there, as you can see in the Process Run Crate specification.

The approaches I'm proposing to keep this environment block properly secure are either providing the block of environment variables on each execution in any working directory with a fully staged workflow scenario, or encrypting the environment block on workflow staging and passing the decryption key each time (or whenever) the workflow is executed. This last approach fits very well with a feature I have had in mind for a couple of years, which is creating the secure staged workflow working directories using either a random key or the user's keys, instead of the installation one.

jmfernandez commented 1 year ago

Since commit 3b9c71053ef20226b2fdfaf1f0b76df3461a14e9 WfExS has the capability to declare and send environment variables to workflow instantiations.

You can find a couple of examples at this Nextflow workflow instantiation example and this CWL workflow instantiation example.

Still missing are both the stage definition and the implementation needed to mask inputs and outputs in the generated RO-Crate.

inab / WfExS-backend

Secrets/secret inputs #42