reanahub / reana-client

REANA command-line client
http://reana-client.readthedocs.io/
MIT License

upload REANA specification file to server by default #623

Closed VMois closed 1 year ago

VMois commented 1 year ago

Originated in https://github.com/reanahub/reana-client/issues/620#issuecomment-1211638448 (point 1)

We currently do not upload the REANA specification files to the server. Having the REANA specification in the workspace would make debugging easier and improve reproducibility (for example, if we decide to export workflows to Zenodo).

The focus of this issue is to upload specifications by default in reana-client. This will help with transitioning to https://github.com/reanahub/reana-db/issues/162.

In addition, we can also add a validation step to reana-server to check if reana.yaml is uploaded.

VMois commented 1 year ago

There are two possibilities:

  1. Upload the specification file in the upload command. Two issues: we do not know which file was used as the REANA specification; and if the specification file is invalid and big files have already been uploaded, that time is wasted, as the user will need to restart the workflow with a new spec.

  2. Upload the specification file in the create command, right after creating the workflow. We can then validate the spec before uploading big files. Two issues: this adds more steps, such as additional validation; and we will need to duplicate the same logic in restart.
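The ordering in option 2 can be sketched roughly as follows. This is only an illustration: `validate`, `create`, and `upload_file` are hypothetical stand-ins passed in as callables, not actual reana-client functions.

```python
from typing import Callable


def create_and_upload_spec(
    spec_path: str,
    spec: dict,
    validate: Callable[[dict], None],
    create: Callable[[dict], str],
    upload_file: Callable[[str, str], None],
) -> str:
    """Validate the spec, create the workflow, then upload the spec file.

    The three callables are hypothetical placeholders for whatever the
    client really uses; they are assumptions made for this sketch.
    """
    # Fail fast: an invalid spec raises here, before the workflow is
    # created and before any large input file is transferred.
    validate(spec)
    workflow_id = create(spec)
    # Store the spec file in the workspace right after creation.
    upload_file(workflow_id, spec_path)
    return workflow_id
```

Because validation comes first, an invalid spec stops the process before `create` or any upload runs, which is the main advantage of option 2 over option 1.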

WDYT? Any other issues we might have? @tiborsimko

VMois commented 1 year ago

A few more questions:

  3. What name should we use for the specification file?

For example, if a user executes reana-client create -w test -f reana-yadage.yaml, do we save the specification file as reana-yadage.yaml or as reana.yaml? I would go with reana.yaml because we can only have one source of truth for the specification.

  4. How will uploading the REANA specification affect restart?

We will probably need to re-upload the specification file. Any other issues I am missing?

tiborsimko commented 1 year ago

WRT 1-or-2: I think we can do 2, and think of uploading specs as part of "creating" the workflow, whilst "upload" is reserved for uploading research workflow inputs.

WRT 3: Let's keep the original name, i.e. reana-yadage.yaml. This is what people would use if they included the file implicitly:

$ cat reana-root.yaml
inputs:
  files:
    - reana-root.yaml

BTW, note that we would also need to upload all the workflow files referenced in the reana.yaml for CWL/Snakemake/Yadage. For example, if reana.yaml says:

workflow:
  type: yadage
  file: workflow/yadage/workflow.yaml

then we should upload workflow/yadage/workflow.yaml and any files it may include.

(But for MVP this can be done later, when the "proper" validation is done on server side.)
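As a rough sketch of which files would be collected for upload, assuming the specification has already been parsed into a dictionary: the function name `files_to_upload` is hypothetical, and the sketch deliberately does not follow files included by the workflow file itself, which would need engine-specific parsing.

```python
from typing import List


def files_to_upload(spec_filename: str, spec: dict) -> List[str]:
    """Return the spec file plus the workflow file it references, if any.

    Hypothetical helper for illustration; nested includes inside the
    workflow file are not resolved here.
    """
    # Keep the original spec file name, e.g. reana-yadage.yaml.
    files = [spec_filename]
    # CWL/Snakemake/Yadage specs point to an external workflow file.
    workflow_file = spec.get("workflow", {}).get("file")
    if workflow_file and workflow_file not in files:
        files.append(workflow_file)
    return files
```

For the Yadage example above, this would yield the spec file followed by workflow/yadage/workflow.yaml; a spec with no `file` entry under `workflow` would yield only the spec file.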

WRT 4: if somebody modified steps.yaml before restarting a workflow, I think they would upload such a file manually. The restart functionality can perhaps be left aside for now; we'll consider reana.yaml and workflow files as unchanged by default, and we'll advise people to resubmit any changed files manually before launching restart. It might do for an MVP.

However, you are right that this will lead to usage complications that we need to think about later, such as using CLI parameters, e.g. -p events=20000 for the original launch and -p events=10000 for the restart. This may lead to non-trivial questions about how to store which parameter had which value for which run. We would need to recreate the REANA workflow specs from the changed values, or forbid using different parameters for simple restarts, etc. We could leave this part for later.
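To make the parameter problem concrete, here is a small sketch that detects which parameters differ between the original launch and a restart. The helper name `changed_parameters` is hypothetical and only illustrates the bookkeeping; it does not reflect how REANA stores parameters.

```python
def changed_parameters(original: dict, restart: dict) -> dict:
    """Map each parameter changed or added on restart to (old, new).

    Hypothetical helper; a parameter absent from the original launch
    is reported with None as its old value.
    """
    return {
        name: (original.get(name), value)
        for name, value in restart.items()
        if name not in original or original[name] != value
    }
```

A non-empty result could be used either to regenerate the specification with the new values or to reject the restart, depending on which policy is chosen.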