Closed by mvidalgarcia 2 years ago
First some meta-notes on this epic-like issue:
We can therefore target an MVP here, which would be mostly gitignore-reanaignore-related tasks:
- .gitignore
- Having an extra .reanaignore that would behave like .gitignore would be good to have too, because we did see analysis repositories with TeX sources, slides, etc., which don't necessarily need to be transferred to the workflow workspace. (But they could be, if the workflow produces extra reports!)
- inputs.directories and inputs.files listed in reana.yaml, except for files appearing in .gitignore.
However, there is one drawback. Users may have big *.root files lying around that are to be seeded to the workflow for execution but that are in .gitignore for obvious reasons. Example: D2PiMuMuOS.root in reana-demo-lhcb-d2pimumu. How to address this? One option would be to do some file recognition logic like:
- if a file is listed explicitly in reana.yaml (=directly mentioned by name in inputs.files), then upload it for sure;
- if a file is listed in reana.yaml, but only implicitly (=one of its parent directories is mentioned in inputs.directories), then don't upload it if it matches .gitignore;
- if a directory is listed explicitly in reana.yaml (=directly mentioned in inputs.directories), then upload it for sure;
- if a directory is listed only implicitly in reana.yaml (=one of its parents is listed in inputs.directories), then don't upload it if it matches .gitignore.
IOW, the reana.yaml directives about inputs.files and inputs.directories play the role of a .reanainclude, so to speak.
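The recognition logic above could be sketched roughly as follows. This is only an illustration, not the reana-client implementation: the function names are hypothetical, and the matcher uses the standard-library fnmatch as a simplified stand-in for real .gitignore semantics (no negation, no anchoring, no `**`). It also assumes that paths not mentioned in reana.yaml at all are not uploaded.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_ignored(path, ignore_patterns):
    """Simplified .gitignore check: glob-match the full path or any path component."""
    parts = PurePosixPath(path).parts
    return any(
        fnmatch(path, pattern) or any(fnmatch(part, pattern) for part in parts)
        for pattern in ignore_patterns
    )


def should_upload(path, input_files, input_dirs, ignore_patterns):
    """Decide whether a path is uploaded, given inputs.files and inputs.directories."""
    normalised = str(PurePosixPath(path))
    files = {str(PurePosixPath(f)) for f in input_files}
    dirs = {str(PurePosixPath(d)) for d in input_dirs}
    # Explicitly listed file or directory: upload for sure, even if ignored.
    if normalised in files or normalised in dirs:
        return True
    # Implicitly listed (a parent is in inputs.directories): respect the ignore rules.
    parents = {str(parent) for parent in PurePosixPath(normalised).parents}
    if parents & dirs:
        return not is_ignored(normalised, ignore_patterns)
    # Assumption: anything not mentioned in reana.yaml is not uploaded.
    return False
```

With this sketch, a big data/D2PiMuMuOS.root named in inputs.files is uploaded even when *.root is ignored, while other *.root files that merely sit under a listed directory are skipped.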
As to whether we upload "everything in the current directory" by default, that would be a bigger change to the current UX model that we can decide about later. The MVP concern is to recognise .gitignore so as not to transfer __pycache__ and *.pyc and other unwanted files, which often prevent an analysis from being started due to trouble uploading numerous files.
I found a Python package that can help with .gitignore parsing. Otherwise, it is a complicated process if we want to support the different matching rules that .gitignore provides.
The package has not had any activity for a long time. It is quite minimalistic, so maybe it doesn't need much attention.
If the package works well and is safe enough, we can surely depend on it even though it is a bit inactive. Alternatively, you could just launch git ls-files etc. as external processes, and match against that. (Might be handy also with respect to possibly moving the client from Python to Golang.)
> Alternatively, you could just launch git ls-files etc. as external processes, and match against that.
This also sounds nice. One big difference between matching against .gitignore and matching against git ls-files is that, with git ls-files, files that are not committed will be skipped.
Update: found the -o option, but it looks like it will mix uncommitted and ignored files.
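One possible way around the mixing, sketched here under assumptions: git ls-files also accepts --exclude-standard, which applies the usual ignore rules (.gitignore, .git/info/exclude, the global excludes file) to the untracked files selected by -o/--others. Combined with --cached, this should list tracked files plus untracked-but-not-ignored ones. The helper names below are hypothetical and not part of reana-client.

```python
import os
import subprocess


def ls_files_command(include_untracked=True):
    """Build the git command; -z gives NUL-separated, unquoted file names."""
    command = ["git", "ls-files", "-z", "--cached"]
    if include_untracked:
        # --others alone would also list ignored files; --exclude-standard filters them out.
        command += ["--others", "--exclude-standard"]
    return command


def parse_ls_files_output(raw):
    """Split the NUL-separated bytes printed by `git ls-files -z` into file names."""
    return [os.fsdecode(name) for name in raw.split(b"\0") if name]


def list_candidate_files(repo_dir="."):
    """Run git in repo_dir and return tracked plus untracked, non-ignored files."""
    result = subprocess.run(
        ls_files_command(), cwd=repo_dir, check=True, stdout=subprocess.PIPE
    )
    return parse_ls_files_output(result.stdout)
```

Note that this only works inside a git checkout with git installed, so a fallback would still be needed for analyses that are not under git.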
> Having an extra .reanaignore that would behave like .gitignore would be good to have too, because we did see analysis repositories with TeX sources, slides, etc., which don't necessarily need to be transferred to the workflow workspace. (But they could be, if the workflow produces extra reports!)
I have a question. How will .reanaignore behave? I assume:
- .gitignore
- .gitignore, files listed in .reanaignore will be preserved in git history but not uploaded to REANA?
Thanks.
> I have a question. How will .reanaignore behave? I assume:
Yes, exactly!
> Yes, exactly!
Cool. I will start working on it right now.
Due to moving the workflow load logic from reana-client to the workflow-engine, we need all the workflow specification files in the workspace to be able to load the specification.
Current behavior
When running reana-client upload -w foo, we only upload the files specified in inputs.files, and we have to manually upload what's in workflow.file and its dependencies. (E.g. when developing a workflow, a user often amends the code for some step and the workflow instructions at the same time; currently the user would have to upload the modified workflow files manually.)
Expected behavior
When running reana-client upload -w foo, upload all the files in the current reana.yaml directory. (Notably the files that are under source code management.)
Extra considerations
- reana-client create and keep only "light" YAML syntax checks?
- reana-client start and warn the user if there are differences? (E.g. keep local client state in a .reana directory.)
- .reanaignore. First check .gitignore, after that check .reanaignore, and prevent uploading the matching files. (Useful when the source code also contains LaTeX sources for notes or presentations, or other such files not necessary for execution.)
- .git folder, but upload git commit sha1 and branch information, for tracking purposes.
- upload command does not make much sense, since people can use other mechanisms to access and modify files in their workspace.
- download command not to overwrite anything locally present; we should check, and if a target file is present, we should just warn the user and not overwrite anything. (So that the next upload would be "clean".)
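As a rough illustration of the "first .gitignore, then .reanaignore, and never the .git folder" idea from the list above, here is a minimal sketch. The helper names are hypothetical, and the matching again uses fnmatch on file and directory names as a simplification of real .gitignore rules (negation with `!`, anchored paths and `**` are not handled).

```python
import os
from fnmatch import fnmatch


def read_patterns(path):
    """Return the non-empty, non-comment lines of an ignore file, or [] if it is absent."""
    if not os.path.isfile(path):
        return []
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line.rstrip("/") for line in lines if line and not line.startswith("#")]


def files_to_upload(root):
    """Walk root and collect relative paths not matched by .gitignore or .reanaignore."""
    patterns = read_patterns(os.path.join(root, ".gitignore"))
    patterns += read_patterns(os.path.join(root, ".reanaignore"))
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into .git or ignored directories.
        dirnames[:] = sorted(
            d for d in dirnames
            if d != ".git" and not any(fnmatch(d, p) for p in patterns)
        )
        for name in sorted(filenames):
            if not any(fnmatch(name, p) for p in patterns):
                relative = os.path.relpath(os.path.join(dirpath, name), root)
                selected.append(relative.replace(os.sep, "/"))
    return selected
```

Because both files feed one pattern list, a file excluded only by .reanaignore stays in git history but is left out of the upload, which matches the behaviour confirmed earlier in the thread.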