reanahub / reana-client

REANA command-line client
http://reana-client.readthedocs.io/
MIT License

upload: upload full directory when files are not specified #406

Closed: mvidalgarcia closed this issue 2 years ago

mvidalgarcia commented 4 years ago

Because the workflow loading logic is moving from reana-client to the workflow engine, all the workflow specification files must be present in the workspace for the specification to be loaded.

Current behavior

When running reana-client upload -w foo, we only upload the files specified in inputs.files, and we have to upload the contents of workflow.file and its dependencies manually.

(For example, when developing a workflow, users frequently amend both the code for some step and the workflow instructions at the same time; currently they have to upload the modified workflow files manually.)

Expected behavior

When running reana-client upload -w foo, upload all the files in the directory containing reana.yaml (notably the files that are under source code management).

Extra considerations

tiborsimko commented 2 years ago

First, some meta-notes on this epic-like issue:

We can therefore target an MVP here, which would consist mostly of .gitignore/.reanaignore-related tasks:

However, there is one drawback. Users may have large *.root files lying around that should be seeded into the workflow for execution but that are listed in .gitignore for obvious reasons, for example D2PiMuMuOS.root in reana-demo-lhcb-d2pimumu. How can we address this? One option would be some file-recognition logic like:

In other words, the reana.yaml directives inputs.files and inputs.directories play the role of a .reanainclude, so to speak.
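A minimal Python sketch of that idea: anything listed under inputs.files or located inside inputs.directories is uploaded even when an ignore pattern matches it. The spec dict stands in for an already-parsed reana.yaml, fnmatch is a simplified stand-in for real .gitignore matching, and is_explicit_input / should_upload are hypothetical names, not reana-client functions.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_explicit_input(path, spec):
    """Return True if path is listed in inputs.files or lies under inputs.directories."""
    inputs = spec.get("inputs", {})
    if path in inputs.get("files", []):
        return True
    parts = PurePosixPath(path).parts
    for directory in inputs.get("directories", []):
        dir_parts = PurePosixPath(directory).parts
        # The path is inside the directory if its leading components match.
        if len(parts) > len(dir_parts) and parts[: len(dir_parts)] == dir_parts:
            return True
    return False


def should_upload(path, spec, ignore_patterns):
    """Hypothetical helper: explicit inputs act as a .reanainclude and win over ignore rules."""
    if is_explicit_input(path, spec):
        return True
    name = PurePosixPath(path).name
    # Simplified matching: a pattern may match the full relative path or just the basename.
    return not any(fnmatch(path, p) or fnmatch(name, p) for p in ignore_patterns)


spec = {"inputs": {"files": ["D2PiMuMuOS.root", "code/analysis.py"], "directories": ["data"]}}
ignore = ["*.root", "*.pyc", "data/*"]
for candidate in ["D2PiMuMuOS.root", "other.root", "code/analysis.pyc", "data/x.csv", "workflow.yaml"]:
    print(candidate, should_upload(candidate, spec, ignore))
```

Checking explicit inputs first keeps the rule simple: ignore files only decide for paths the user did not name in reana.yaml.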

As to whether we upload "everything in the current directory" by default, that would be a bigger change to the current UX model, which we can decide about later. The MVP concern is to recognise .gitignore so as not to transfer __pycache__, *.pyc and other unwanted files, which often prevent an analysis from starting because uploading numerous files causes trouble.
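For the MVP case, a sketch of how a client could walk the directory containing reana.yaml and skip unwanted entries before uploading. The hard-coded ignore lists are illustrative assumptions standing in for patterns read from .gitignore, and collect_upload_candidates is a hypothetical name.

```python
import os
import tempfile
from fnmatch import fnmatch

# Illustrative defaults only; a real client would derive these from .gitignore.
IGNORED_DIRS = {"__pycache__", ".git"}
IGNORED_FILE_PATTERNS = ["*.pyc"]


def collect_upload_candidates(root):
    """Return sorted, '/'-separated paths (relative to root) of files worth uploading."""
    candidates = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for filename in filenames:
            if any(fnmatch(filename, pattern) for pattern in IGNORED_FILE_PATTERNS):
                continue
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            candidates.append(rel.replace(os.sep, "/"))
    return sorted(candidates)


# Build a throwaway tree resembling an analysis directory and list what would be sent.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["reana.yaml", "code/fit.py", "code/old.pyc", "code/__pycache__/fit.cpython-311.pyc"]:
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8"):
            pass
    found = collect_upload_candidates(tmp)

print(found)
```

Pruning dirnames in place means a skipped __pycache__ directory is never even listed, rather than being listed and then filtered file by file.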

VMois commented 2 years ago

I found a Python package that can help with .gitignore parsing. Without it, supporting all the different matching rules that .gitignore provides would be a complicated process.

The package has not seen any activity for a long time, but it is quite minimalistic, so maybe it does not need much maintenance.
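To illustrate why the matching rules are non-trivial, here is a deliberately simplified, stdlib-only matcher covering only a few .gitignore features: comments, blank lines, negation with !, directory-only patterns ending in /, patterns anchored by a /, and "last matching rule wins". It does not implement ** or escaping, fnmatch's * also matches /, and it ignores the rule that a file cannot be re-included when its parent directory is excluded, which is exactly why a dedicated library is attractive. SimpleIgnore is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


class SimpleIgnore:
    """A very small subset of .gitignore semantics (illustrative only)."""

    def __init__(self, lines):
        self.rules = []  # tuples of (pattern, negated, dir_only, anchored)
        for raw in lines:
            line = raw.rstrip("\n")
            if not line.strip() or line.startswith("#"):
                continue  # blank lines and comments carry no rule
            negated = line.startswith("!")
            if negated:
                line = line[1:]
            dir_only = line.endswith("/")
            line = line.rstrip("/")
            # A slash at the start or in the middle anchors the pattern to the root.
            anchored = "/" in line
            self.rules.append((line.lstrip("/"), negated, dir_only, anchored))

    def _rule_matches(self, rule, path, is_dir):
        pattern, _, dir_only, anchored = rule
        if dir_only and not is_dir:
            return False
        if anchored:
            return fnmatchcase(path, pattern)
        return fnmatchcase(PurePosixPath(path).name, pattern)

    def is_ignored(self, path, is_dir=False):
        """Check one relative path; callers are expected to prune ignored directories while walking."""
        ignored = False
        for rule in self.rules:
            if self._rule_matches(rule, path, is_dir):
                ignored = not rule[1]  # the last matching rule decides
        return ignored


rules = SimpleIgnore(["# Python byproducts", "__pycache__/", "*.pyc", "*.root", "!D2PiMuMuOS.root", "/slides"])
for path, is_dir in [("__pycache__", True), ("code/fit.pyc", False), ("big.root", False),
                     ("D2PiMuMuOS.root", False), ("slides", False), ("code/slides", False)]:
    print(path, rules.is_ignored(path, is_dir))
```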

tiborsimko commented 2 years ago

If the package works well and is safe enough, we can surely depend on it even though it is a bit inactive. Alternatively, you could simply launch git ls-files etc. as external processes and match against their output. (This might also be handy if we ever move the client from Python to Golang.)
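A sketch of the git ls-files approach: run Git as an external process and keep only the candidate files it reports as tracked. It assumes git is on PATH and that the directory is inside a Git work tree; parse_nul_separated, tracked_files and filter_tracked are hypothetical helper names. The -z flag makes Git separate paths with NUL bytes and skip quoting, so unusual file names survive parsing.

```python
import subprocess


def parse_nul_separated(raw):
    """Split `git ls-files -z` output (bytes) into a list of relative paths."""
    return [p for p in raw.decode("utf-8", "surrogateescape").split("\0") if p]


def tracked_files(repo_dir):
    """Return the set of files Git tracks under repo_dir (requires git on PATH)."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_dir,
        capture_output=True,
        check=True,
    )
    return set(parse_nul_separated(result.stdout))


def filter_tracked(candidates, tracked):
    """Keep only candidate paths that Git tracks, preserving their order."""
    return [c for c in candidates if c in tracked]


# Example with canned output, so this part runs without a Git repository:
sample = b"reana.yaml\0code/fit.py\0docs/notes.tex\0"
tracked = set(parse_nul_separated(sample))
print(filter_tracked(["reana.yaml", "code/fit.py", "code/fit.pyc"], tracked))
```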

VMois commented 2 years ago

Alternatively you could just launch git ls-files etc as external processes, and match against that.

This also sounds nice. One big difference between matching against .gitignore and matching against git ls-files is that, with git ls-files, files that are not yet tracked (e.g. new, uncommitted files) will be skipped.

Update: I found the -o option, but it looks like it mixes uncommitted and ignored files.
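On the -o concern: git ls-files --others (-o) on its own lists all untracked files, including ignored ones, but adding --exclude-standard applies Git's standard exclusions (.gitignore files, .git/info/exclude and the user's global exclude file). Combining that with --cached lists tracked files plus untracked-but-not-ignored ones. A sketch, assuming git is on PATH; ls_files_command and files_not_ignored are hypothetical names.

```python
import subprocess


def ls_files_command(include_untracked=True):
    """Build a git ls-files command listing tracked files and, optionally,
    untracked files that are not ignored."""
    cmd = ["git", "ls-files", "-z", "--cached"]
    if include_untracked:
        # --others alone would also list ignored files;
        # --exclude-standard filters them out using the standard Git exclusions.
        cmd += ["--others", "--exclude-standard"]
    return cmd


def files_not_ignored(repo_dir, include_untracked=True):
    """Return sorted paths that Git does not ignore (requires git on PATH)."""
    out = subprocess.run(
        ls_files_command(include_untracked),
        cwd=repo_dir,
        capture_output=True,
        check=True,
    ).stdout
    # A set deduplicates defensively before sorting.
    return sorted({p for p in out.decode("utf-8", "surrogateescape").split("\0") if p})


print(" ".join(ls_files_command()))
```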

VMois commented 2 years ago

Having an extra .reanaignore that would behave like .gitignore would be good to have too, because we have seen analysis repositories with TeX sources, slides etc., which don't necessarily need to be transferred to the workflow workspace. (But could be, if the workflow produces extra reports!)
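If .reanaignore uses the same syntax as .gitignore, one simple option is to read both files and feed the concatenated patterns to whatever matcher the client uses. In the sketch below, reading .reanaignore after .gitignore (so that, under last-match-wins matching, its rules take precedence) is an assumption rather than agreed behaviour, and load_ignore_patterns is a hypothetical name.

```python
import os
import tempfile


def load_ignore_patterns(root, names=(".gitignore", ".reanaignore")):
    """Read ignore files from root in the given order and return their non-comment patterns."""
    patterns = []
    for name in names:
        path = os.path.join(root, name)
        if not os.path.isfile(path):
            continue  # a missing ignore file simply contributes nothing
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line.strip() and not line.startswith("#"):
                    patterns.append(line)
    return patterns


# Example: Git ignores Python byproducts; .reanaignore additionally drops TeX sources and slides.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, ".gitignore"), "w", encoding="utf-8") as fh:
        fh.write("__pycache__/\n*.pyc\n")
    with open(os.path.join(tmp, ".reanaignore"), "w", encoding="utf-8") as fh:
        fh.write("# not needed in the workspace\n*.tex\nslides/\n")
    patterns = load_ignore_patterns(tmp)

print(patterns)
```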

I have a question: how will .reanaignore behave? I assume:

Thanks.

tiborsimko commented 2 years ago

I have a question: how will .reanaignore behave? I assume:

Yes, exactly!

VMois commented 2 years ago

Yes, exactly!

Cool. I will start working on it right now.