reanahub / reana-client

REANA command-line client
http://reana-client.readthedocs.io/
MIT License

cli: improve `ls` file name filtering #521

Closed tiborsimko closed 3 years ago

tiborsimko commented 3 years ago

In the workflow list command, we allow workflow name filtering by matching parts of the name:

$ reana-client list               
NAME                           RUN_NUMBER   CREATED               STARTED               ENDED                 STATUS  
hello                          1            2021-06-02T12:07:28   2021-06-02T12:07:41   2021-06-02T12:07:48   finished
workflow                       1            2021-06-02T12:07:20   2021-06-02T12:07:33   2021-06-02T12:07:39   finished
yad                            1            2021-06-02T11:46:19   2021-06-02T11:46:32   2021-06-02T11:46:43   failed  
helloworld-serial-kubernetes   1            2021-06-02T11:44:55   2021-06-02T11:45:08   2021-06-02T11:45:15   finished
helloworld-yadage-kubernetes   1            2021-06-02T11:44:16   2021-06-02T11:44:30   2021-06-02T11:44:52   finished
helloworld-cwl-kubernetes      1            2021-06-02T11:43:47   2021-06-02T11:44:03   2021-06-02T11:44:11   finished

$ reana-client list --filter name=h              
NAME                           RUN_NUMBER   CREATED               STARTED               ENDED                 STATUS  
hello                          1            2021-06-02T12:07:28   2021-06-02T12:07:41   2021-06-02T12:07:48   finished
helloworld-serial-kubernetes   1            2021-06-02T11:44:55   2021-06-02T11:45:08   2021-06-02T11:45:15   finished
helloworld-yadage-kubernetes   1            2021-06-02T11:44:16   2021-06-02T11:44:30   2021-06-02T11:44:52   finished
helloworld-cwl-kubernetes      1            2021-06-02T11:43:47   2021-06-02T11:44:03   2021-06-02T11:44:11   finished 

In the workspace ls command, we require an exact match:

$ reana-client ls -w hello         
NAME                    SIZE   LAST-MODIFIED      
code/helloworld.py      3253   2021-06-02T12:07:28
data/names.txt          20     2021-06-02T12:07:28
results/greetings.txt   34     2021-06-02T12:07:43

$ reana-client ls -w hello names                
NAME   SIZE   LAST-MODIFIED

It may be good to harmonise the behaviour so that filtering would behave similarly between commands.
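
To make the difference concrete, here is a minimal Python sketch of the two matching behaviours described above. The helper names `exact_match` and `substring_match` are hypothetical and only illustrate the semantics; they are not the reana-client implementation.

```python
# File names taken from the `reana-client ls -w hello` output above.
FILES = ["code/helloworld.py", "data/names.txt", "results/greetings.txt"]


def exact_match(names, query):
    """Keep only names equal to the query (how `ls` behaves in the example)."""
    return [name for name in names if name == query]


def substring_match(names, query):
    """Keep names containing the query (how `list --filter name=...` behaves)."""
    return [name for name in names if query in name]


print(exact_match(FILES, "names"))      # nothing is selected
print(substring_match(FILES, "names"))  # data/names.txt is selected
```

With exact matching, `names` selects nothing, which is what the empty `ls -w hello names` table shows; with substring matching it would select `data/names.txt`.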

Open question: do we abandon the ls arg1 technique in favour of ls --filter name=arg1 to make the coherence even more explicit? This would also allow us to filter (later) on things other than just names, for example ls --filter last-modified=2021-06-02.

P.S. Note also bug #520 that hits even for exact file names on master:

$ reana-client ls -w hello results/greetings.txt
Something went wrong while retrieving file list for workflow hello:
34 is not of type 'object'

Failed validating 'type' in schema['properties']['items']['items']['properties']['size']:
    {'properties': {'human_readable': {'type': 'string'},
                    'raw': {'type': 'number'}},
     'type': 'object'}

On instance['items'][0]['size']:
    34

ParthS007 commented 3 years ago

Yes, I think ls --filter name=arg1 is good for more coherency across commands. I would go for it 👍

tiborsimko commented 3 years ago

On the one hand, consistency with list etc. is nice; on the other hand, the --filter option may look weird for the rm command, should we want to push it there as well. (ls looks like a filtering command, but rm probably would not.) Currently, all file-related commands ls/mv/du/rm/upload/download behave similarly in a glob-friendly manner... So dunno, hence I tagged it as an "open question"...

For example, if we advertise zsh-style globbing better, then such a filtering might be more natural?

$ reana-client ls -w helloworld-cwl-kubernetes          
NAME                                                                        SIZE   LAST-MODIFIED      
workflow.json                                                               1287   2021-06-07T08:31:37
inputs.json                                                                 145    2021-06-07T08:31:37
code/helloworld.py                                                          3253   2021-06-07T08:31:22
data/names.txt                                                              20     2021-06-07T08:31:23
outputs/greetings.txt                                                       34     2021-06-07T08:31:40
workflow/cwl/helloworld-job.yml                                             122    2021-06-07T08:31:23
workflow/cwl/helloworld.cwl                                                 867    2021-06-07T08:31:23
workflow/cwl/helloworld.tool                                                541    2021-06-07T08:31:23
workflow/cwl/helloworld-slurmcern.cwl                                       965    2021-06-07T08:31:23
workflow/cwl/helloworld-htcondorcern.cwl                                    1007   2021-06-07T08:31:24
cwl/docker_stagedir/stg8722d6ef-6bbb-45d2-a19e-ac7e95bf111a/helloworld.py   3253   2021-06-07T08:31:22
cwl/docker_stagedir/stge37e366e-5eda-4657-ba52-d2738374cddd/names.txt       20     2021-06-07T08:31:23
cwl/docker_outdir/results/greetings.txt                                     34     2021-06-07T08:31:40

$ reana-client ls -w helloworld-cwl-kubernetes '*/*.tool'
NAME   SIZE   LAST-MODIFIED

$ reana-client ls -w helloworld-cwl-kubernetes '**/*.tool'
NAME                           SIZE   LAST-MODIFIED      
workflow/cwl/helloworld.tool   541    2021-06-07T08:31:23

This might be a very natural user experience for zsh users already... Definitely more than filtering...
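
For readers less familiar with zsh-style globbing, the following Python sketch shows why `*/*.tool` finds nothing while `**/*.tool` finds the nested file. It assumes simplified semantics (`*` never crosses a `/`, and `**/` stands for zero or more directories); the function names are hypothetical, and the actual server-side matching used by REANA is not shown in this thread.

```python
import re

# A subset of the file names from the `ls -w helloworld-cwl-kubernetes` output.
FILES = [
    "workflow.json",
    "inputs.json",
    "code/helloworld.py",
    "data/names.txt",
    "workflow/cwl/helloworld.cwl",
    "workflow/cwl/helloworld.tool",
]


def glob_to_regex(pattern):
    """Translate a simplified zsh-style glob into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more directory levels
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")  # one character within a path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def glob_filter(names, pattern):
    """Return the names that match the glob pattern in full."""
    regex = glob_to_regex(pattern)
    return [name for name in names if regex.match(name)]


print(glob_filter(FILES, "*/*.tool"))   # no match: the file is two levels deep
print(glob_filter(FILES, "**/*.tool"))  # matches workflow/cwl/helloworld.tool
```

Under these assumptions `*/*.tool` only looks one directory deep, so `workflow/cwl/helloworld.tool` is found only by the recursive `**/*.tool` form, consistent with the two outputs above.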

So I see pros and cons with each approach. (A third solution could be to offer both?)

ParthS007 commented 3 years ago

Yes, I think zsh-style globbing would be more natural than --filter with ls command but I think it would be better to have more opinions about this.

Pinging @mvidalgarcia @audrium , what do you think?

tiborsimko commented 3 years ago

Filtering by size (and by last-modified) would definitely be a plus on top of zsh-style globbing. And it would provide a great analogy with the list --filter somefield=somesubvalue commands that users will get used to. Hence I thought about that 3rd way, combining the two together, i.e. keeping the current file globbing behaviour but also adding an optional --filter on top as well... Which would allow (in the future) to do things like:

$ reana-client ls -w myanalysis.42 '**/*.root' --filter size=0
$ reana-client ls -w myanalysis.42 --filter last-modified=2021-06-07

but also later possible evolution:

$ reana-client ls -w myanalysis.42 '**/*.root' --filter 'size>10GiB'
$ reana-client ls -w myanalysis.42 --filter last-modified=yesterday
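
As a rough illustration of how such filter expressions could be evaluated, here is a hedged Python sketch. Everything in it is an assumption made for illustration: the expression grammar (`field`, then `=`, `>` or `<`, then a value), the size units, the prefix matching on timestamps, and the sample entries. Relative dates such as `yesterday` are not handled.

```python
import re

# Binary size units assumed for expressions such as 'size>10GiB'.
UNITS = {"": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text):
    """Convert '0' or '10GiB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*([KMG]iB)?", text)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2) or ""]


def matches(entry, expression):
    """Check one file entry against one filter expression."""
    parsed = re.fullmatch(r"([a-z-]+)\s*(=|>|<)\s*(.+)", expression)
    if parsed is None:
        raise ValueError(f"unsupported filter: {expression!r}")
    field, operator, value = parsed.groups()
    if field == "size":
        wanted, actual = parse_size(value), entry["size"]
        return {"=": actual == wanted, ">": actual > wanted, "<": actual < wanted}[operator]
    if field == "last-modified" and operator == "=":
        # A date such as 2021-06-07 matches any timestamp on that day.
        return entry["last-modified"].startswith(value)
    raise ValueError(f"unsupported filter: {expression!r}")


def select(entries, expression):
    """Return the names of the entries satisfying the filter expression."""
    return [entry["name"] for entry in entries if matches(entry, expression)]


# Hypothetical workspace entries, invented for this example.
ENTRIES = [
    {"name": "results/empty.root", "size": 0, "last-modified": "2021-06-07T08:31:40"},
    {"name": "results/big.root", "size": 11 * 1024**3, "last-modified": "2021-06-06T10:00:00"},
    {"name": "data/names.txt", "size": 20, "last-modified": "2021-06-07T08:31:23"},
]

print(select(ENTRIES, "size=0"))
print(select(ENTRIES, "size>10GiB"))
print(select(ENTRIES, "last-modified=2021-06-07"))
```

In this sketch the filter is applied to entries that a glob pattern has already selected, so the two mechanisms stay independent and can be combined freely.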

audrium commented 3 years ago

+1 for the combination of the two options. I think zsh-style globbing is a more user-friendly way to easily filter something (less verbose), but the --filter option provides the advantage of specifying the field name. So I guess the combination of the two would make the r-client really powerful

mvidalgarcia commented 3 years ago

I agree with the fact that all file-related commands ls/mv/du/rm/upload/download should behave similarly in a glob-friendly manner. Having --filter on top might be nice, but in any case, it would be possible to achieve the same results with Unix pipe commands, so I don't see it as something very urgent.
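
As a sketch of that post-processing idea, the snippet below parses the tabular `ls` output and filters on the SIZE column, roughly what an awk one-liner in a pipe would do. It assumes the three-column layout shown earlier in this thread (NAME, SIZE, LAST-MODIFIED); the function names are hypothetical.

```python
# Sample text copied from the `reana-client ls -w hello` output in this thread.
SAMPLE_OUTPUT = """\
NAME                    SIZE   LAST-MODIFIED
code/helloworld.py      3253   2021-06-02T12:07:28
data/names.txt          20     2021-06-02T12:07:28
results/greetings.txt   34     2021-06-02T12:07:43
"""


def parse_ls_table(text):
    """Turn the tabular output into a list of dictionaries."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # skip the header row
        # Split from the right so the last two columns are size and date.
        name, size, modified = line.rsplit(None, 2)
        rows.append({"name": name, "size": int(size), "last-modified": modified})
    return rows


def larger_than(rows, min_size):
    """Return the names of files strictly larger than min_size bytes."""
    return [row["name"] for row in rows if row["size"] > min_size]


print(larger_than(parse_ls_table(SAMPLE_OUTPUT), 30))
```

In a real pipe the text would come from standard input instead of `SAMPLE_OUTPUT`. Parsing the human-readable table is more brittle than a built-in filter, since it depends on the column layout staying stable.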