whole-tale / girder_wholetale

Girder plugin providing basic Whole Tale functionality
BSD 3-Clause "New" or "Revised" License
3 stars 5 forks source link

Allow to specify a subset of dataset during import via path #520

Closed Xarthisius closed 2 years ago

Xarthisius commented 2 years ago

Rationale

Often during registration of a binder/dataset as a Tale we're interested only in a subset of the imported tree. This PR introduces dsRootPath parameter to POST /tale/import that allows to specify internal dataset path that should be used for generation of Tale's dataSet.

How to test?

  1. Import a Tale via:
curl 'https://girder.local.wholetale.org/api/v1/tale/import?git=false&url=https%3A%2F%2Fpbcconsortium.s3.amazonaws.com%2Fwholetale%2F5ad7cdf55b0d5007601015b7ff1ea8d6%2F2021-11-09_21.47.58%2FDataset_1-882P.zip&taleKwargs=%7B%22title%22%3A%22foo%22%7D&spawn=false&lookupKwargs=%7B%7D&imageId=6207bb1ada801d652afb4245&asTale=false&dsRootPath=%2Fdata' \
  -X 'POST' \
  -H 'girder-token: <your_token>' 
  1. Inspect the resulting Tale. Data folder should contain assets and a bunch of csvs.
  2. (optionally) experiment with something that's not Deriva, e.g. Dataverse ds with hierarchy. Check that selecting subpath works for both asTale=True and asTale=False.
codecov[bot] commented 2 years ago

Codecov Report

Merging #520 (bcd24a0) into master (fd35a45) will increase coverage by 0.05%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #520      +/-   ##
==========================================
+ Coverage   91.78%   91.84%   +0.05%     
==========================================
  Files          58       58              
  Lines        4443     4450       +7     
==========================================
+ Hits         4078     4087       +9     
+ Misses        365      363       -2     
Impacted Files Coverage Δ
server/rest/tale.py 96.44% <100.00%> (ø)
server/tasks/import_binder.py 93.44% <100.00%> (+1.39%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update fd35a45...bcd24a0. Read the comment docs.

craig-willis commented 2 years ago

I don't understand the motivating case here -- I expected https://github.com/whole-tale/girder_wholetale/pull/519 to require this PR. Why do we need to do this?

Xarthisius commented 2 years ago

I don't understand the motivating case here -- I expected #519 to require this PR. Why do we need to do this?

The motivating case is putting bdbag's data/ in our data/. That will require a few steps. This is the first one. I made it a generic solution, rather than provider specific. Remaining are:

  1. Change dashboard to accept dsRootPath for import path
  2. Make deriva provider use dsRootPath=/data in its integration by default.
craig-willis commented 2 years ago

OK, I get it now. So given the following dataset named "Test Dataset":

folder1/
    file2.txt
    folder2/
       file2.txt

For the following dsRootPath values I get:

I know the idea of the dataset name as a folder has annoyed some and can see why the DERIVA case of ../data/data/... might confuse.