scitran / core

RESTful API
https://scitran.github.io
MIT License
18 stars 18 forks source link

Fwd slash #1089

Open hkethi002 opened 6 years ago

hkethi002 commented 6 years ago

Fixes #1071

Changes

Note: Only applies to engine uploads

Breaking Changes

Review Checklist

kofalt commented 6 years ago

Can you explain your thoughts around the choice of escape character?

codecov-io commented 6 years ago

Codecov Report

Merging #1089 into generic-storage will increase coverage by <.01%. The diff coverage is 100%.

@@                 Coverage Diff                 @@
##           generic-storage    #1089      +/-   ##
===================================================
+ Coverage            91.05%   91.05%   +<.01%     
===================================================
  Files                   49       49              
  Lines                 7067     7068       +1     
===================================================
+ Hits                  6435     6436       +1     
  Misses                 632      632
hkethi002 commented 6 years ago

I'm not attached to it at all, I used because when I rename a file using the Finder on my mac, it uses a ':', but just shows it as a '/' in the Finder.

kofalt commented 6 years ago

Is there a need for that artificial separator? If we're supporting folders on containers now, the engine could send you the whole path.

I'd be worried about unintentionally triggering this. If nothing else we'd need to document it in the gear spec, but gears by design don't have this external concept so it'd be a little awkward.

hkethi002 commented 6 years ago

@kofalt Would it require significant changes to the engine to send the desired path in the metadata?

kofalt commented 6 years ago

I don't think so, but we'd want to tell the engine to turn that behavior on. Maybe we could add a parameter to the job ticket response that indicates support for the feature.

ehlertjd commented 6 years ago

We will have to update the resolver (on the client) to handle slashes in filenames. My suggestion would be to allow backslash escaped filenames, which means we would also have to allow escaped backslashes.

kofalt commented 6 years ago

I was about to agree, but that will get really awkward if there are folders on any non-leaf container. What would you say to the following:

  1. If there are no non-folder paths to walk, walk a folder path.
  2. If there are non-folder paths (another container name, or anything), walk that.
  3. In the second case, escaping the slash will walk the folder path.

Given an ambiguous non-leaf container A, with children containers & path-embedded files:

├── A
│   │
│   ├── B
│   │   ├── one.txt
│   │
│   ├── C
│   │   ├── two.txt
│   │
│   ├── three.txt
│   ├── B/four.txt
│   ├── D/five.txt

The following resolves could occur:

Input Array Succeeds
A ['A'] Y
A/three.txt ['A', 'three.txt'] Y
A/B/one.txt ['A', 'B', one.txt'] Y
A/B/four.txt ['A', 'B', 'four.txt'] N
A/B\/four.txt ['A', 'B\/four.txt'] Y
A/D/five.txt ['A', 'D', 'five.txt'] Y

This includes static non-folder paths. For example the string collections when that is added. Thoughts?

kofalt commented 6 years ago

We'll have enough information to know if a node is ambiguous when displaying it, so we can even output the escaped versions preemptively so copy & paste works.

ehlertjd commented 6 years ago

@kofalt I think that works. Instead of backslashes, we could also just allow segments to be quoted when referring to a file, e.g. A/"B/four.txt". But I agree that we should do the right thing when we can, without requiring any escaping.

I think in any scenario, we would unescape path segments by the time they reach the server.

kofalt commented 6 years ago

Yeah, we could do either or both. Maybe instead of sending (from my example) ['A', 'B\/four.txt'] to the API, we just send ['A', 'B/four.txt'], because splitting is the client's problem.

In command-line usage, stuff will probably be quoted because spaces. So it might be weird to say "group/weird name/a/'B/four.txt'". I would vote \/, less neat but easier to read. And keep that convention when interacting API <--> other tools.

kofalt commented 6 years ago

This is also weird for exports. Up till now scitran maps to a tree. I really am not a big fan of breaking that.

kofalt commented 6 years ago

What if, for tree-mapping purposes, files are always under files/, and containers cannot be named / labeled files? PTAL @nagem

nagem commented 6 years ago

What if, for tree-mapping purposes, files are always under files/, and containers cannot be named / labeled files? PTAL @nagem

It is the intention with the new download endpoint work that certain container names will be off limits, including files, analyses, projects, subjects, sessions, acquisitions. We could move some of that work up to support this change. Here is the a rough version of the hierarchy for the metadata download:

proj_label
├── proj_label.flywheel.json
├── ANALYSES
│   └── ana_label
│       ├── ana_label.flywheel.json
│       ├── INPUT
│       └── OUTPUT
├── FILES
│   ├── filename.ext
│   └── filename.ext.flywheel.io
└── SUBJECTS
    └── subj_label
        ├── subj_label.flywheel.json
        ├── ANALYSES
        ├── FILES
        └── SESSIONS
            └── sess_label
                ├── sess_label.flywheel.json
                ├── ANALYSES
                ├── FILES
                └── ACQUISITIONS
                    └── acq_label
                        ├── acq_label.flywheel.json
                        └── FILES
kofalt commented 6 years ago

Okay, the only difference I can see is that yours is a forcibly-unambiguous tree, which you'd need for a download generator. Does anyone object to that differing from the resolver tree?

nagem commented 6 years ago

I don't think they need to be exactly the same, but they could use similar features like explicitly calling out sub documents that deviate from the normal hierarchy like files and analyses.

kofalt commented 6 years ago

Perfect. Let's go with that.

hkethi002 commented 6 years ago

So I'm not sure if I understand this correctly, but with this case

 ___A
│   ├── B/four.txt

If the name of the file is 'B/four.txt' in the file form, the metadata would have a list ['A', 'B/four.txt'] as the filename?