scipp / essreflectometry

Reflectometry data reduction for the European Spallation Source
https://scipp.github.io/essreflectometry/
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

fix: separate path and filename providers #25

Closed jokasimr closed 3 months ago

jokasimr commented 5 months ago

The old version did not allow loading local files into the workflow, everything had to go through pooch. With this change the user can directly specify a local file path or the name of a file that is in the pooch data repository.

jokasimr commented 5 months ago

Please have a look at how I solved a similar problem scipp/esssans#50. Not sure that is perfect, but the idea is that it would allow for adding things such as SciCat, which could convert a filename (or run ID) into a file path.

@jl-wynen raised a similar concern earlier about future scicat file loading. I don't understand why the solution in this PR would hinder adding a SciCatDataset provider later, and I've described above how I would propose to solve that, but maybe there's something I'm missing, can you explain the problem?

SimonHeybrock commented 5 months ago

Please have a look at how I solved a similar problem scipp/esssans#50. Not sure that is perfect, but the idea is that it would allow for adding things such as SciCat, which could convert a filename (or run ID) into a file path.

@jl-wynen raised a similar concern earlier about future scicat file loading. I don't understand why the solution in this PR would hinder adding a SciCatDataset provider later, and I've described above how I would propose to solve that, but maybe there's something I'm missing, can you explain the problem?

The problem is that we have to avoid using a slightly different solution in every project, i.e., we should coordinate.

jokasimr commented 5 months ago

The problem is that we have to avoid using a slightly different solution in every project, i.e., we should coordinate.

Absolutely agree.

I was looking through the PR you mentioned for Zoom and I think the solutions are similar but with some differences. Roughly the types in the Zoom PR corresponds to the types in this PR like this:

# Zoom , # Reflectometry
Filename <-> PoochFilename
FilePath   <-> Filename
DataFolder <-> Does not exist
FilenameType <-> Does not exist

Is there a specific reason to split the file path into folder and name? In my experience that is a source of errors, for example in case you have files with the same name in different folders it is easy to accidentally select the file in the wrong folder.

I also think it seems generally more complicated to add four types instead of two. The way I see it we need N+1 types, 1 for the file path from where we are actually going to read the file, and N more for the N different file sources (Pooch, SciCat, etc).

This is why I'm reluctant to change this to mirror the zoom pr immediately, but if you still think that's the best way to do it then I'll do that, it is very good if file loading is identical everywhere.

jokasimr commented 3 months ago

Closing because this issue was addressed in #40