ispyb / ispyb-database-modeling


DataCollection column for identifying the internal data path in the HDF5 file #17

Open KarlLevik opened 6 years ago

KarlLevik commented 6 years ago

We (DLS) would like to have a new DataCollection column for identifying the path inside the HDF5 file that points to the data related to the data collection. We need this because a single HDF5 file can hold data for more than one data collection, or at least that is the case for our PDF beamline.

I don't know what a good name would be. hdf5Path? hdf5InternalPath? Something else?

Hereby inviting discussion - @stufisher @graeme-winter @olofsvensson @antolinos ...

graeme-winter commented 6 years ago

Since this could in principle also apply to files other than HDF5, the column could be a generic "path within file", i.e. able to cope with other container files one may use in the future (zip files, for example).

KarlLevik commented 6 years ago

OK, in that case: 'internalFilePath'? Or 'containerFileInternalPath'?

graeme-winter commented 6 years ago

In my world, saying what you mean with the fewest words is always best, so internalFilePath.

jlmuir commented 6 years ago

Or maybe imageSubpath? Or imageContainerSubpath?

I'm still quite new to ISPyB. What column currently holds the path to the data? I think it's imageDirectory, right? And it's obviously a path to a directory containing the image files. So, the full path to a given image can be constructed from imageDirectory, imagePrefix, imageSuffix, and fileTemplate? How is the HDF5 data file stored in this table? Is the path to the parent directory of the HDF5 data file stored in imageDirectory, and the name of the file is stored in imagePrefix, imageSuffix, and fileTemplate, except fileTemplate is just a literal string? If so, then it seems good to continue with the image naming to be consistent with imageDirectory, which is why my above suggestions have image in their names.

I have a slight worry that internalFilePath might be mistaken for some kind of file path internal to ISPyB.

KarlLevik commented 6 years ago

> What column currently holds the path to the data? I think it's imageDirectory, right? And it's obviously a path to a directory containing the image files. So, the full path to a given image can be constructed from imageDirectory, imagePrefix, imageSuffix, and fileTemplate?

Yes, that's right, plus the dataCollectionNumber column.

> How is the HDF5 data file stored in this table? Is the path to the parent directory of the HDF5 data file stored in imageDirectory, and the name of the file is stored in imagePrefix, imageSuffix, and fileTemplate, except fileTemplate is just a literal string?

Yes, except I suspect dataCollectionNumber is also used, and also that fileTemplate, at least at the ESRF, is given as e.g. "testLC1%04d.h5", so not just a literal string.
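To make the template point concrete, here is a minimal Python sketch of how a file path might be assembled from imageDirectory and a printf-style fileTemplate such as the one quoted above. The function name, the example directory, and the choice of which number fills the `%04d` field are assumptions made for illustration; the thread does not confirm which column supplies that number.

```python
import posixpath


def container_file_path(image_directory, file_template, number):
    """Join the directory with the (optionally expanded) file template.

    Hypothetical helper: assumes the template holds at most one
    printf-style integer field, e.g. "%04d".
    """
    if "%" in file_template:
        # Expand the printf-style field with the supplied number.
        file_name = file_template % number
    else:
        # A literal template is used unchanged.
        file_name = file_template
    return posixpath.join(image_directory, file_name)


# "/data/visit" is a made-up directory used only for this example.
templated = container_file_path("/data/visit", "testLC1%04d.h5", 3)
literal = container_file_path("/data/visit", "scan.h5", 3)
print(templated)
print(literal)
```

A literal template passes through untouched, which covers the case where fileTemplate is just a fixed file name.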

Anyway, yes, I think imageContainerSubpath sounds nice and descriptive. While it's slightly longer than internalFilePath, it should be clearer and should avoid some misunderstandings.

delageniere commented 6 years ago

Ok for imageContainerSubPath
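As a rough illustration of what the proposed column would buy, the Python sketch below models two data collections that share one container file and differ only in the path inside it. The row class, the reduced set of columns, the file names, and the internal paths (`/entry1/data`, `/entry2/data`) are all hypothetical; only the column names come from this thread.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DataCollectionRow:
    # Hypothetical subset of the DataCollection columns.
    dataCollectionId: int
    imageDirectory: str
    fileTemplate: str
    # Proposed column: path inside the container file, None if not applicable.
    imageContainerSubPath: Optional[str] = None


def data_location(row: DataCollectionRow) -> Tuple[str, Optional[str]]:
    """Return (path of the file on disk, path inside that file)."""
    # Assumes fileTemplate is a literal file name in this simplified example.
    file_path = posixpath.join(row.imageDirectory, row.fileTemplate)
    return file_path, row.imageContainerSubPath


# Two collections stored in the same (made-up) HDF5 file.
first = DataCollectionRow(1, "/data/visit", "pdf_scan.h5", "/entry1/data")
second = DataCollectionRow(2, "/data/visit", "pdf_scan.h5", "/entry2/data")

print(data_location(first))
print(data_location(second))
```

Without the extra column the two rows would resolve to the same location; with it, each row points at its own data inside the shared file.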