Relax restrictions / guidelines

ssomnath commented 5 years ago

Per this Google Group conversation, we should relax any restrictions on how data should be stored beyond the core Main Dataset - Ancillary Dataset constructs.

We need to comb through the entire specifications document and relax restrictions unless absolutely essential. The few examples I can think of at the moment are:

Videos and time series
(mixed precision) compound datasets (cannot be handled in C++, Fortran)
Compound dataset or Channels?
Not mandatory that a single HDF5 file must contain only USID datasets or datasets pertaining to a single raw measurement. Some users have expressed the desire to use a single HDF5 file to store all their imaging, spectroscopy, etc. data pertaining to (for example) a day's measurement or a project.

mpanighel commented 5 years ago

Hello Suhas. I was developing some Python code to convert scanning tunneling microscopy data from proprietary RHK to HDF5 and, after checking NeXus, I came across USID.

I think its core idea of flattening data to 2D is really good (and could actually allow this format to be used as a universal standard for scientific data)! This is its strength, and I see that this is why Ancillary datasets are mandatory. On the other hand, for regularly sampled data, as already pointed out, they are also redundant and unnecessarily complicated (besides wasting quite a lot of space, especially for grayscale images). This is something that is holding me back a bit.
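To make the space concern concrete, here is a minimal NumPy sketch of flattening a regularly sampled measurement to 2D and building full-length position indices for it. It does not use pyUSID. The array sizes, the variable names, and the (row, column) ordering of the index columns are assumptions chosen for illustration, not the layout the USID specification prescribes.

```python
import numpy as np

# Hypothetical measurement: a 4 x 5 grid of positions, 3 spectral points each.
rows, cols, n_spec = 4, 5, 3
cube = np.arange(rows * cols * n_spec, dtype=np.float32).reshape(rows, cols, n_spec)

# Flatten the position dimensions into one axis: (positions, spectral points).
main = cube.reshape(rows * cols, n_spec)

# Full-length position indices: one (row, col) pair per row of `main`.
# The column order here is an assumption made for this sketch.
row_idx, col_idx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
pos_inds = np.column_stack([row_idx.ravel(), col_idx.ravel()])

print(main.shape, pos_inds.shape)
```

For a grayscale image there is a single value per position, so in this sketch the index array alone would hold twice as many entries as the image itself.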

While still keeping the Main + Ancillary construct, so that any data can be represented, do you think the definition of the Ancillary datasets could be relaxed? For example, the ancillary attribute of a channel could link either to a "full length" dataset (as it does now) or to a group/dataset (to be precisely defined) containing start/stop/increment. One could then let pyUSID (or an equivalent) handle this and, if necessary, create the full-length dataset "on the fly".
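A rough sketch of what such on-the-fly expansion could look like in plain NumPy follows. The function names and the (start, step, count) tuple layout are hypothetical and are not part of pyUSID or the USID specification. A count is used in place of a stop value only to avoid floating-point ambiguity about whether the endpoint is included.

```python
import numpy as np

def expand_axis(start, step, count):
    """Return the `count` regularly spaced values start, start + step, ..."""
    return start + step * np.arange(count)

def expand_ancillary(axes):
    """Build a full-length ancillary array from compact axis descriptions.

    `axes` is a list of (start, step, count) tuples, slowest-varying
    dimension first (an assumption of this sketch). The result has one
    row per combination of axis values and one column per dimension.
    """
    vectors = [expand_axis(*axis) for axis in axes]
    grids = np.meshgrid(*vectors, indexing="ij")
    return np.column_stack([grid.ravel() for grid in grids])

# Two dimensions described by six numbers instead of a 6 x 2 array.
full = expand_ancillary([(0.0, 0.5, 2), (10.0, 1.0, 3)])
print(full.shape)
```

A reader built this way would only need to store three numbers per dimension and could materialize the full array on request.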

ssomnath commented 5 years ago

@mpanighel Thank you for your interest in USID and for sharing your ideas. You bring up good points. We realize that the strict main + ancillary dataset rules make USID overkill for simple, small datasets such as images or single spectra. We also realize that most data have parameters that were varied linearly, and we have given some thought to how to avoid being verbose where it is unnecessary. We would be happy to work with you to incorporate this capability if you are interested. Please feel free to get in touch with us at pycroscopy@slack.com to discuss further (please send us an email at pycroscopy@gmail.com with the email address you would like to use for your Slack account).

pycroscopy / USID

Relax restrictions / guidelines #2