pycroscopy / SciFiReaders

Tools for extracting data and metadata from scientific data files
https://pycroscopy.github.io/SciFiReaders/about.html
MIT License

Is there any data and/or metadata standardization available? #98

Closed nsulmol closed 1 year ago

nsulmol commented 1 year ago

One thing I have noticed while viewing datasets from different file formats in sidpy is that there does not appear to be any explicit standardization of the data or metadata. In particular:

  1. Data follow the coordinate system definitions of their file formats, and are not standardized to a common origin and/or format.
  2. Metadata are stored as saved for their file format, with no standardization performed.

I wanted to (a) confirm this is the intention, and (b) ask whether, if standardized, there would be any interest in including this in SciFiReaders.

I have been reviewing metadata differences in topographical AFM images, and may end up creating a dictionary or 'translator' that allows metadata to be analyzed in a common format. I cannot guarantee I will finish it (and it would be limited to topographical data for now), but I am still wondering whether there is explicit value in this.

My reasoning is simply that it would allow a common set of methods/modules for analyzing data from different devices (the laboratory where I am working has multiple AFM/SPM devices that save in different proprietary formats). I certainly see the value in using pycroscopy to read from these devices, and ideally the device used to save the data could be abstracted away completely.
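As a rough sketch of the kind of translator I have in mind: the vendor names and every metadata key below are made up for illustration and are not taken from any real file format or from SciFiReaders. It only renames keys; unit conversion and coordinate-origin handling would need additional logic.

```python
# Hypothetical per-vendor maps from native metadata keys to a common vocabulary.
# None of these vendor or key names come from a real file format.
KEY_MAPS = {
    "vendor_a": {
        "ScanSizeX": "scan_size_x",
        "ScanSizeY": "scan_size_y",
        "ScanRate": "scan_rate",
    },
    "vendor_b": {
        "x_range": "scan_size_x",
        "y_range": "scan_size_y",
        "lines_per_second": "scan_rate",
    },
}


def translate_metadata(original_metadata, vendor):
    """Return a new dict with known keys renamed to the common vocabulary.

    Keys without a mapping are dropped; the input dict is not modified.
    """
    key_map = KEY_MAPS[vendor]
    return {
        common: original_metadata[native]
        for native, common in key_map.items()
        if native in original_metadata
    }


# Two files from different (hypothetical) devices end up with the same keys.
meta_a = translate_metadata(
    {"ScanSizeX": 5e-6, "ScanRate": 1.0, "Comment": "test"}, "vendor_a"
)
meta_b = translate_metadata({"x_range": 5e-6, "lines_per_second": 1.0}, "vendor_b")
```

Analysis code could then rely on the common keys alone, regardless of which device produced the file.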

If I were to do such a thing, would it make sense to use the metadata attribute (rather than original_metadata) for this? I cannot seem to find the intended purpose of this attribute (though I have not used these tools significantly yet).

ramav87 commented 1 year ago

Good point, @nsulmol! We do not have any metadata standardization. It would be good to have, but it is somewhat beyond our purview. I know some folks at NIST are working on schemas for different types of data, such as AFM topography and STM spectroscopy. We would certainly be open to conversions to such standards.

As for metadata versus original_metadata: metadata holds what we add when we perform a processing step, while original_metadata preserves what was read from the file during conversion. That is how we envisioned it.
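A minimal sketch of that convention, using a plain stand-in class rather than the actual sidpy.Dataset: the class, the processing function, and the "ScanRate" and "mean_subtracted" keys are all hypothetical and only illustrate which attribute receives what.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetStandIn:
    """Hypothetical stand-in for a dataset object; not the sidpy.Dataset class."""

    values: list
    original_metadata: dict = field(default_factory=dict)  # as read from the file
    metadata: dict = field(default_factory=dict)  # added by processing steps


def subtract_mean(dataset):
    """Example processing step: return a new dataset with the mean removed."""
    mean = sum(dataset.values) / len(dataset.values)
    return DatasetStandIn(
        values=[v - mean for v in dataset.values],
        # The metadata from the file is carried over unchanged.
        original_metadata=dict(dataset.original_metadata),
        # The processing step records what it did in metadata.
        metadata={**dataset.metadata, "mean_subtracted": mean},
    )


raw = DatasetStandIn([1.0, 2.0, 3.0], original_metadata={"ScanRate": 1.0})
processed = subtract_mean(raw)
```

Here raw.metadata stays empty, while processed.metadata records the processing step and processed.original_metadata still matches what came from the file.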

nsulmol commented 1 year ago

Hello ramav87,

Thank you very much for the quick reply. I will keep you posted if I end up creating some form of translator, on the off-chance it makes sense to integrate.