marda-alliance / metadata_extractors_schema

Archive of MaRDA Metadata Extractors Schema. See datatractor/schema for the current repository.
https://github.com/datatractor/schema
MIT License

`Extractor` specifying type of output (meta)-data #26

Closed: PeterKraus closed this issue 11 months ago

PeterKraus commented 1 year ago

Following up on the ELN Roundtable: an important point raised there was that there should be a way to force the Extractor to return only metadata.

In my understanding, this is composed of three steps:

The short and simple way to do this would be to extend the usage schema to indicate whether a given usage entry returns metadata only (meta-only) or metadata plus data (meta+data). However, this might require two usage entries for each Extractor.
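A minimal sketch of that idea, assuming a hypothetical `output_scope` field on each usage entry. The field name, the enum values and the commands are illustrative only and are not part of the schema. It also shows the cost mentioned above: one Extractor ends up with two usage entries, and a client picks one depending on whether it wants metadata only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class OutputScope(Enum):
    """Hypothetical field: what a given usage entry returns."""

    META_ONLY = "meta-only"
    META_AND_DATA = "meta+data"


@dataclass(frozen=True)
class Usage:
    """A single (hypothetical) usage entry of an Extractor."""

    command: str
    output_scope: OutputScope


def select_usage(usages: Sequence[Usage], metadata_only: bool) -> Optional[Usage]:
    """Return the first usage matching the requested scope, or None if absent."""
    wanted = OutputScope.META_ONLY if metadata_only else OutputScope.META_AND_DATA
    for usage in usages:
        if usage.output_scope is wanted:
            return usage
    return None


# Hypothetical extractor with two usage entries, one per output scope.
usages = [
    Usage("example-extractor --metadata-only {input_path}", OutputScope.META_ONLY),
    Usage("example-extractor {input_path}", OutputScope.META_AND_DATA),
]

chosen = select_usage(usages, metadata_only=True)
print(chosen.command)  # example-extractor --metadata-only {input_path}
```

Returning `None` rather than silently falling back to meta+data lets a client that strictly needs metadata-only output notice when an Extractor cannot provide it.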

Tagging @steffenbrinckmann

ml-evs commented 11 months ago

Closed by #33

ml-evs commented 8 months ago

Riffing on this: now that we have the API harness that can deal with pandas/xarray objects, it would be nice to allow extractors to specify generic packages that must be installed to understand their outputs. Currently we only have a couple of predefined "common" formats, but saying that you need pandas/numpy/xarray or whatever would probably be helpful, and this could also be extended to cover the idea of returning raw JSON.
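A minimal sketch of how a client might check such a declaration, assuming a hypothetical `output_requires` list of top-level Python package names on the extractor entry (the key, the extractor id and the package list are illustrative, not part of the schema). It uses the standard-library `importlib.util.find_spec` to test whether each package is importable without actually importing it.

```python
import importlib.util
from typing import Iterable, List


def missing_output_packages(required: Iterable[str]) -> List[str]:
    """Return the required top-level packages that are not importable here.

    Only top-level names (e.g. "pandas", not "pandas.io") are supported:
    find_spec raises for dotted names whose parent package is missing.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]


# Hypothetical extractor entry declaring what is needed to use its outputs.
extractor_entry = {
    "id": "example-extractor",
    "output_requires": ["json", "pandas", "xarray"],
}

missing = missing_output_packages(extractor_entry["output_requires"])
if missing:
    print(f"Install {', '.join(missing)} to use the outputs of {extractor_entry['id']}")
else:
    print("All packages needed to read the outputs are available")
```

Under this assumption, an extractor returning raw JSON could simply declare an empty (or standard-library-only) `output_requires` list.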

ml-evs commented 6 months ago

Just thinking about this in terms of concrete next steps for schemas to support or encourage: one set that might map quite nicely onto some of our existing filetypes is the Allotrope simple models for certain techniques (which are semantic JSON schemas), e.g., powder XRD: https://gitlab.com/allotrope-public/asm/-/blob/main/json-schemas/adm/x-ray-powder-diffraction/REC/2021/12/x-ray-powder-diffraction.embed.schema.json. I'm not sure how widely these are used in academia at the moment, but there's definitely industry buy-in. There are lots of gaps (e.g., this XRD schema doesn't define peaks), but they might be a starting point (TGA, for example, has peaks defined: https://gitlab.com/allotrope-public/asm/-/blob/main/json-schemas/adm/thermogravimetric-analysis/REC/2021/12/thermogravimetric-analysis.schema.json).
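To make the point about gaps concrete, here is a minimal standard-library-only sketch that checks an extractor's JSON output against the `required` and `properties` keywords of a JSON Schema. The toy schema below is hypothetical and is not the Allotrope XRD or TGA schema; real schemas rely on features such as `$ref` that this sketch ignores, so a full JSON Schema validator would be needed in practice.

```python
import json
from typing import Any, Dict, List


def missing_required(instance: Dict[str, Any], schema: Dict[str, Any], path: str = "$") -> List[str]:
    """List paths of `required` keys absent from `instance`.

    Only the `required` and `properties` keywords are honoured, recursing
    into nested objects; `$ref`, arrays and type checks are ignored.
    """
    missing = [f"{path}.{key}" for key in schema.get("required", []) if key not in instance]
    for key, subschema in schema.get("properties", {}).items():
        value = instance.get(key)
        if isinstance(value, dict) and isinstance(subschema, dict):
            missing.extend(missing_required(value, subschema, f"{path}.{key}"))
    return missing


# Hypothetical toy schema: a measurement must report its peaks.
toy_schema = {
    "type": "object",
    "required": ["measurement"],
    "properties": {
        "measurement": {"type": "object", "required": ["two_theta", "peaks"]},
    },
}

# Hypothetical extractor output that has the pattern but no peak list.
output = json.loads('{"measurement": {"two_theta": [10.0, 10.1], "intensity": [5, 7]}}')
print(missing_required(output, toy_schema))  # ['$.measurement.peaks']
```

A schema that does not list peaks as required (as noted for the XRD case) would report no gap here, which is exactly why the choice of target schema matters.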