Closed PeterKraus closed 11 months ago
Closed by #33
Riffing on this: now that we have the API harness that can deal with pandas/xarray objects, it would be nice to allow extractors to specify generic packages that must be installed to understand their outputs. Currently we just have a couple of defined "common" formats, but being able to say that you need pandas/numpy/xarray or similar would probably be helpful. This could also be extended to cover the idea of returning raw JSON.
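To make the idea concrete, here is a rough sketch of how a harness might check such a declaration before running an extractor. The `output_requires` field, the extractor `id` and the helper name are all hypothetical, made up for illustration; nothing like them exists in the schema today. Only `importlib.util.find_spec` is a real standard-library call.

```python
from importlib.util import find_spec

# Hypothetical extractor definition. "output_requires" is an assumed field
# name, not part of the current schema; it lists top-level package names.
extractor = {
    "id": "example-extractor",
    "output_requires": ["json", "a_package_that_is_not_installed"],
}


def missing_packages(definition):
    """Return the declared packages that cannot be imported here."""
    return [
        name
        for name in definition.get("output_requires", [])
        # find_spec returns None when a top-level module cannot be found.
        if find_spec(name) is None
    ]


print(missing_packages(extractor))
```

A harness could then refuse to run, or warn, when the returned list is non-empty. Whether the declaration should hold import names or distribution names (which can differ) is left open here.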
Just thinking about this in terms of concrete next steps for schemas to support/encourage: one set that might map quite nicely to some of our existing filetypes is the Allotrope simple models for certain techniques (which are semantic JSON schemas), e.g., powder XRD: https://gitlab.com/allotrope-public/asm/-/blob/main/json-schemas/adm/x-ray-powder-diffraction/REC/2021/12/x-ray-powder-diffraction.embed.schema.json I'm not sure how widely these are used in academia at the moment, but there's definitely industry buy-in. There are lots of gaps (e.g., this XRD schema doesn't define peaks...), but it might be a starting point (TGA, for example, has peaks defined: https://gitlab.com/allotrope-public/asm/-/blob/main/json-schemas/adm/thermogravimetric-analysis/REC/2021/12/thermogravimetric-analysis.schema.json).
Following up from the ELN Roundtable: an important point was raised, that there should be a way to force the `Extractor` to only return metadata.

In my understanding, this is composed of three steps:

- `Extractors` which return `meta-only`, or `meta+data`,
- `Extractor` writers to specify both `meta-only` and `meta+data` usages in a simple way,
- what the `meta` part really is, which is most likely out of scope of this WG.

The short and simple way to do this would be to extend the `usage` schema to indicate whether a given entry returns `meta-only` or `meta+data`. However, this might require two usages for each `Extractor`.

Tagging @steffenbrinckmann
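
For illustration only, a minimal sketch of what the "extend the `usage` schema" option could look like from the harness side. The `returns` key, the example commands, the `select_usage` helper and the choice to treat entries without the key as `meta+data` are all assumptions made for this sketch, not anything currently defined.

```python
ALLOWED_RETURNS = {"meta-only", "meta+data"}

# Hypothetical usage entries for a single extractor. The "returns" key is the
# assumed schema extension; the commands are placeholders.
usages = [
    {"method": "cli", "command": "example-tool --meta-only INPUT", "returns": "meta-only"},
    {"method": "cli", "command": "example-tool INPUT", "returns": "meta+data"},
]


def select_usage(usages, returns):
    """Pick the first usage entry that declares the requested return kind."""
    if returns not in ALLOWED_RETURNS:
        raise ValueError(f"unknown return kind: {returns!r}")
    for usage in usages:
        # Assumption: entries without "returns" behave as "meta+data".
        if usage.get("returns", "meta+data") == returns:
            return usage
    return None


print(select_usage(usages, "meta-only")["command"])
```

This also shows the drawback mentioned above: every extractor that supports both modes ends up carrying two near-identical usage entries.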