marda-alliance / metadata_extractors_schema

Archive of MaRDA Metadata Extractors Schema. See datatractor/schema for the current repository.
https://github.com/datatractor/schema
MIT License

`Usage:format`: per-usage format spec #54

Closed PeterKraus closed 5 months ago

PeterKraus commented 6 months ago

Sparked by https://github.com/marda-alliance/metadata_extractors_registry/pull/78#discussion_r1519792884

It might be useful to have a mechanism to indicate what package/library needs to be present in the "caller" environment in order to understand the format of the objects returned in-memory.

Currently, the only relevant install target in the API is `[formats]`, which installs xarray and pandas into the parent environment. However, if the library needed to understand a returned object is not present in the parent environment, unpickling the shared-memory object will fail. We can annotate the required library here (it should be a single library per usage, in my opinion) and then modify the API to use this annotation.
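To illustrate the failure mode, here is a minimal sketch of how a caller could check for the annotated library before unpickling. The names `missing_libraries`, `load_result` and the `required` argument are hypothetical and not part of the current API; the sketch only assumes that the annotation gives top-level importable package names.

```python
import importlib.util
import pickle


def missing_libraries(required):
    """Return the top-level package names in `required` that cannot be imported here."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def load_result(payload, required=()):
    """Unpickle `payload`, but fail early with a clear message if a library is missing.

    `required` stands in for whatever the schema annotation would provide
    (hypothetical; the schema does not define this yet).
    """
    missing = missing_libraries(required)
    if missing:
        raise ImportError(
            "cannot unpickle result; missing libraries: " + ", ".join(missing)
        )
    return pickle.loads(payload)


# Example with a standard-library module standing in for the required package.
payload = pickle.dumps({"a": 1})
result = load_result(payload, required=["json"])
```

Checking up front turns an opaque unpickling error into a message that names the missing package, without the API having to install anything itself.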

See the Extractor-datatree.yaml example file for a more detailed illustration of what I mean.

PeterKraus commented 6 months ago

I expected this to need some thought.

I don't want to implement another package manager (hence the npm story I shared with you). The idea for this one was really to provide a way to:

I am also not 100% happy with calling it "format" and the description could be improved to clarify the above. Would you be OK with that?

PeterKraus commented 5 months ago

In today's meeting, we decided that it's reasonable to expect the in-memory returned object to be either a native Python object or something that is understood by pandas or xarray (i.e. the current content of `[formats]`, which ought to be installed by default; see https://github.com/marda-alliance/metadata_extractors_api/issues/33).
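As a rough sketch of what that expectation could look like as a check, the snippet below inspects the top-level package that defines an object's type. The helper `is_accepted_output` and the set `ACCEPTED_PACKAGES` are hypothetical, and "native Python object" is interpreted narrowly here as a built-in type, which is an assumption rather than something decided in the meeting.

```python
# Top-level packages whose objects the caller is assumed to understand.
# "builtins" covers dict, list, str, int, etc.; this narrow reading of
# "native Python object" is an assumption made for this sketch.
ACCEPTED_PACKAGES = {"builtins", "pandas", "xarray"}


def is_accepted_output(obj):
    """Return True if the type of `obj` is defined in an accepted top-level package."""
    top_level = type(obj).__module__.split(".")[0]
    return top_level in ACCEPTED_PACKAGES
```

Because the check only looks at the module name of the type, it does not need pandas or xarray to be importable in order to run. It also only inspects the outermost object, so a dict containing unsupported objects would still pass.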

We hope to implement a proper output spec once more packages are in the registry. Closing.