microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

make NMDC data object a ga4gh DRS object? #54

Open dwinston opened 3 years ago

dwinston commented 3 years ago

The GA4GH Data Repository Service (DRS) API spec defines an object type, DrsObject, with properties useful for workflow automation, for example url+headers for authorized access, or tokens for deferring url generation. I have sketched out a pydantic model for it in the nmdc-runtime repo. Also, #49 suggests a checksum field, which DRS addresses as an array (e.g. [{checksum: ..., type: 'crc32c'}, {checksum: ..., type: 'md5'}, ...]).

My suggestion here is to make NMDC's data object a DRS object, i.e. align its LinkML definition with DRS's DrsObject spec.
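As a rough illustration of the shape being proposed, here is a minimal sketch of a DrsObject-like record using only the Python standard library. It is not the pydantic model from nmdc-runtime and it covers only a subset of the DRS fields; the example id, URLs, and header value are placeholders assumed for illustration.

```python
import hashlib
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Checksum:
    checksum: str  # hex-encoded digest
    type: str      # digest algorithm name, e.g. "sha-256", "md5", "crc32c"


@dataclass
class AccessURL:
    url: str
    # Optional HTTP headers to send with the request, e.g. for authorization.
    headers: List[str] = field(default_factory=list)


@dataclass
class AccessMethod:
    type: str                               # e.g. "https", "s3"
    access_url: Optional[AccessURL] = None  # direct URL (plus headers)
    access_id: Optional[str] = None         # token for deferred URL generation

    def __post_init__(self):
        # DRS expects at least one of access_url / access_id.
        if self.access_url is None and self.access_id is None:
            raise ValueError("provide access_url or access_id")


@dataclass
class DrsObject:
    id: str
    self_uri: str
    size: int          # size in bytes
    created_time: str  # timestamp string
    checksums: List[Checksum]
    name: Optional[str] = None
    access_methods: List[AccessMethod] = field(default_factory=list)

    def __post_init__(self):
        if not self.checksums:
            raise ValueError("at least one checksum is required")


# Placeholder payload and locations, assumed for illustration only.
payload = b"ACGT\n"
obj = DrsObject(
    id="example-1",
    self_uri="drs://drs.example.org/example-1",
    size=len(payload),
    created_time="2021-01-01T00:00:00Z",
    checksums=[Checksum(hashlib.sha256(payload).hexdigest(), "sha-256")],
    access_methods=[
        AccessMethod(
            type="https",
            access_url=AccessURL(
                url="https://data.example.org/example-1",
                headers=["Authorization: Bearer <token>"],
            ),
        )
    ],
)
record = asdict(obj)  # plain dict, ready for JSON serialization
```

Keeping checksums as a list means a single data object can carry several digests at once, which is the array form mentioned above in relation to #49.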

dwinston commented 3 years ago

DRS does not include a compression-format field (e.g. "zip", "bz2") like NMDC does. I suggest we either underscore-prefix such fields (e.g. _compression_format) to make clear that they are not part of the DRS spec, or else document that these fields may clash with future versions (if any) of DRS.
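The underscore-prefix option could look roughly like the sketch below. The helper name `prefix_non_drs_fields` and the sample record are hypothetical, and the `DRS_FIELDS` set is an assumed snapshot of the top-level DrsObject property names that would need checking against whichever DRS version is targeted.

```python
# Assumed snapshot of top-level DrsObject property names; verify against
# the DRS version actually being targeted.
DRS_FIELDS = frozenset({
    "id", "name", "self_uri", "size", "created_time", "updated_time",
    "version", "mime_type", "checksums", "access_methods", "contents",
    "description", "aliases",
})


def prefix_non_drs_fields(record: dict) -> dict:
    """Return a copy of record with non-DRS keys underscore-prefixed.

    Hypothetical helper: keys that are DRS properties, or that are already
    prefixed, are kept as-is; everything else gains a leading underscore.
    """
    out = {}
    for key, value in record.items():
        if key in DRS_FIELDS or key.startswith("_"):
            out[key] = value
        else:
            out["_" + key] = value
    return out


# Hypothetical NMDC-style record with one field that DRS does not define.
nmdc_record = {"id": "example-1", "size": 1024, "compression_format": "bz2"}
drs_safe = prefix_non_drs_fields(nmdc_record)
```

Here `drs_safe` keeps `id` and `size` unchanged and carries the extension as `_compression_format`, so a later DRS release that introduced its own `compression_format` would not collide with it.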

wdduncan commented 3 years ago

I think we can make use of some of the properties without necessarily making the data object a DrsObject per se.

Some of the properties we already have; e.g.:

Other DRS properties need to be evaluated, such as drs:ContentObject. We may already be representing the pertinent information using the data_object_type (see #20).

Possible additions to the schema (IMHO):

turbomam commented 1 year ago

@dwinston and @cmungall, should we revisit this issue after GSP, or just close it?