microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

new grouping efforts must be exhaustive #1418

turbomam closed this issue 1 month ago

turbomam commented 7 months ago

New attempts to organize elements in the nmdc-schema with is_a, slot_group, in_subset, or even YAML # comments are very much appreciated.

These groupings must be exhaustive, meaning that no element that is eligible for a grouping may be omitted from it. The person adding the new grouping mechanism is responsible for maintaining the group and/or delegating that responsibility to another party.
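One way to make the exhaustiveness requirement checkable is to write the grouping's eligibility criterion as a predicate and report every eligible element that lacks the grouping. This is a minimal sketch, assuming the schema has already been loaded into a plain Python dict shaped like LinkML YAML (a `slots` mapping whose definitions may carry an `in_subset` list). The function `find_ungrouped`, the toy slot names, and the "name contains doi" rule are all hypothetical and not part of nmdc-schema or LinkML.

```python
from typing import Any, Callable, Dict, List


def find_ungrouped(
    slots: Dict[str, Dict[str, Any]],
    subset: str,
    is_eligible: Callable[[str, Dict[str, Any]], bool],
) -> List[str]:
    """Return the names of eligible slots that are not in `subset`, sorted."""
    missing = []
    for name, definition in slots.items():
        # `in_subset` may be absent or explicitly null in the YAML.
        subsets = definition.get("in_subset") or []
        if is_eligible(name, definition) and subset not in subsets:
            missing.append(name)
    return sorted(missing)


# Toy schema fragment: the slot names are made up for illustration.
toy_slots = {
    "toy_doi_id": {"range": "uriorcurie", "in_subset": ["data_portal_subset"]},
    "toy_doi_provider": {"range": "string"},
    "toy_name": {"range": "string"},
}

# Hypothetical eligibility rule: every slot whose name contains "doi".
print(find_ungrouped(toy_slots, "data_portal_subset", lambda n, _: "doi" in n))
# -> ['toy_doi_provider']
```

Running a check like this in CI would turn "the grouping must be exhaustive" from a convention into a failing test, provided someone can state the eligibility rule precisely.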

see also

mslarae13 commented 7 months ago

I don't understand this issue??

turbomam commented 6 months ago

Thanks for speaking up, @mslarae13. This issue is intended to generalize a common theme from two previous issues, but I guess that would be hard to tell from those issues' titles.

A data_portal_subset was introduced earlier this year.

It has the description

Subset consisting of entities that Kitware/nmdc-server use to populate the data portal.

And the comment

Schema authors are responsible for alerting and supporting Kitware and nmdc-server authors about changes they will have to make if entities labeled with data_portal_subset are modified.

But there are only four elements in the subset right now, and they are all related to DOIs. Does that mean that Kitware doesn't use any other element to build the nmdc-server backend? Does that mean that all other schema elements can be modified without any need to communicate with Kitware? If not, then this subset should be removed, or all elements that satisfy its criteria should be added to the subset.
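For a subset like data_portal_subset, the questions above amount to comparing two lists: the elements labeled with the subset and the elements the consumer actually reads. The sketch below assumes both are available in Python: the schema's classes and slots flattened into one name-to-definition dict, and a hand-maintained inventory of what the portal reads. That inventory (`portal_reads`), the helper names, and the toy element names are hypothetical; nothing here is taken from Kitware's nmdc-server code.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def subset_members(elements: Dict[str, Dict[str, Any]], subset: str) -> Set[str]:
    """Names of elements whose `in_subset` list contains `subset`."""
    return {
        name
        for name, definition in elements.items()
        if subset in (definition.get("in_subset") or [])
    }


def reconcile(
    elements: Dict[str, Dict[str, Any]],
    subset: str,
    used_downstream: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Compare a subset's declared members with what a consumer actually reads.

    Returns (used_but_unlabeled, labeled_but_unused), each sorted by name.
    """
    labeled = subset_members(elements, subset)
    used = set(used_downstream)
    return sorted(used - labeled), sorted(labeled - used)


# Toy elements (classes and slots flattened into one mapping); names are made up.
toy_elements = {
    "ToyStudy": {},
    "toy_doi": {"in_subset": ["data_portal_subset"]},
    "toy_name": {},
    "toy_unused_doi": {"in_subset": ["data_portal_subset"]},
}
# Hypothetical inventory of elements the portal reads; not taken from nmdc-server.
portal_reads = ["ToyStudy", "toy_doi", "toy_name"]

print(reconcile(toy_elements, "data_portal_subset", portal_reads))
# -> (['ToyStudy', 'toy_name'], ['toy_unused_doi'])
```

A non-empty first list means the subset is incomplete; a non-empty second list means it labels elements the consumer no longer depends on. Either way, the subset no longer answers "what can be changed without telling Kitware?"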


#1352 introduces both a # EMSL comment in the schema and the following LinkML comment for the bulk_elect_conductivity slot:

Compatible with EMSL metadata

Does that mean that elements that don't have the "Compatible with EMSL metadata" comment aren't compatible with EMSL metadata? What does it mean to be compatible with EMSL metadata? Does the # EMSL comment mean that all EMSL-related slots appear below that comment? If not, then the # comment and the LinkML comment should be removed.
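The two EMSL markers can at least be checked against each other. Typical YAML loaders such as PyYAML discard # comments, so the heading has to be read from the raw schema text, while the LinkML `comments` values come from the parsed slots. This is a rough sketch under loud assumptions: 2-space indentation, heading comments placed directly under a top-level `slots:` key, and a heading's scope running until the next heading. The regexes, helper names, and toy slots other than bulk_elect_conductivity are hypothetical, and the parsed slots are hand-written here to avoid a YAML dependency.

```python
import re
from typing import Any, Dict, List, Set, Tuple

# Assumes 2-space indentation: headings and slot keys sit directly under `slots:`.
HEADING = re.compile(r"^ {0,2}#\s*(.*?)\s*$")
SLOT_KEY = re.compile(r"^  ([A-Za-z_][A-Za-z0-9_]*):")


def slots_under_heading(yaml_text: str, heading: str) -> Set[str]:
    """Slot names that appear after `# <heading>` and before the next heading."""
    in_slots = False
    current = None
    found: Set[str] = set()
    for line in yaml_text.splitlines():
        if line.startswith("slots:"):
            in_slots = True
            continue
        # Any other top-level key (no indentation, not a comment) ends the block.
        if line and not line[0].isspace() and not line.startswith("#"):
            in_slots = False
        if not in_slots:
            continue
        heading_match = HEADING.match(line)
        if heading_match:
            current = heading_match.group(1)
            continue
        key_match = SLOT_KEY.match(line)
        if key_match and current == heading:
            found.add(key_match.group(1))
    return found


def heading_vs_comment(
    yaml_text: str, slots: Dict[str, Dict[str, Any]], heading: str, comment: str
) -> Tuple[List[str], List[str]]:
    """Return (commented_but_not_under_heading, under_heading_but_not_commented)."""
    under = slots_under_heading(yaml_text, heading)
    tagged = {n for n, d in slots.items() if comment in (d.get("comments") or [])}
    return sorted(tagged - under), sorted(under - tagged)


# Toy schema text; only bulk_elect_conductivity comes from the discussion above.
toy_yaml = """\
slots:
  # EMSL
  bulk_elect_conductivity:
    comments:
      - Compatible with EMSL metadata
  toy_emsl_slot:
    range: string
  # Other
  toy_other_slot:
    comments:
      - Compatible with EMSL metadata
classes:
  ToyClass:
    slots:
      - toy_other_slot
"""
# The same slots as a YAML loader would return them (hand-written here).
toy_slots = {
    "bulk_elect_conductivity": {"comments": ["Compatible with EMSL metadata"]},
    "toy_emsl_slot": {"range": "string"},
    "toy_other_slot": {"comments": ["Compatible with EMSL metadata"]},
}

print(heading_vs_comment(toy_yaml, toy_slots, "EMSL", "Compatible with EMSL metadata"))
# -> (['toy_other_slot'], ['toy_emsl_slot'])
```

If both lists come back empty, the heading and the LinkML comment agree; if not, at least one of them is not being maintained exhaustively. The text scan is fragile by design, which is one more argument for relying on machine-readable mechanisms such as in_subset rather than # headings.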


Using organizational features like subsets, # comments as headings, and LinkML comments inconsistently undermines their organizational value. People adding new organizational features are responsible for ensuring that they are used consistently for the foreseeable future. The contributors of the two issues above should open PRs to address these problems and should communicate with the contributors of other terms that satisfy the "data_portal_subset" and "Compatible with EMSL metadata" criteria but have not been annotated accordingly yet.

There may be some historical, inconsistent, or unhelpful uses of organizational features in the schema. I can try to weed those out.

kheal commented 1 month ago

I'm moving this to a discussion since there are no specific actions to take.