Closed turbomam closed 1 year ago
For this issue->branch->PR: leave the class that has modeled studies and will now also model data collection consortia named as Study. That will retain all of the external integration with MongoDB, the API, and the DataPortal.
Add a required slot that takes enumerated values that distinguish between true hypothesis-driven studies and research consortia motivated by data/sample collection.
The addition of that new required slot will require a database migration. Kitware/nmdc-server will have to start interrogating the new slot in order to determine whether to draw a study page or a consortium page, but otherwise the impact to the NMDC ecosystem should be minimal.
Additional migration-free possibilities:
Make initiative_type optional. Our ecosystem would assume that a Study instance was a true hypothesis-driven study unless initiative_type: sample_or_data_collection_consortium was asserted.
Instead of an initiative_type slot that takes enumerated values in its range, use an is_consortium boolean slot. If that wasn't required, then no migration would be required. But somebody would have to assert is_consortium on the relevant records already in MongoDB, presumably with a change-sheet.
uses slot initiative_type and enum InitiativeTypeEnum to drive the differentiation.
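The migration-free defaulting discussed above could look roughly like the sketch below, assuming Study records arrive as plain dicts. The helper name page_kind and its return labels are hypothetical; only initiative_type, is_consortium, and sample_or_data_collection_consortium come from this thread.

```python
CONSORTIUM_VALUE = "sample_or_data_collection_consortium"


def page_kind(study: dict) -> str:
    """Return "consortium" or "study" for a Study record (hypothetical helper).

    A record that asserts neither initiative_type nor is_consortium is
    treated as a hypothesis-driven study, so existing records need no change.
    """
    if study.get("initiative_type") == CONSORTIUM_VALUE:
        return "consortium"
    if study.get("is_consortium") is True:
        return "consortium"
    return "study"
```

Either optional slot gives the same behavior for records already in MongoDB: nothing asserted means "study".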
Let's reassess those names for the right level of specificity and also double check the preferred capitalization for enums and PermissibleValues. I always forget and have set a bad example of inconsistency.
See https://linkml.io/linkml/schemas/linter.html#standard-naming
CamelCase for classes
snake_case for slots
CamelCase for enums (already OK)
snake_case (default) or UPPER_SNAKE for permissible_values
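As a quick way to sanity-check proposed names against the conventions listed above, here is a rough sketch. It is not the LinkML linter's implementation; the regular expressions are simplified approximations and the function name follows_convention is made up for illustration.

```python
import re

# Simplified approximations of the naming styles; assumptions, not linter code.
CAMEL_CASE = re.compile(r"^[A-Z][a-zA-Z0-9]*$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")


def follows_convention(name: str, kind: str) -> bool:
    """Check a schema element name against the convention for its kind."""
    if kind in ("class", "enum"):
        return bool(CAMEL_CASE.match(name))
    if kind == "slot":
        return bool(SNAKE_CASE.match(name))
    if kind == "permissible_value":
        return bool(SNAKE_CASE.match(name) or UPPER_SNAKE.match(name))
    raise ValueError(f"unknown element kind: {kind}")
```

For example, follows_convention("InitiativeTypeEnum", "enum") and follows_convention("initiative_type", "slot") both pass under these approximations.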
We can also check the annotations and mappings for those new elements.
Regarding naming, do we want to paint ourselves into a corner of never being able to use initiative_type and InitiativeTypeEnum in any other context?
We could name the enum ResearchInitiativeTypeEnum and then allow different enums if initiative_type needs to be used in other classes in the future.
Possible disadvantages? My proposed enum name is longer. Anything else?
We could give some thought to what the real parent class of Study is, whether that class is currently modeled in the nmdc-schema or not yet.
Is it a process? Or a group of people? Something else?
What slots are allowed on EnumDefinition? We don't have to use them all!
I'd like to start micro-crediting, so that people who have contributed will get credit even if they don't make the PR.
I think this will require making an ORCID prefix. https://orcid.org/ ?
@brynnz22, what's your ORCID? I added two in this branch that look like they may be yours.
caveat
We should be in the habit of updating not just the valid examples, but also the invalid examples. Each invalid example should illustrate one single deviation from the requirements, and the file name should state that deviation.
As new constraints are put on classes, those shouldn't become additional deviations, but rather one new, well-named invalid data file should be created to illustrate the new constraint.
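The "one deviation per invalid example" rule could be checked with something like the sketch below. Everything here is hypothetical: the deviations helper, the deviation labels, and the assumption that "id" is a required slot are for illustration only and do not reflect how the nmdc-schema test suite actually works.

```python
def deviations(record: dict, required: set, enums: dict) -> list:
    """List the ways a record deviates from a tiny, made-up rule set.

    required: slot names that must be present.
    enums: slot name -> set of allowed values.
    """
    problems = []
    for slot in sorted(required):
        if slot not in record:
            problems.append(f"missing_{slot}")
    for slot, allowed in sorted(enums.items()):
        if slot in record and record[slot] not in allowed:
            problems.append(f"bad_{slot}")
    return problems


def illustrates_one_deviation(record: dict, required: set, enums: dict) -> bool:
    """True when an invalid example deviates in exactly one way."""
    return len(deviations(record, required, enums)) == 1
```

An invalid example whose only problem is a bad initiative_type value would pass this check, and the single label returned by deviations is the kind of thing the file name should state.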
A limitation of this "classifying by enum slot" approach is that it implies that there will only be one axis by which Study-like things can be subClassed. If a Study and a CollectionConsortium are different things, then they really deserve their own classes, which can accommodate differences we discover in the future.
We have violated this principle in the past to some degree, especially with the Biosample class. One could say that was an acceptable short-term solution because creating Biosample subClasses for samples from each environment would have required making many more MongoDB collections.
If we chose to make a CollectionConsortium class as a sibling of Study, that only involves making one new MongoDB collection. I actually don't like that much either, and that is why I continue to advocate for collection-less/table-less storage like an RDF triplestore.
We also theoretically have the option of storing both studies and consortia in the one collection, as long as they are given a formal type slot, whose value would be the CURIE of the most specific class that they instantiate. That has been intended all along by giving the type slot the https://linkml.io/linkml-model/latest/docs/designates_type/ decorator, and slot_uri: rdf:type, but I haven't gotten a test case to work yet.
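To make the single-collection idea concrete, here is a plain-Python sketch of dispatching on a type value. It does not use LinkML or its designates_type handling; the CURIEs nmdc:Study and nmdc:CollectionConsortium, the CLASS_BY_CURIE table, and the instantiate helper are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Study:
    id: str


@dataclass
class CollectionConsortium:
    id: str


# Hypothetical mapping from type CURIE to the most specific class.
CLASS_BY_CURIE = {
    "nmdc:Study": Study,
    "nmdc:CollectionConsortium": CollectionConsortium,
}


def instantiate(doc: dict):
    """Build the class named by the document's type value.

    Raises KeyError if the type CURIE is not in the mapping.
    """
    cls = CLASS_BY_CURIE[doc["type"]]
    return cls(id=doc["id"])
```

With this shape, documents for both classes can share one collection, and the reader decides which class to build from each document's type value rather than from the collection name.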
In summary, I would like to take the permissible value annotations from #1097, migrate them to class annotations in #1104, and close this issue and its PR.
If we do decide that an enumeration slot is the best way to differentiate between studies and consortia, I would like to refrain from merging that until we do an audit of other '...type' slots in the schema. It's a mess and I don't want to add to it.
see also
cc @brynnz22