tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

Properties for dwc:MaterialEntity #33

Closed: cboelling closed this issue 1 year ago

cboelling commented 1 year ago

In conjunction with finalizing the proposal for adding dwc:MaterialEntity as a new top-level term to Darwin Core (see #32), the question arises of which properties should complement the new class term.

Even without properties (for which there is precedent, e.g., LivingSpecimen, PreservedSpecimen, FossilSpecimen), the new term will be available as a value for dwc:basisOfRecord, enabling the sharing of information about material entities that do not fit one of the established, more specialized classes.

Initial suggestions for properties of dwc:MaterialEntity include:

These issues are related:

#14

#20

#24

Jegelewicz commented 1 year ago

There are existing properties in occurrence that probably should be properties of this:

catalogNumber preparations disposition otherCatalogNumbers

Jegelewicz commented 1 year ago

I cannot endorse materialEntityType because it creates a rabbit hole of mixing identifications (taxonomy), preparations (parts), and even storage media. We need to create specific properties that don't overlap with other properties.

cboelling commented 1 year ago

I cannot endorse materialEntityType because it creates a rabbit hole of mixing identifications (taxonomy), preparations (parts), and even storage media

I would like to understand these concerns in more detail. @Jegelewicz, would it be possible to spell out one or two examples?

baskaufs commented 1 year ago

It does not seem to me like we are ready for materialEntityType at this point. When we left off at our regular meetings last (northern hemisphere) fall, we were still discussing/testing the three terms @smrgeoinfo had suggested for describing instances of what has now become materialEntity and still had a lot of work to do before proposing anything concrete.

baskaufs commented 1 year ago

There are existing properties in occurrence that probably should be properties of this:

catalogNumber preparations disposition otherCatalogNumbers

Yes, better here than under Occurrence.

tucotuco commented 1 year ago

I am happy with the part of the research that is fully mature for inclusion in the coming public review, i.e., the proposal for dwc:MaterialEntity alone, once that proposal has been fully vetted. It is currently in a series of commits in this pull request. I suspect that it is very difficult for people to see the entirety of the current proposal, so I recommend that we merge it and use the resulting document version as the target of further comments.

Jegelewicz commented 1 year ago

spell out one or two examples in more detail?

I capture a live rabbit and hold it for observation - I label it as "LivingSpecimen". When it dies, do I need to re-catalog it with a new type "PreservedSpecimen"? Just change the type? How does that affect information generated when it was living?

I collect some bones from a cave and record them as "PreservedSpecimen", because none of the other "types" appear reasonable, even though I have applied NO preservative techniques. Later, some dating results indicate that the bones are 20,000 years old. Do I now call this a "FossilSpecimen"?

Alive or dead is something that a given MaterialEntity could exhibit over time, not a static "type"? Also the words "sample" and "specimen" carry a lot of baggage. That is why we are asking for MaterialEntity. I'd like to see us focus on properties of MaterialEntity that can be additive, not ones that limit. Once I say something is a LivingSpecimen it seems like there is no way out of that box. Maybe I am wrong?

RogerBurkhalter commented 1 year ago

@Jegelewicz, I would say that the MaterialEntity would remain a "LivingSpecimen" as originally acquired, and after it dies the "basisOfRecord" becomes a (derived) "PreservedSpecimen", so that the data gathered while it was alive persist with the entity. The same would be true for other derivatives (molecular, chemical, and osteological preparations, thin sections, etc.). That ties things back together with the original entity. Would a living rabbit be de-accessioned from a collection after it dies, or would the remains transition from a cage to a box/jar etc. in a cabinet? I see MaterialEntity as a very "high"-level term that provides (some) information on the entity as acquired, but maybe I am wrong?

Jegelewicz commented 1 year ago

as acquired

That seems like a hidden assumption? It doesn't say that in the definition, but if that is what we mean, we should say it!

This is where LivingSpecimen and Organism appear to be the same thing to me. Can anyone articulate the difference for me?

MaterialEntity would remain as a "LivingSpecimen" as originally acquired and after it dies the "basisOfRecord" becomes a (derived) "PreservedSpecimen" to enable the data gathered while alive persists with the entity. The same would be true for other derivatives (molecular, chemical, osteological preparations, thin sections, etc).

Isn't a LivingSpecimen, just an Organism in captivity?

And FWIW - I really wish we would stop considering BasisOfRecord. I know that GBIF has given it some special meaning somehow, but it really gums up the works.

tucotuco commented 1 year ago

Are we going off track? I think we're getting into class hierarchies and limitations of Simple Darwin Core for tracking Entities over time - something that might be solved with ResourceRelationships. Before going further, is it reasonable to get a consensus on not attempting any property term proposals? It seems like we were heading in that direction.
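To make the ResourceRelationships idea concrete, here is a minimal, hypothetical sketch. resourceID, relatedResourceID and relationshipOfResource are existing dwc:ResourceRelationship terms; the identifiers, the basisOfRecord assignments and the "derived from" relationship string are invented for illustration. Each record keeps the basisOfRecord it was given, and derivatives point back at their source instead of retyping it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRelationship:
    """One link between two resources, using dwc:ResourceRelationship term names."""
    resourceID: str
    relatedResourceID: str
    relationshipOfResource: str


# Hypothetical records: each keeps the basisOfRecord it was given when catalogued.
BASIS_OF_RECORD = {
    "rabbit-1": "LivingSpecimen",
    "rabbit-1-skeleton": "PreservedSpecimen",
    "rabbit-1-tissue": "MaterialSample",
    "rabbit-1-dna": "MaterialSample",
}

# Hypothetical relationship rows; "derived from" is an illustrative value only.
RELATIONSHIPS = [
    ResourceRelationship("rabbit-1-skeleton", "rabbit-1", "derived from"),
    ResourceRelationship("rabbit-1-tissue", "rabbit-1", "derived from"),
    ResourceRelationship("rabbit-1-dna", "rabbit-1-tissue", "derived from"),
]


def derivation_chain(relationships, start_id):
    """Walk 'derived from' links from start_id back to the original entity.

    Assumes each resource is derived from at most one other and there are no cycles.
    """
    parent = {
        r.resourceID: r.relatedResourceID
        for r in relationships
        if r.relationshipOfResource == "derived from"
    }
    chain = [start_id]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


for rid in derivation_chain(RELATIONSHIPS, "rabbit-1-dna"):
    print(rid, BASIS_OF_RECORD[rid])
```

In this sketch the living rabbit's record is never retyped; its death and later preparations appear as new records linked back to it.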

tucotuco commented 1 year ago

And FWIW - I really wish we would stop considering BasisOfRecord. I know that GBIF has given it some special meaning somehow, but it really gums up the works.

I think GBIF should not be blamed for anything here. The community seeking guidance for data sharing used TDWG as a community consensus-building mechanism to come up with a solution for distinguishing Occurrences (also a simplification the community agreed on) based on different kinds of evidence, categorized by basisOfRecord. That was agreed to be a reasonable solution in 2009. GBIF did something practical with it that has had an amazing impact since then. Though we recognize an urgent need to grow, anything less than that seems disingenuous to me.

If dwc:MaterialEntity is not discussed in terms of being a possible value of dwc:basisOfRecord, how would it actually be used in the current landscape of data sharing with Darwin Core (the scope we are working within as a Darwin Core Task Group)? If it wouldn't actually be used in any other way, is there any reason to propose the term at this point?

Jegelewicz commented 1 year ago

being a possible value of dwc:basisOfRecord, how would it actually be used in the current landscape of data sharing with Darwin Core (the scope we are working within as a Darwin Core Task Group)? If it wouldn't actually be used in any other way, is there any reason to propose the term at this point?

Maybe we need input from GBIF because right now, if I used MaterialEntity as BasisOfRecord on a single row, my ENTIRE dataset would be eliminated from publication in GBIF as they only accept the current Darwin Core classes.

baskaufs commented 1 year ago

For better or worse, we know what dwc:basisOfRecord is. It is a controlled value string-based system of indicating type. This is spelled out in Section 2.3.1.4 of the DwC RDF Guide. The choice of local name ("basisOfRecord") is a bit confusing (dwc:type might have made more sense), but that's what it is. As such, "MaterialEntity" should become a valid value for it once the term is adopted. The fact that we can't use it now is just a reflection of the fact that the term hasn't been adopted yet.
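Read as a controlled-value string system, basisOfRecord amounts to a membership check against a list of class local names. The sketch below is a hypothetical checker, not GBIF's actual validation logic, and its value list is assembled from the class names mentioned in this thread rather than from the normative vocabulary. It only shows why a row using "MaterialEntity" fails today and passes once the vocabulary includes the new value.

```python
# Controlled values taken from the Darwin Core class names discussed in this thread.
# This list is an assumption for illustration, not the normative vocabulary.
CURRENT_VALUES = frozenset({
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "MaterialSample",
    "HumanObservation", "MachineObservation", "MaterialCitation", "Occurrence",
})
# "MaterialEntity" becomes a valid string only once the class term is adopted.
PROPOSED_VALUES = CURRENT_VALUES | {"MaterialEntity"}


def invalid_basis_of_record(rows, allowed):
    """Return (row index, value) for each row whose basisOfRecord is not an allowed value."""
    return [
        (i, row.get("basisOfRecord"))
        for i, row in enumerate(rows)
        if row.get("basisOfRecord") not in allowed
    ]


rows = [{"basisOfRecord": "PreservedSpecimen"}, {"basisOfRecord": "MaterialEntity"}]
print(invalid_basis_of_record(rows, CURRENT_VALUES))   # [(1, 'MaterialEntity')]
print(invalid_basis_of_record(rows, PROPOSED_VALUES))  # []
```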

timrobertson100 commented 1 year ago

Maybe we need input from GBIF because right now, if I used MaterialEntity as BasisOfRecord on a single row, my ENTIRE dataset would be eliminated from publication in GBIF as they only accept the current Darwin Core classes.

Both the GBIF Integrated Publishing Tool (IPT) and GBIF.org will be updated to accept the dwc:basisOfRecord value if it is added to Darwin Core.

If/when new properties are added (i.e. this issue), we should consider whether it is worthwhile to introduce a new core type of MaterialEntity to the Darwin Core Archive schemas. It will not solve everything, but within its limitations, and with a simple change in existing tools, it will allow us to talk about instances of Material, their properties, IDs, and their relationships rather than instances of an Occurrence, while still using existing publishing tools. Today GBIF.org would only be able to treat them as occurrences, but there is work underway towards a material catalogue. An alternative could be to use only a richer data model than the star schema inherent in the Darwin Core Archive format, which is a larger task.
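A MaterialEntity core would keep the Darwin Core Archive star schema: one core table, with extension rows each pointing at a single core record. The sketch below is hypothetical throughout. materialEntityID is only a proposed term, the catalog number is invented, and the "coreid" column is a simplification of how extension files reference the core in a real archive. It attaches a measurement extension (using the existing measurementType, measurementValue and measurementUnit terms) to a material entity instead of to an occurrence.

```python
def build_star(core_rows, extension_rows, core_id="materialEntityID"):
    """Attach extension rows to core rows, star-schema style.

    Each extension row names its core record in a "coreid" column, loosely following
    how Darwin Core Archive extensions point at the core. "materialEntityID" is a
    proposed term used here hypothetically.
    """
    index = {row[core_id]: {**row, "extensions": []} for row in core_rows}
    for ext in extension_rows:
        parent = index.get(ext["coreid"])
        if parent is None:
            raise KeyError(f"extension row refers to unknown core id {ext['coreid']!r}")
        parent["extensions"].append({k: v for k, v in ext.items() if k != "coreid"})
    return index


core = [{"materialEntityID": "me-1", "catalogNumber": "ABC-0001", "preparations": "skin; skull"}]
measurements = [
    {"coreid": "me-1", "measurementType": "mass", "measurementValue": "120", "measurementUnit": "g"},
]
star = build_star(core, measurements)
print(star["me-1"]["extensions"])
```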

baskaufs commented 1 year ago

This is where LivingSpecimen and Organism appear to be the same thing to me. Can anyone articulate the difference for me?

I don't want this issue to devolve into an unrelated discussion, and the nature of the dwc:Organism class has been the subject of long discussions in the past. To be brief, a LivingSpecimen is an organism (lower case "o") that has been accessioned into a collection. As such, we'd expect it to have properties such as a catalog number, disposition, or any other kinds of metadata that we keep about collection items.

dwc:Organism was minted primarily as a mechanism for tracking the subject of multiple resampling over time or as a node to which multiple taxonomic determinations can be applied. It includes anything that you might want to do those things to, which includes organisms, but also clones, packs, etc. There is no expectation that a dwc:Organism is ever collected or that you would have any idea where it was located other than at the time it was sampled.

So it is possible that someone might want to assert that a particular organism is an instance of dwc:LivingSpecimen and also an instance of dwc:Organism if that's convenient to their purpose. But they aren't synonymous.
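One way to see why the two classes are not synonymous is to type resources with plain (subject, predicate, object) triples. In this minimal sketch the "ex:" resources are placeholders and no RDF library is assumed. One resource carries both types; a pack is an Organism but was never accessioned, so the two sets of instances differ.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical example resources ("ex:" prefixes are placeholders).
graph = {
    ("ex:rabbit-1", RDF_TYPE, DWC + "LivingSpecimen"),  # accessioned into a collection
    ("ex:rabbit-1", RDF_TYPE, DWC + "Organism"),        # also tracked as an Organism
    ("ex:pack-7", RDF_TYPE, DWC + "Organism"),          # a wild pack, never collected
}


def instances_of(triples, cls):
    """Return the subjects explicitly typed as cls."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


living = instances_of(graph, DWC + "LivingSpecimen")
organisms = instances_of(graph, DWC + "Organism")
print(sorted(living & organisms))  # ['ex:rabbit-1']
print(sorted(organisms - living))  # ['ex:pack-7']
```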

dagendresen commented 1 year ago

For better or worse, we know what dwc:basisOfRecord is. It is a controlled value string-based system of indicating type.

I think that we should not think of basisOfRecord as designating the type of the thing (the subject) described by the record (sensu simple Darwin Core record). When adding basisOfRecord = PreservedSpecimen to an Occurrence core data record this cannot be inferred to mean that the Occurrence itself becomes a MaterialEntity thing! Rather, I think we understand basisOfRecord to describe the type of thing that is the evidence for the declaration of an Occurrence. Thus, I think that dwc:basisOfRecord is not the same as a sort of "dwc:type".

I agree with @Jegelewicz to be very careful with the basisOfRecord term -- and, specifically, if we should suggest MaterialEntity as a possible value!

dagendresen commented 1 year ago

Thus, if basisOfRecord remains in DwC as a property used to describe an Occurrence (sensu type of evidence for the Occurrence) -- then basisOfRecord should not be proposed as a property that can be used to meaningfully describe a MaterialEntity.

tucotuco commented 1 year ago

@dagendresen If not recommended as an additional value in the vocabulary of basisOfRecord, what practical value would there be to adding the class to Darwin Core? What would be the justification?

dagendresen commented 1 year ago

@tucotuco In my understanding, the new MaterialEntity is introducing a completely new type of thing in Darwin Core, and I fail to see how the value of adding MaterialEntity is in any way tied to a role as a value for basisOfRecord. (E.g. in a GBIF/DwC-A implementation, MaterialEntity would be a completely new core type, and hopefully not at all to be mixed in with Occurrence in a so-called "simple Darwin Core record"...?).

If we suggest MaterialEntity as a proposed value for basisOfRecord (I am not necessarily arguing not to) then I think we need to be very careful not to assume the meaning that basisOfRecord is the same as a declaration of the type for the subject. The subject (rdfs:domain) of basisOfRecord could anyway only be an Occurrence...? And thus that basisOfRecord could not be used as a property of the new MaterialEntity -- but should remain exclusively as a property of an Occurrence.

tucotuco commented 1 year ago

@tucotuco In my understanding, the new MaterialEntity is introducing a completely new type of thing in Darwin Core, and I fail to see how the value of adding MaterialEntity is in any way tied to a role as a value for basisOfRecord. (E.g. in a GBIF/DwC-A implementation, MaterialEntity would be a completely new core type, and hopefully not at all to be mixed in with Occurrence in a so-called "simple Darwin Core record"...?).

If not usable as a value for basisOfRecord, what would anyone actually be able to do with this new class? If one use of it would be to create a new core type for Darwin Core Archives, the demand for that and a proposal for what that would look like ought to be demonstrated. I think that means defining which terms ought to be re-organized under MaterialEntity (a non-normative declaration) and proposing the minimal other terms that would make it viable in a new core (materialEntityID). My only real concern (because I really like the term and have been using it in the GBIF Unified Modeling for a year and a half) is that it have a clearly defined use. It should solve something in practice. In fact, it ought to be demonstrated to solve that something in practice.

If we suggest MaterialEntity as a proposed value for basisOfRecord (I am not necessarily arguing not to) then I think we need to be very careful not to assume the meaning that basisOfRecord is the same as a declaration of the type for the subject. The subject (rdfs:domain) of basisOfRecord could anyway only be an Occurrence...? And thus that basisOfRecord could not be used as a property of the new MaterialEntity -- but should remain exclusively as a property of an Occurrence.

The term basisOfRecord has no rdfs:domain. It doesn't even have a TDWG recommendation about which class it should be organized in - it is a "Record-level" term that might apply to anything. It can't "remain exclusively as a property of an Occurrence", because it isn't a property of Occurrence. In an implementation, such as a MaterialEntity Core schema, the basisOfRecord term could easily be excluded, just as it is for the Event Core and Taxon Core.

dagendresen commented 1 year ago

I think that means defining which terms ought to be re-organized under MaterialEntity

This thread is named "Properties for dwc:MaterialEntity"

It should solve something in practice

The most important thing a new term dwc:MaterialEntity would solve in practice is to enable Darwin Core to describe specimens, something Darwin Core was (arguably) not able to do before such a term is added.

tucotuco commented 1 year ago

It should solve something in practice

The most important thing a new term dwc:MaterialEntity would solve in practice is to enable Darwin Core to describe specimens, something Darwin Core was (arguably) not able to do before such a term is added.

Yes. I am interested in defining how it enables Darwin Core to describe specimens. But we seem to have only one part of the solution close to ready - the proposal for dwc:MaterialEntity. Without a property, at minimum a materialEntityID, I don't think we "enable Darwin Core to describe specimens".

dagendresen commented 1 year ago

What would happen with the current dwc:materialSampleID? When creating a new term dwc:materialEntityID there will be no need for a dwc:materialSampleID term (and materialSampleID could simply be replaced by a materialEntityID)?

Would a new dwc:MaterialEntity sort of replace the current dwc:MaterialSample (and the current dwc:MaterialSample be moved to some sort of materialEntityType vocabulary together with PreservedSpecimen, FossilSpecimen, and LivingSpecimen)? (I do hope we do not believe that basisOfRecord would be used to declare the type for MaterialEntity instances?).

baskaufs commented 1 year ago

I think that part of the problem here is that "occurrence" has always been some sort of kludge that most people don't understand but that everybody has had to use. The problem with occurrence is that it has muddled properties about the assertion of "the presence of an organism at a time and place" with the evidence that supports that assertion. That's not too bad if there is only one piece of evidence (e.g. a record of a museum specimen) -- you can have a single row in a table and include stuff about the when and where along with metadata about the evidence (catalog number, disposition, etc.) When one says the "basisOfRecord" of an occurrence is "PreservedSpecimen" it could be thought of that the basis (evidence) of the occurrence record was a preserved specimen.

This all falls apart when we want to talk about a record of occurrence that is supported by multiple forms of evidence (a photo, a lab notebook, a physical specimen, a DNA sample). There isn't just one "basis" (form of evidence) for the record: there are many. So I still believe that it is true to say that "dwc:basisOfRecord" is a controlled value system for declaring type. The problem comes when we are fuzzy about what is the subject of a record. If we have a row that's about both the occurrence of an organism and the evidence that supports that assertion, what's the subject? I think the solution to this problem isn't with dwc:MaterialEntity, it's with the fuzziness about Occurrence. That's why I'm happy to see the GBIF grand unified model just put a stake through the heart of Occurrence. Once that happens, I think the lack of clarity about basisOfRecord will get better. We would be able to focus on the actual objects that could serve as evidence: MaterialEntities and digital entities. (Or not serve as evidence for anything -- just be a record of the object itself.) These broad categories would be the basic types of objects and we could get more granular by describing their properties with some terms we haven't yet invented. Whether we use "dwc:basisOfRecord" or some other term for expressing the type of these objects (rdf:type, dc:type, dcterms:type, etc.) doesn't matter to me, that's an implementation detail.
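The loss described above can be shown by flattening an occurrence with several kinds of evidence into a single Simple Darwin Core row. This is a minimal sketch: the preference order used to pick the "best single value" is invented for illustration, and the identifiers are hypothetical. Whatever order is chosen, the other evidence types have nowhere to go in the row.

```python
# Hypothetical preference order for collapsing several kinds of evidence into one value.
PREFERENCE = [
    "PreservedSpecimen", "LivingSpecimen", "FossilSpecimen", "MaterialSample",
    "MachineObservation", "HumanObservation",
]


def flatten_to_simple_dwc(occurrence):
    """Force one basisOfRecord per row; return the row and the evidence types it hides."""
    evidence_types = {e["type"] for e in occurrence["evidence"]}
    best = next((t for t in PREFERENCE if t in evidence_types), "Occurrence")
    row = {"occurrenceID": occurrence["occurrenceID"], "basisOfRecord": best}
    return row, sorted(evidence_types - {best})


occ = {
    "occurrenceID": "occ-1",
    "evidence": [
        {"type": "HumanObservation", "ref": "field notebook p. 12"},
        {"type": "PreservedSpecimen", "ref": "ABC-0001"},
        {"type": "MaterialSample", "ref": "tissue-a"},
    ],
}
print(flatten_to_simple_dwc(occ))
```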

I hope we can keep our eyes on the prize here and ratify this term as a first step towards building a way to describe all kinds of physical things. @Jegelewicz suggested 4 properties that could logically be organized in this new class. Let's start with that and build more later.

cboelling commented 1 year ago

I am happy with the part of the research that is fully mature for inclusion in the coming public review, i.e., the proposal for dwc:MaterialEntity alone, once that proposal has been fully vetted. It is currently in a series of commits in this pull request. I suspect that it is very difficult for people to see the entirety of the current proposal, so I recommend that we merge it and use the resulting document version as the target of further comments.

The current draft for the term request for dwc:MaterialEntity including all incremental contributions ("commits") is always visible for inspection at this URL: https://github.com/tdwg/material-sample/blob/ntr-material-entity/primary_deliverable/materialentity.md

Thank you for the contributions so far - these go nicely together in my view. Please add any further comments regarding the proposal for the New Term Request for dwc:MaterialEntity by tomorrow 1pm UTC to either the dedicated discussion thread for the draft ("the pull request" in git parlance) or the original discussion thread in the task group's repo. Unless something controversial comes up, I will then pull the draft into the canonical version of this Task Group's repo. I also understand that the consensus view is to then use the draft to formulate an NTR in DwC's main repo, which I offer to do right after merging. If you prefer a different course of action, please come forward and let us know, also by tomorrow 1pm UTC.

Jegelewicz commented 1 year ago

Everything @baskaufs says here!

There are already classes of things in Darwin Core that have no properties - what is their purpose (except to be controlled vocabulary for dwc:BasisOfRecord)?

dwc:FossilSpecimen, dwc:LivingSpecimen, dwc:PreservedSpecimen, dwc:HumanObservation, dwc:MachineObservation, dwc:MaterialCitation

Why was dwc:MaterialSample given a property but the others not? Does that make it more or less useful as controlled vocabulary for dwc:BasisOfRecord? Why does a record of dwc:Occurrence not simply include a dwc:BasisOfRecord = Occurrence? In my mind, any record with dwc:BasisOfRecord = "PreservedSpecimen" is NOT an Occurrence...

By imposing "class" or "property" status on terms, we have immediately given structure to Darwin Core and it is no longer a bag of terms in my view. I find it distressing that this seems to bother no one.

baskaufs commented 1 year ago

By imposing "class" or "property" status on terms, we have immediately given structure to Darwin Core and it is no longer a bag of terms in my view.

I don't entirely understand this. "class" and "property" (and skos:Concept not yet mentioned in this thread) are descriptions of what type of thing the term is. They don't really impose structure in the sense that declaring subclass relationships (creating a hierarchy), or declaring ranges and domains (saying what kinds of things a property is talking about or referring to) do. Those would be what I would consider giving structure to Darwin Core (ontology building in technical terms).

Jegelewicz commented 1 year ago

If I can only apply certain properties to certain classes, there is a structure. Perhaps not an ontology, but something else that, given my limited knowledge of data structures, I can see but not name.

baskaufs commented 1 year ago

I think I may just have been misunderstanding you. Saying that a term is a class doesn't really impose structure. Saying that a property term should or must be used with instances of certain classes as their subjects (what I think you just said) does impose structure.

In Darwin Core, we don't currently try to strictly restrict properties to having a certain subject class. We do have an "organized in class" metadata property, which is a kind of gentle suggestion about appropriate subject classes. That designation determines what class the property is listed under in the Quick Reference Guide.
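Because "organized in class" only drives how terms are grouped for display, it can be modelled as a plain lookup that nothing validates against. In this minimal sketch the mapping covers only terms mentioned in this thread, and moving the four terms @Jegelewicz suggested to a MaterialEntity grouping is a hypothetical reorganization: it changes the listing, not which records may use the terms.

```python
from collections import defaultdict

# "Organized in class" metadata for a few terms named in this thread.
# None marks a record-level term such as basisOfRecord.
ORGANIZED_IN = {
    "catalogNumber": "Occurrence",
    "otherCatalogNumbers": "Occurrence",
    "preparations": "Occurrence",
    "disposition": "Occurrence",
    "materialSampleID": "MaterialSample",
    "basisOfRecord": None,
}


def quick_reference_groups(organized_in):
    """Group terms by class for display, as a Quick Reference Guide listing would."""
    groups = defaultdict(list)
    for term, cls in organized_in.items():
        groups[cls or "Record-level"].append(term)
    return {cls: sorted(terms) for cls, terms in groups.items()}


def reorganize(organized_in, terms, new_class):
    """Return a copy with the given terms listed under new_class (display only)."""
    return {t: (new_class if t in terms else cls) for t, cls in organized_in.items()}


moved = reorganize(
    ORGANIZED_IN,
    {"catalogNumber", "otherCatalogNumbers", "preparations", "disposition"},
    "MaterialEntity",
)
print(quick_reference_groups(moved))
```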

Jegelewicz commented 1 year ago

a kind of gentle suggestion about appropriate subject classes

Unfortunately, I think it may be seen as "thou shalt" by some? Perhaps that is only me and there is nothing to consider, but I feel as if I would be some kind of heretic (or perhaps seen as not knowing what I am doing) if I used dwc:disposition to describe any other thing but dwc:Occurrence even though it feels totally inappropriate there.

tucotuco commented 1 year ago

There are already classes of things in Darwin Core that have no properties - what is their purpose (except to be controlled vocabulary for dwc:BasisOfRecord)?

dwc:FossilSpecimen, dwc:LivingSpecimen, dwc:PreservedSpecimen, dwc:HumanObservation, dwc:MachineObservation, dwc:MaterialCitation

They currently have no other practical purpose than being controlled value terms for basisOfRecord.

Why was dwc:MaterialSample given a property but the others not?

MaterialSample was proposed with the corresponding identifier materialSampleID so that instances of them could be distinguished from each other, allowing for such things as a MaterialSample Core for Darwin Core Archives and for MaterialSample extensions for Occurrence Core-based Darwin Core Archives. Instances of MaterialSamples were considered important enough to distinguish from Occurrences and describe. The original impetus for MaterialSample and the need for it to have identifiers distinct from those of Occurrences was to track tissues that resulted in DNA extractions. A single Occurrence (an Organism at a place and time, one occurrenceID) could result in many MaterialSamples (many materialSampleIDs), and distinguishing many MaterialSamples from each other was of enough interest to the community to justify the proposals and follow them through to ratification.
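The one-to-many relationship described above (one occurrenceID, many materialSampleIDs) can be sketched as a grouping that also insists each materialSampleID is unique, since the identifier is only useful if it tells samples apart. The identifiers below are hypothetical.

```python
from collections import defaultdict


def group_samples_by_occurrence(samples):
    """Map each occurrenceID to the materialSampleIDs derived from it.

    Raises ValueError on a repeated materialSampleID, since the identifier only helps
    if it distinguishes every sample.
    """
    seen = set()
    groups = defaultdict(list)
    for sample in samples:
        sid = sample["materialSampleID"]
        if sid in seen:
            raise ValueError(f"duplicate materialSampleID {sid!r}")
        seen.add(sid)
        groups[sample["occurrenceID"]].append(sid)
    return dict(groups)


samples = [
    {"materialSampleID": "tissue-a", "occurrenceID": "occ-1"},
    {"materialSampleID": "tissue-b", "occurrenceID": "occ-1"},
    {"materialSampleID": "dna-a1", "occurrenceID": "occ-1"},
]
print(group_samples_by_occurrence(samples))  # {'occ-1': ['tissue-a', 'tissue-b', 'dna-a1']}
```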

Does that make it more or less useful as controlled vocabulary for dwc:BasisOfRecord?

No. It has no effect on its utility as a term among those recommended for basisOfRecord.

Why does a record of dwc:Occurrence not simply include a dwc:BasisOfRecord = Occurrence?

It could, and such records exist where it isn't possible or desired to make a more refined distinction. For basisOfRecord, "Recommended best practice is to use the standard label of one of the Darwin Core classes." But if you do that, you have no way with Simple Darwin Core to separate records about Occurrences based on PreservedSpecimens from Occurrences based on HumanObservations, for example, and that distinction is immensely useful for people looking for vouchers as opposed to records without material evidence. It doesn't mean an Occurrence IS A PreservedSpecimen. It means that if you are forced to commit to the best single value of what the Occurrence is based on, the one you provide is that best value.

In my mind, any record with dwc:BasisOfRecord = "PreservedSpecimen" is NOT an Occurrence...

I see it as, a PreservedSpecimen is not an Occurrence. In other words, basisOfRecord does not provide type information. However, a record that has basisOfRecord = PreservedSpecimen should provide me with information about an Occurrence that has a PreservedSpecimen as evidence.

By imposing "class" or "property" status on terms, we have immediately given structure to Darwin Core and it is no longer a bag of terms in my view. I find it distressing that this seems to bother no one.

I don't think that is strictly correct. I think that defining classes and properties ALLOWS terms in Darwin Core to be used to define structures that make sense for a particular scenario. Darwin Core IS a bag of class and property terms. They provide vetted definitions meant to help people organize information in structures that serve their purposes. It does not define or restrict those structures. It provides guidance on what some of those structures might look like by suggesting which class a property would likely describe (using the organized_in attribute, which is not normative).

We make class terms when they represent a concept that can have properties that distinguish them from other concepts. A class could be a table in a database, but it could not be a field in a database. A property could be a field in a database, but it could not be a table.

I think what might be causing the most confusion is that a class can also be the VALUE of a property. The values of the basisOfRecord property are recommended to be from among the Darwin Core classes. The other properties in Darwin Core that have controlled vocabularies developed for them (establishmentMeans, degreeOfEstablishment, and pathway) use skos:Concepts as controlled values.

Perhaps one way to alleviate the confusion around basisOfRecord using some classes that have properties and others that don't would be to define a skos:ConceptScheme for basisOfRecord with new skos:Concepts for the basisOfRecord vocabulary terms in place of the classes that are now recommended. It would then probably be feasible to deprecate the Darwin Core classes that don't have properties (dwc:FossilSpecimen, dwc:LivingSpecimen, dwc:PreservedSpecimen, dwc:HumanObservation, dwc:MachineObservation, dwc:MaterialCitation). I suspect that in practice this change would have no deleterious effect on stability. Practically speaking, it would mean that the classes just mentioned would disappear from the index on the right side of the Darwin Core Quick Reference Guide and instead would appear in a controlled vocabulary accessible from the "Terms" drop-down menu on that page, under a label for "BasisOfRecord". Note that this solution would not solve any other problem with basisOfRecord than the perceived problem with some classes not having any suggested properties.

smrgeoinfo commented 1 year ago

Going back to the earlier material sample discussions, I proposed that 'sampleType' could be factored into 'kind of object', 'composition' (what the sample is composed of), and 'sampled feature' (the thing the sample represents that makes it a sample). As I understand the way material entity is being defined and used here, the 'kind of object' and 'composition' properties apply to the proposed dwc:materialEntity (whether or not they are explicitly included in dwc).

I hope that the concept of 'materialSample' as a subclass of materialEntity that has a 'sampledFeature' link doesn't get lost in the discussion.
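The factoring above can be pictured as a small class hierarchy. In this minimal sketch all attribute names are hypothetical stand-ins for the informal labels 'kind of object', 'composition' and 'sampled feature' (none are Darwin Core terms), written in camelCase only to echo Darwin Core naming. The point is that a MaterialSample is a MaterialEntity plus a sampledFeature link, and that link is what makes it a sample.

```python
from dataclasses import dataclass


@dataclass
class MaterialEntity:
    materialEntityID: str
    kindOfObject: str   # hypothetical property, e.g. "bone", "sediment core"
    composition: str    # hypothetical property: what the entity is made of


@dataclass
class MaterialSample(MaterialEntity):
    sampledFeature: str  # hypothetical link to what the sample represents


def is_sample(entity):
    """An entity counts as a sample only if it carries a sampledFeature link."""
    return isinstance(entity, MaterialSample)


bone = MaterialEntity("me-1", "bone", "bone tissue")
core = MaterialSample("me-2", "sediment core", "lake sediment", "lake bed")
print(is_sample(bone), is_sample(core), isinstance(core, MaterialEntity))  # False True True
```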

baskaufs commented 1 year ago

Perhaps one way to alleviate the confusion around basisOfRecord using some classes that have properties and others that don't would be to define a skos:ConceptScheme for basisOfRecord with new skos:Concepts for the basisOfRecord vocabulary terms in place of the classes that are now recommended. It would then probably be feasible to deprecate the Darwin Core classes that don't have properties (dwc:FossilSpecimen, dwc:LivingSpecimen, dwc:PreservedSpecimen, dwc:HumanObservation, dwc:MachineObservation, dwc:MaterialCitation). I suspect that in practice this change would have no deleterious effect on stability. Practically speaking, it would mean that the classes just mentioned would disappear from the index on the right side of the Darwin Core Quick Reference Guide and instead would appear in a controlled vocabulary accessible from the "Terms" drop-down menu on that page, under a label for "BasisOfRecord". Note that this solution would not solve any other problem with basisOfRecord than the perceived problem with some classes not having any suggested properties.

Although this could be done, I'm not sure that changing them to SKOS concepts is necessary (or desirable). Controlled vocabularies of type (dcterms:type for example) have traditionally had values that are instances of rdfs:Class. We followed that practice with the controlled vocabulary for ac:subtype whose members are classes and not SKOS concepts.

If it's a hindrance, I suppose they could just be declared to be part of a separate vocabulary displayed on a different list of terms page. That would just be an organizational change -- for stability purposes, the IRIs would not have to change. Since what is generally used (in basisOfRecord) are controlled value strings that are composed of the term localnames, there might be some advantage to this, since that list of terms page could include a "Controlled value" field similar to what was done on the ac:subtype list of terms page.
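Since the basisOfRecord strings are the local names of the class IRIs, a "Controlled value" column is just each IRI paired with its local name, and the pairing can be reversed to resolve a string back to its class. A minimal sketch using the property-less classes listed in this thread; the lookup tables are illustrative, not an existing TDWG artifact.

```python
import re

DWC = "http://rs.tdwg.org/dwc/terms/"
# Classes without properties that serve as basisOfRecord values (listed in this thread).
CLASSES = [
    "FossilSpecimen", "LivingSpecimen", "PreservedSpecimen",
    "HumanObservation", "MachineObservation", "MaterialCitation",
]


def local_name(iri):
    """Return the last segment of an IRI, after the final '/' or '#'."""
    return re.split(r"[/#]", iri)[-1]


# Term IRI -> controlled value string, the pairing a "Controlled value" column would show.
CONTROLLED_VALUES = {DWC + name: local_name(DWC + name) for name in CLASSES}
# Reverse lookup, so a basisOfRecord string can be resolved back to its class IRI.
IRI_FOR_VALUE = {value: iri for iri, value in CONTROLLED_VALUES.items()}

print(IRI_FOR_VALUE["PreservedSpecimen"])  # http://rs.tdwg.org/dwc/terms/PreservedSpecimen
```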

tucotuco commented 1 year ago

@baskaufs I like this solution. I am curious if it would alleviate the concerns expressed by @Jegelewicz and if it seems like a reasonable solution to others. Furthermore, does it seem like a solution worth proposing for public review in the coming round?

baskaufs commented 1 year ago

Since it is really an organizational thing and would have no effect on normative or non-normative term metadata, I would suppose that the DwC Maintenance Group could just do it in the same way that we changed the organization of the Quick Reference Guide. The main reason for putting it to public comment would be to get feedback about whether people would find it confusing and to inform the community that it was about to happen.

Under the hood, there would be a change in the RDF. In an example like http://rs.tdwg.org/dwc/terms/PreservedSpecimen.ttl, the statement declaring membership in a term list:

dwc:PreservedSpecimen dcterms:isPartOf <http://rs.tdwg.org/dwc/terms/>.

would change to something like

dwc:PreservedSpecimen dcterms:isPartOf <http://rs.tdwg.org/dwc/classes/>.

where <http://rs.tdwg.org/dwc/classes/> is a new term list we would create to group those classes. That term list would then be the only one included in a new Darwin Core controlled vocabulary (similar to the other controlled vocabularies we already have). See this diagram for the model of how TDWG terms are organized.

As I said, I think based on the rules of the VMS, the DwC MG could just do it. It doesn't really change or break anything, just clarifies the organization of terms.
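The RDF change described in this comment can be sketched as a rewrite over triples. This is an illustration under stated assumptions, not the DwC Maintenance Group's actual build process: triples are plain string tuples rather than objects from an RDF library, the function name `reassign_term_list` is hypothetical, and `http://rs.tdwg.org/dwc/classes/` is only the "something like" IRI floated above. The point it shows is that only the object of `dcterms:isPartOf` changes; the term IRIs (the subjects) are never touched.

```python
from typing import Iterable, Set, Tuple

# A triple is (subject IRI, predicate IRI, object IRI); plain strings keep the
# sketch dependency-free. A real pipeline would use an RDF library instead.
Triple = Tuple[str, str, str]

DWC = "http://rs.tdwg.org/dwc/terms/"
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
# The existing term list happens to share its IRI with the dwc namespace.
DWC_TERMS_LIST = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical new term list IRI, as floated in the comment above.
DWC_CLASSES_LIST = "http://rs.tdwg.org/dwc/classes/"


def reassign_term_list(
    triples: Iterable[Triple], terms: Set[str], new_list: str
) -> Set[Triple]:
    """Point dcterms:isPartOf for the given terms at new_list.

    Subjects (the term IRIs themselves) are never rewritten, so the terms keep
    their IRIs; only their term-list membership changes.
    """
    result: Set[Triple] = set()
    for s, p, o in triples:
        if p == IS_PART_OF and s in terms:
            result.add((s, p, new_list))
        else:
            result.add((s, p, o))
    return result


if __name__ == "__main__":
    before = {
        (DWC + "PreservedSpecimen", IS_PART_OF, DWC_TERMS_LIST),
        (DWC + "Occurrence", IS_PART_OF, DWC_TERMS_LIST),
    }
    after = reassign_term_list(before, {DWC + "PreservedSpecimen"}, DWC_CLASSES_LIST)
    for triple in sorted(after):
        print(triple)
```

In this example dwc:PreservedSpecimen moves to the new list while dwc:Occurrence, a class that does have properties, stays where it is.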

tucotuco commented 1 year ago

@baskaufs Thanks for the specifics on how. Yes, we could "just do it" under the rules, but I think it will be useful and respectful to include it in the rest of the public review. We really want to know whether it will help people rather than hinder them.

tucotuco commented 1 year ago

If we can get some initial feedback here, then we can make it a new DwC issue and the Material Sample Task Group can get "credit" for it. :-)

Jegelewicz commented 1 year ago

Apologies, but I don't understand what is being proposed; it's way over my head.

baskaufs commented 1 year ago

Say that all the classes that don't have properties and that are used as values for basisOfRecord are part of a controlled vocabulary, then put them on a different page from the rest of the Darwin Core terms. People could look at that page for the basisOfRecord options and otherwise not be confused by having them with the rest of the DwC terms.

dagendresen commented 1 year ago

To me, how the controlled values for basisOfRecord are now displayed at the bottom of the terms page is fine (from a layout and presentation point of view). An update to declare the controlled-values-terms to be part of (dcterms:isPartOf) a separate vocabulary (while keeping the term URIs unchanged) sounds fine to me. Probably an improvement.

I struggle much more with what the term basisOfRecord really means, and what using this term means to the thing that the term is added to, etc.

Jegelewicz commented 1 year ago

I struggle much more with what the term basisOfRecord really means, and what using this term means to the thing that the term is added to, etc.

Agree

Jegelewicz commented 1 year ago

declare the controlled-values-terms to be part of (dcterms:isPartOf) a separate vocabulary (while keeping the term URIs unchanged)

Practically, I don't see how this changes anything and I think @dagendresen is asking the question that perhaps I have not been able to articulate.

tucotuco commented 1 year ago

@Jegelewicz Is this what you mean by the question @dagendresen is asking?

I struggle much more with what the term basisOfRecord really means, and what using this term means to the thing that the term is added to, etc.

Here is what I think basisOfRecord is supposed to mean:

"When constructing a row (record) with Darwin Core properties, what is the value from the recommended controlled vocabulary for basisOfRecord that best describes what the record is about."

For a record that just has information about a Taxon, the recommended vocabulary term that best matches is dwc:Taxon. When using the Taxon Core for a Darwin Core archive, the basisOfRecord term is not present because every record would be dwc:Taxon anyway. Similarly, in an Event Core every record would have basisOfRecord dwc:Event. The Occurrence Core is different because several distinct controlled vocabulary terms might apply at the same time, depending on what evidence there is for the Occurrence, but we are forced to pick one: the best match. For lots of records the choice is easy. For others we become sad, because no matter what we say, it will leave something out of what the record is about. For example, one Occurrence record can contain information about a PreservedSpecimen AND a MaterialSample AND a MaterialCitation AND a HumanObservation AND a MachineObservation. This is one of the circumstances in which basisOfRecord breaks down, specifically in the context of a Darwin Core archive.

Jegelewicz commented 1 year ago

The Occurrence Core is different

Why? Creating this "special case" seems to be the root of the problem.

tucotuco commented 1 year ago

The Occurrence Core is different

Why? Creating this "special case" seems to be the root of the problem.

It was a solution to the problem of separating vouchered data from non-vouchered data when Darwin Core started to be used for more than just specimens. We went the extra step of allowing fossils to be distinguished from living and non-living non-fossil specimens, and allowing human observations to be separated from machine observations. These were things people wanted. That's what it provided, albeit imperfectly in non-trivial situations.

A potentially better solution might have been to include terms such as hasPreservedSpecimen, hasFossilSpecimen, etc., but neither that nor any other solution was proposed.
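The difference between a single-valued basisOfRecord and the "has..." alternative can be sketched in a few lines. Everything here is hypothetical: the `has...` column names are not Darwin Core terms (only hasPreservedSpecimen and hasFossilSpecimen were even mentioned, as an unproposed idea), and the functions `as_has_flags` and `as_basis_of_record` exist only for illustration. The example uses the multi-evidence Occurrence described earlier in the thread.

```python
# Hypothetical: these "has..." columns are NOT Darwin Core terms; they only
# illustrate the alternative mentioned above.
EVIDENCE_TYPES = (
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "MaterialSample",
    "MaterialCitation",
    "HumanObservation",
    "MachineObservation",
)


def _check(evidence):
    """Reject evidence types outside the list above."""
    unknown = set(evidence) - set(EVIDENCE_TYPES)
    if unknown:
        raise ValueError(f"unknown evidence types: {sorted(unknown)}")


def as_has_flags(evidence):
    """One boolean column per evidence type, so nothing has to be dropped."""
    _check(evidence)
    return {"has" + t: t in evidence for t in EVIDENCE_TYPES}


def as_basis_of_record(evidence, chosen):
    """Single-valued basisOfRecord: only the chosen evidence type survives."""
    _check(evidence)
    if chosen not in evidence:
        raise ValueError(f"{chosen} is not among the record's evidence")
    return {"basisOfRecord": chosen}


if __name__ == "__main__":
    # The multi-evidence Occurrence described earlier in the thread.
    evidence = {
        "PreservedSpecimen",
        "MaterialSample",
        "MaterialCitation",
        "HumanObservation",
        "MachineObservation",
    }
    print(as_basis_of_record(evidence, "PreservedSpecimen"))
    print(as_has_flags(evidence))
```

Which value counts as the "best match" is left to the caller here; the sketch does not encode any ranking, since Darwin Core does not define one.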

But are we going off on a tangent from the issue of MaterialEntity properties for the upcoming Darwin Core review?

Here is my understanding of the general feeling so far:

The class MaterialEntity will be proposed and we are closing in on a consensus definition for the new term request.

The property materialEntityID might be a reasonable new term request, with the term suggested to be organized in the MaterialEntity class.

Without the materialEntityID, it doesn't make much sense to talk about adding or rearranging any other properties because there will be no way to instantiate a MaterialEntity and give it properties, except in RDF.

The new materialEntityID property would make the utility of the existing [dwc:materialSampleID](https://dwc.tdwg.org/terms/#dwc:materialSampleID) questionable, suggesting that it might be proposed for deprecation and replacement by the new materialEntityID property.

There isn't sufficient support for a property materialEntityType at this time.

With the ratification of the MaterialEntity class, if also provided with a materialEntityID property, several existing properties should be organized in this new class instead of where they are currently organized: catalogNumber, preparations, disposition, otherCatalogNumbers.

Though it hasn't been discussed yet, there is one other property that should be added to a new MaterialEntity class so that it follows the pattern of classes that can be instantiated in Darwin Core (i.e., to be consistent): materialEntityRemarks.
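As a rough illustration of the summary above, this is what a flat record might look like if the proposals went ahead. It is a sketch under assumptions: materialEntityID and materialEntityRemarks are only proposed terms, the existing terms are shown organized under MaterialEntity as suggested rather than where they currently sit, the class name `MaterialEntityRecord` is hypothetical, and the example values are made-up placeholders. Requiring materialEntityID reflects the point that without it there is no way to instantiate a MaterialEntity outside RDF.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class MaterialEntityRecord:
    """Hypothetical flat record for the proposed MaterialEntity class."""

    materialEntityID: str  # proposed term; required to instantiate the entity
    catalogNumber: Optional[str] = None  # existing term, proposed to move here
    preparations: Optional[str] = None  # existing term, proposed to move here
    disposition: Optional[str] = None  # existing term, proposed to move here
    otherCatalogNumbers: Optional[str] = None  # existing term, proposed to move here
    materialEntityRemarks: Optional[str] = None  # proposed term

    def __post_init__(self) -> None:
        if not self.materialEntityID:
            raise ValueError("materialEntityID is required to instantiate a MaterialEntity")

    def to_row(self) -> Dict[str, str]:
        """Return only the populated fields, keyed by term name."""
        return {k: v for k, v in asdict(self).items() if v is not None}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = MaterialEntityRecord(
        materialEntityID="urn:example:me:1",
        catalogNumber="EX-0001",
        preparations="skull | skin",
    )
    print(record.to_row())
```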

Jegelewicz commented 1 year ago

Closing as out of scope for the current Task Group