tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

New Term - dcterms:PhysicalResource #421

Closed by baskaufs 1 year ago

baskaufs commented 1 year ago

New term

Proposed attributes of the new term:

ghwhitbread commented 1 year ago

Post it here before moving to DwC discussion, please, @stanblum. I’d like to know if it comes closer to our existing model or drifts away.

Jegelewicz commented 1 year ago

I am in favor of the term as given in the original proposal. For me it defines the box we need for "things". The "Human Resources" argument is the best I've heard if someone should oppose the term for the ethics of using "resources" in reference to humans.

I can certainly bring this before @tdwg/material-sample but I also see no reason they couldn't just comment here. Participation in meetings has dropped off and I don't want to have this decision be made by three or five people. I will email the task group members.

deepreef commented 1 year ago

While it sometimes may seem frivolous, edge cases are most helpful in challenging our means for information exchange and sharpening our thinking. It's a good thing to continue to bring them up.

Agreed! Indeed, it's the edge cases I always go to first to finesse where the boundary is. But there is often a point of diminishing returns where further scrutiny of rare edge cases impedes progress more than it enhances clarity and precision. In the age-old battle between the "perfect" and the "good enough" (which are often mutual enemies), I am often rooting for the "perfect" more than most people. But I also understand that this can be counterproductive.

stanblum commented 1 year ago

From the two main points of feedback (I think):

  1. An occurrence is the existence of an Organism at a Location and time
  2. Identification applies to the Organism

Key feature (a reveal to me; duh!) is that an Occurrence is then the intersection (association entity; M:M) between Event and Organism.
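
A minimal sketch of that association-entity reading, assuming hypothetical class and field names (`event_id`, `organism_id`, `occurrence_id`) that are not taken from Darwin Core or the diagram. Each Occurrence row pairs one Event with one Organism, which is what makes the Event-to-Organism relationship many-to-many:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    location: str
    date: str


@dataclass(frozen=True)
class Organism:
    organism_id: str


@dataclass(frozen=True)
class Occurrence:
    # Association entity: one row per (Event, Organism) pairing.
    occurrence_id: str
    event_id: str
    organism_id: str


# Illustrative data only: two Organisms seen at one Event,
# and one Organism seen at two Events.
occurrences = [
    Occurrence("occ1", "e1", "o1"),
    Occurrence("occ2", "e1", "o2"),
    Occurrence("occ3", "e2", "o1"),
]


def organisms_at(event_id, rows):
    """Organism IDs linked to an Event through Occurrence rows."""
    return sorted({r.organism_id for r in rows if r.event_id == event_id})


def events_for(organism_id, rows):
    """Event IDs linked to an Organism through Occurrence rows."""
    return sorted({r.event_id for r in rows if r.organism_id == organism_id})
```

Under this reading, anything that needs to point at "this Organism at this place and time" would carry the `occurrence_id` rather than the `organism_id`.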

[Revised to correct foreign keys in Token subclasses to be OccurrenceID instead of OrganismID.]

[Diagram: DarwinSW-simplified2 drawio]

Which looks almost the same as @jbstatgen's first diagram, except for the use of arrows, I think.

deepreef commented 1 year ago

The only modification I'd make is that the Token box should also have an arrow pointing to Identification. Also, I'm a little fuzzy in my own mind about the exact relationship between Organism and Token. E.g., must it always pass through (at least) one Occurrence? So far, that's how we do it, and we haven't found a need to change it. But it does require expanding the scope of "Occurrence" to things like subsampling in a lab, or photo sessions not related to the time and place of the organism in nature.

Ok, and like @Jegelewicz, I am likewise in favor of the term as given in the original proposal (to stay on topic...)

dagendresen commented 1 year ago

My preference would be to name the new class term simply Material.

And make a new property term for materialType.

And move to a controlled materialType vocabulary: FossilSpecimen, LivingSpecimen, PreservedSpecimen, and MaterialSample (the latter for tissue and environment samples, etc. as today).

Would we in such a case want to rename materialSampleID to materialID?
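
A rough sketch of what this suggestion could look like in practice, assuming hypothetical Python names (`Material`, `MaterialType`, `parse_material`). Only the four vocabulary values and the `materialID` idea come from the comment above; nothing here is an adopted Darwin Core term:

```python
from dataclasses import dataclass
from enum import Enum


class MaterialType(Enum):
    # Controlled vocabulary values as listed in the comment above.
    FOSSIL_SPECIMEN = "FossilSpecimen"
    LIVING_SPECIMEN = "LivingSpecimen"
    PRESERVED_SPECIMEN = "PreservedSpecimen"
    MATERIAL_SAMPLE = "MaterialSample"


@dataclass
class Material:
    material_id: str  # stands in for the suggested materialID
    material_type: MaterialType


def parse_material(material_id: str, material_type: str) -> Material:
    """Build a Material, rejecting values outside the vocabulary.

    Enum lookup by value raises ValueError for unknown strings.
    """
    return Material(material_id, MaterialType(material_type))
```

The point of the enum is only that a free-text type value would be rejected at load time instead of being silently accepted.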

I do prefer an explicit strong link to bfo:MaterialEntity, and I do not mind a link to dcterms:PhysicalResource (to be captured in the term comments).

I agree with @deepreef that the "essence" of Organism we want to capture here is different from the material component of it.

dagendresen commented 1 year ago

(I also think we do need a new class Evidence, Token, or simply Record (for recorded evidence)! And that it would be within the mandate of our task group to propose this).

cboelling commented 1 year ago

I realized that I have misused the dcterms: namespace abbreviation when writing my earlier comments in this issue. I think this has unnecessarily complicated the discussion for which I'm sorry.

I have amended my earlier posts accordingly, especially here. I think this now partly resolves the concerns mentioned by @baskaufs (Apologies for stealing your time).

Nonetheless, I stand by my proposal to formally link to bfo:MaterialEntity for the same reasons stated earlier. Applying the same argument as in the proposal made in the top comment, entailments shouldn't be an issue because they are not formally declared in the bag of terms approach.

baskaufs commented 1 year ago

@cboelling Since you are essentially suggesting a counter-proposal, I think it would be helpful if you would create a new issue using the new term template and fill out exactly what you are suggesting as the term's metadata properties. In particular, how would you propose to make the link to the external terms? In the efficacy justifications you can reference this proposal and succinctly summarize your arguments as to why your proposal is an improvement over importing dcterms:PhysicalResource

If you are able to do that, I would like to request the members of the TAG to discuss the two proposals. In the past, if there was a well-known external term that captured the essence of what we wanted, we have imported it in preference to creating our own. In cases where our suggested use of the imported term was different or more specific than the use described by the minting organization, we have used non-normative "Notes" (dcterms:description in RDF) or normative "Usage" (skos:scopeNote in RDF) along with non-normative "Examples" (skos:example in RDF) to clarify. Your proposal is somewhat of a departure from this practice and I would like to hear what the TAG thinks about the approach since it would be setting a new precedent.

@tucotuco Can we add a field in the new term form for English label? It's not there presently and probably should be.

tucotuco commented 1 year ago

@baskaufs The two issue templates have been updated to include "* Term label (English, not normative): ".

baskaufs commented 1 year ago

Thanks @tucotuco

tucotuco commented 1 year ago

Sorry to be late in much of what will follow here - not enough time to keep up consistently. There has been a lot of good discussion. Though there is a lot I would say about lots of comments in this issue, I feel compelled to answer questions specifically addressed to me.

@deepreef from https://github.com/tdwg/dwc/issues/421#issuecomment-1333467017

(@tucotuco said) The class dcmitype:PhysicalObject would clearly not work as a broader term (superclass) for dwc:Organism, as an Organism need not be inanimate.

I'm not sure if @tucotuco meant to suggest that dwc:Organism would be treated as a subclass of this new class (whatever the label ends up being), but I would strongly advise against representing it that way.

I would not suggest any subclassing in Darwin Core itself. This is something we once had with the values of basisOfRecord as a formal type vocabulary (similar to what Dublin Core has for the dcmitype: terms as a type vocabulary for dcterms:type) and we abandoned it in favor of deferring any kind of over-arching modeling for a later exercise when we were mature enough technically. We are still not engaged in that exercise in Darwin Core.

@jbstatgen from https://github.com/tdwg/dwc/issues/421#issuecomment-1331364896:

@tucotuco Following your links and looking at your figure 2 (see above) of the README.md with fresh eyes and the discussion of the past days in the back of my mind, I have several questions.

The starting point for my inquiries is that I would like to understand what you need "Organism" for in the GBIF model to have it as subcategory of bfo:Entity? What is the role of the class in the GBIF model? What is it used for?

First, a point of information. Entity in the Unified Model is currently our own fabrication. It is inspired by bfo:entity, prov:Entity, sosa:FeatureOfInterest and dsw:Token, but we have no formal ontological declarations of any kind at this point. By experience, doing that in an early stage of modeling constrains one's world view. We are only now beginning to consider the benefits of aligning formally with specific world views.

In your figure 2 you placed "Organism" in the same column as bfo:Entity and you also describe it as subcategory to bfo:Entity in your last post. However, the definition of dwc:Organism starts out with "Instances of the dwc:Organism class are intended to facilitate linking one or more dwc:Identification instances to one or more dwc:Occurrence instances. ..." This sounds more like the class being an abstract construct, maybe even "only" a tool and not something "real", tangible. It seems thus quite different from bfo:Entity.

In the diagram for version 4.5 of the Unified Model, there is no implied significance of the "columns" of tables. Their arrangement is primarily a practical one to minimize clutter. The meaning is instead captured in the cardinality indicators in the connections between tables. Reading that from the diagram says that Organism (here we do mean sensu Darwin Core) is a subtype of MaterialEntity, which is a subtype of Entity.

That quote above about dwc:Organism is from the (non-normative) Comments for the term, not from the (normative) Definition. The Comments apply specifically in the Darwin Core context, which is flat with respect to describing Occurrences (the Event, the Location, the evidence, the Organism, the Identification, the Taxon are all part of one wide undifferentiated row). For anything fancier, an extension attached within the confines of the star schema constraint is required.

Just because an Organism is a MaterialEntity doesn't mean it can't be more than that. In fact, it must be, or we wouldn't bother using a separate class for it. We believe Organisms can also be Agents, for example. In the Unified Model, Organisms are not just part of a flat Occurrence with inferred links to other concepts. Some or all of the material remains of an Organism can also provide the evidence for an Identification. An Organism is also the entity/Entity/featureOfInterest of an Occurrence and its material remains can provide the evidence for that (digital evidence via DigitalEntities can also).

At the same time, bfo:Entity is the top category for bfo:Continuant and bfo:Occurrent. Yet, you place "Occurrence" to the side in a new column by itself, suggesting that it is something quite different.

Again, the arrangement in columns is not meant to have special significance and formal alignment with BFO is not established. The significance is in cardinality of the relationships provided. In the Unified Model, an Occurrence is a subtype of Event. It's a special kind of Event in which there was evidence of an Organism having been within a Location during some period of time.

Finally, I have become a fan of the PROV model, since I like to see the DES and the future of data modeling as fundamentally transactional with an event-based data model at their heart. The colored boxes denote the three foundational elements of the PROV standard: entities, agents and activities. In PROV they are all directly connected in a kind of triangle. In the GBIF model you are placing "Occurrences" in between the "Entities" and the "Activities". I don't understand why you need them there. I would appreciate it if you could explain and maybe provide an example.

In the Unified Model, as a subtype of Event, an Occurrence is a prov:Activity where the prov:Entity is the Organism and there are various possibilities of prov:Agents associated with that connection, such as observer, collector, photographer, etc.
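
As a loose illustration of the relationships described in these replies (Organism as a subtype of MaterialEntity, which is a subtype of Entity; Occurrence as a subtype of Event; Occurrence playing the prov:Activity role), here is a sketch with hypothetical Python classes. The class layout, field names, and the `to_prov` helper are assumptions made for illustration and are not the Unified Model's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    entity_id: str


@dataclass
class MaterialEntity(Entity):
    pass


@dataclass
class Organism(MaterialEntity):
    pass


@dataclass
class Agent:
    name: str


@dataclass
class Event:
    event_id: str
    location: str
    period: str


@dataclass
class Occurrence(Event):
    # A special kind of Event with evidence of an Organism
    # within a Location during some period of time.
    organism: Organism
    # Role name (observer, collector, photographer, ...) -> Agent.
    agents: Dict[str, Agent] = field(default_factory=dict)


def to_prov(occ: Occurrence) -> dict:
    """Label the parts of an Occurrence with their PROV roles."""
    return {
        "prov:Activity": occ.event_id,
        "prov:Entity": occ.organism.entity_id,
        "prov:Agent": {role: a.name for role, a in occ.agents.items()},
    }


occ = Occurrence(
    "occ1", "Site A", "2022-06-01",
    Organism("org1"),
    {"observer": Agent("A. Observer")},
)
```

Here the Occurrence is the activity itself, so the Organism and the Agents hang off it directly, which is one way to read the PROV "triangle" in this model.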

For me, reality at some place and time results in an occurrence. In our perception we might focus on the rock facies component of reality (gneiss and basalt as igneous rocks, not sandstone) and not the organism(s) growing on its surface (e.g. algae and lichens). The rocks and organisms can be classified according to some relationship/similarity/ancestry scheme. Up to now, the rock facies and organisms are abstract, general (universal? there is an expression for this) concepts. Once we move in an activity/event from the concepts to the specific instances, we have an empirical fact to collect, preserve and share, an entity.

Thus, I would at this point argue that dwc:Organisms are of a different "quality" than bfo:Entity, though will be happy to better understand your perspective on this.

I hope my responses help. Let me know if they leave any doubts.

@stanblum from https://github.com/tdwg/dwc/issues/421#issuecomment-1333467017:

I have to apologize, too. I modified my previous diagram a bit to accommodate some of the comments, but I hesitate to post it here and continue this thread about Occurrence, Event, Token, etc., because it's essentially peripheral or even irrelevant to the primary issue here -- the proposal to import(?) dcterms:PhysicalResource. @tucotuco, should we move these comments about Occurrence to a discussion in DwC (not an issue)?

It might have been a great idea to start this conversation in a GitHub Discussion in this repository, but given that all of this commentary has been transmitted via email to anyone who is watching the Darwin Core issues (with links to specific comments and such), it would be problematic to move them. I think that if we can periodically provide a summary-so-far comment of the stuff directly relevant to the proposal we should be fine staying here in this issue with the diverse connected conversations. If we come to any concrete conclusions about how to save the (modeling) world on related topics, we should probably do our best to make that happen in other appropriate places as well. For the Unified Model stuff, that would be GBIF's Discourse forum.

cboelling commented 1 year ago

@baskaufs:

@cboelling Since you are essentially suggesting a counter-proposal, I think it would be helpful if you would create a new issue using the new term template and fill out exactly what you are suggesting as the term's metadata properties. In particular, how would you propose to make the link to the external terms? In the efficacy justifications you can reference this proposal and succinctly summarize your arguments as to why your proposal is an improvement over importing dcterms:PhysicalResource

In short, I see 3 alternatives:

  1. import dcterms:PhysicalResource (this proposal)
  2. import bfo:MaterialEntity
  3. create dwc:MaterialEntity as a separate resource under control of DwC but terminologically (through the term label) and conceptually (through the (possibly adapted) definition) informally linked to bfo:MaterialEntity

I can do as you suggest but I would like to run this by the Material Sample Task Group. In my opinion, incorporating a top level term for material entities with accompanying metadata and documentation is the key result of this chartered task group, whichever alternative the TG gets behind.

In cases where our suggested use of the imported term was different or more specific than the use described by the minting organization, we have used non-normative "Notes" (dcterms:description in RDF) or normative "Usage" (skos:scopeNote in RDF) along with non-normative "Examples" (skos:example in RDF) to clarify.

These don't seem to be part of the new term form or is there a mapping?

baskaufs commented 1 year ago

@cboelling The new term template is a little unclear about this, sorry. Usage comments (recommendations regarding content, etc., not normative) in the template is what ends up in the Notes field in the Darwin Core List of Terms (and List of Terms documents for other vocabularies) and in the Comments field in the Darwin Core Quick Reference Guide. It is the dcterms:description value in the RDF. Darwin Core does not (yet) have a usage field (skos:scopeNote) for any terms because few of its terms are imported from other vocabularies. It has been used commonly in Audubon Core, which borrows many terms whose definitions are set outside of TDWG and therefore sometimes needs to provide normative guidance on how these terms should be used in the TDWG context. So "usage comments" in the Darwin Core template does not correspond to the "Usage" field as it appears in the Audubon Core list of terms.

These patterns are historical artifacts and we probably should get our act together to make the terminology more consistent across documents. Hope this helps clarify.
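
For anyone trying to keep the labels straight, the correspondences described in this thread can be written down as a small lookup. This is only a sketch: the dictionary and function names are hypothetical, and the mapping covers just the labels mentioned in these comments:

```python
# Document label -> RDF property and normative status, as described above.
FIELD_MAP = {
    "Notes": {"rdf_property": "dcterms:description", "normative": False},
    "Usage": {"rdf_property": "skos:scopeNote", "normative": True},
    "Examples": {"rdf_property": "skos:example", "normative": False},
}

# The Darwin Core new term template's "Usage comments" ends up as "Notes"
# (List of Terms) or "Comments" (Quick Reference Guide), not as "Usage".
TEMPLATE_ALIASES = {
    "Usage comments": "Notes",
    "Comments": "Notes",
}


def rdf_property_for(label: str) -> str:
    """Resolve a document or template label to its RDF property."""
    canonical = TEMPLATE_ALIASES.get(label, label)
    return FIELD_MAP[canonical]["rdf_property"]


def is_normative(label: str) -> bool:
    """Report whether the field behind a label is normative."""
    canonical = TEMPLATE_ALIASES.get(label, label)
    return FIELD_MAP[canonical]["normative"]
```

The alias table is where the confusion lives: "Usage comments" and "Usage" look alike but resolve to different properties with different normative status.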

Jegelewicz commented 1 year ago

This was discussed at length today in the @tdwg/material-sample meeting. At this time, we plan to review the very detailed proposal made by @cboelling and decide, as a committee, which of the three choices he proposed we prefer. That meeting is scheduled for January 18. We request that this proposal be held at least until then. For more information and to join the discussion see https://github.com/tdwg/material-sample/issues/31

baskaufs commented 1 year ago

Following the discussion at the January 18 meeting, I would like to request that this proposal be withdrawn (closed) in favor of the proposal for dwc:MaterialEntity soon to be submitted by the MaterialSample task group.