tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Change term - associatedOccurrences #324

Closed: tucotuco closed this issue 3 years ago

tucotuco commented 3 years ago

Change term

Current Term definition: https://dwc.tdwg.org/terms/#dwc:associatedOccurrences

Proposed new attributes of the term:

The current definition of the term is "A list (concatenated and separated) of identifiers of other Occurrence records and their associations to this Occurrence." Yet in 2014 the term was re-organized in the then-new Organism class. So either the re-organization was incorrect or the definition is no longer correct.

At another level it is unclear that this term still has a use that cannot be filled in other ways. It may be that the introduction of the Organism class made this term superfluous. For example, all of the Occurrences associated with a given organism can be determined by having a shared organismID. All of the Occurrences associated with a given Event can be determined by having a shared eventID. Are there other uses for the term? Do these depend on the organization of the term within the Organism class? Or within the Occurrence class as the current definition still suggests? Should we consider moving it to the record level so that it could apply to any type of record? Should it be deprecated?

One thing that this field could still do is make sure that all of the related Occurrence records are accessible within the record to which they are related so that it is "self-contained" and doesn't require having all of the related Occurrences at hand to detect the relationships.
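To make the shared-identifier argument concrete, here is a minimal Python sketch, assuming records held as plain dictionaries keyed by Darwin Core term names (the identifier values are made up and `group_by` is a hypothetical helper). Grouping on organismID or eventID recovers the associations without any associatedOccurrences value, but only across the records the consumer actually holds, which is the limitation described in the previous paragraph.

```python
from collections import defaultdict

# Made-up Occurrence records; occurrenceID, organismID and eventID are Darwin Core terms.
occurrences = [
    {"occurrenceID": "occ-1", "organismID": "org-A", "eventID": "ev-1"},
    {"occurrenceID": "occ-2", "organismID": "org-A", "eventID": "ev-2"},
    {"occurrenceID": "occ-3", "organismID": "org-B", "eventID": "ev-1"},
]


def group_by(records, term):
    """Map each value of `term` to the occurrenceIDs of the records that share it."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(term)
        if value:  # records that leave the term empty are not grouped
            groups[value].append(record["occurrenceID"])
    return dict(groups)


print(group_by(occurrences, "organismID"))  # {'org-A': ['occ-1', 'occ-2'], 'org-B': ['occ-3']}
print(group_by(occurrences, "eventID"))     # {'ev-1': ['occ-1', 'occ-3'], 'ev-2': ['occ-2']}
```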

tucotuco commented 3 years ago

See related discussions at https://discourse.gbif.org/t/differences-in-use-between-associatedoccurrences-and-associatedorganisms/2561/5 and https://github.com/tdwg/dwc-qa/issues/171.

deepreef commented 3 years ago

@tucotuco : I had always assumed that associatedOccurrences was not about organism associations, but actual occurrence associations. For example: an insect, a plant, and a frog are all collected at the same Event. We have three organisms (one insect, one plant, one frog). They're united by a common eventID, but we somehow want to record the fact that the insect was collected while eating the plant. We could establish a relationship between the two organism instances (the insect instance and the plant instance), but that would imply that this individual insect [organism] and plant [organism] are joined across their scope as organisms. A more accurate way to record it would be that the occurrence instance of the insect (at that event) is linked to the occurrence instance of the plant (at the same event). That way, you're explicitly saying something about the relationship between the insect and the plant at the specific place and time of the Event. We can't rely on the eventID to aggregate them in this way, because the frog (also at the same event) didn't participate in the relationship between the insect and the plant.

Having said all that, I would advocate that the associatedOccurrences term be deprecated, and that people instead start using the (incredibly powerful and incredibly underutilized) ResourceRelationship class to capture these kinds of associations. But I gather that's a topic for another thread/issue...
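A small Python sketch of the insect/plant/frog example above, with made-up identifiers: a shared eventID implies a link between every pair of occurrences at the event, frog included, while an explicit association records only the insect-plant interaction. The (subject, relationship, object) triple is only illustrative; it resembles a ResourceRelationship row (resourceID, relationshipOfResource, relatedResourceID), but it is not a prescribed encoding.

```python
from itertools import combinations

# Made-up occurrenceIDs for the insect, plant and frog, each mapped to its eventID.
event_of = {"occ-insect": "ev-1", "occ-plant": "ev-1", "occ-frog": "ev-1"}

# Associations implied by a shared eventID: every pair of occurrences at the same event.
co_occurring = {
    frozenset((a, b))
    for a, b in combinations(sorted(event_of), 2)
    if event_of[a] == event_of[b]
}

# An explicit occurrence-to-occurrence association (subject, relationship, object).
# "feeding on" is an illustrative label, not a value from a controlled vocabulary.
explicit = [("occ-insect", "feeding on", "occ-plant")]

print(len(co_occurring))  # 3: insect-plant, insect-frog, plant-frog
print([occ for s, _, o in explicit for occ in (s, o)])  # ['occ-insect', 'occ-plant']
```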

EstebanMH-SiB commented 3 years ago

I think it should be deprecated; there is nothing we cannot do with the other elements. Perhaps we can cover the example that @deepreef gave with associatedOrganisms, so that it would refer to relationships between groups of organisms rather than inside a group of organisms. In that case, we can change the examples so they include more relationships (host of, predator of, etc.) than just the family relationships they show today (sibling of, parent of, etc.). I am not sure if that last option is viable; it is just my humble opinion.

I also think that the ResourceRelationship is a better option, we try to promote this extension with our publishers. Unfortunately, people almost always prefer to use elements inside the standard rather than an extension, maybe because it is easier.

tucotuco commented 3 years ago

@EstebanMH-SiB Further examples are easy to add. As non-normative content they do not require a public review. They just need vetting by the Darwin Core Vocabulary Maintenance Group (who all read these issues). I want to be clear about something in your suggestion, because, depending on what you mean, it might require a change of definition, and that would be cause for a public review if it made it past the initial requirements of justification (see section 3.1 Justifications for change in the Vocabulary Maintenance Specification). You suggest examples between groups of organisms. The definition of associatedOrganisms is "A list (concatenated and separated) of identifiers of other Organisms and their associations to this Organism." The subject is a single instance of a Darwin Core Organism, which is defined as "A particular organism or defined group of organisms considered to be taxonomically homogeneous." The object is a list of single Darwin Core Organisms. Insofar as the subject or object is a defined group, one could make group associations with this term. What you shouldn't do is make an association such as "host of Mallophora ruficauda". The object should be a specific individual Mallophora ruficauda, not members of the species in general.

jhpoelen commented 3 years ago

@zedomel pointed me to this thread. Re: your ideas to deprecate associatedOccurrences: I am aware of many datasets published by the Arctos community that use the associatedOccurrences field to document the rich linking between specimens (e.g., parasite-host, predator-prey). FYI @Jegelewicz @campmlc and @dustymc.

deepreef commented 3 years ago

@jhpoelen and others : Would the Arctos community be willing/able to capture/express these associations via ResourceRelationship? That would be a much more powerful/explicit way to share this information via DwC. If not, then associatedOccurrences should probably not be deprecated.

debpaul commented 3 years ago

@deepreef Side-topic wish: to expose these instances where groups are using ResourceRelationship as examples for the broader community. Note that, via aggregators, it can be difficult to tell whether a dataset shares data via this or other "extensions." And to get to adoption of this as best/better practice, we need these examples. (It's not necessarily easy or intuitive to understand how to use ResourceRelationship.)

Jegelewicz commented 3 years ago

@deepreef probably - it would just be a mapping change in our DwC file (I think) @dustymc can confirm.

deepreef commented 3 years ago

@debpaul - yes! That's definitely a problem. But it feels like we're on the cusp of a next-gen way of sharing data in TDWG/Biodiversity-Space. Based on side-chats at the Virtual TDWG conference last year, I sense that several initiatives are building momentum for more robustly sharing our data, and I think (hope?) that broader adoption of ResourceRelationship (and other underutilized components of the DwC standard) might be one of those things that becomes more prevalent. We're coming off a year of post-Ransomware+COVID chaos, but things are looking very promising with our new Director of Informatics (Melissa Tulig). We're very close to re-birthing access to our content via IPT, so I'm looking forward to sharing data via these various extensions more consistently. Maybe if we get enough "critical mass" from the community along these lines, this will become more intuitive and common among DwC/DwCA content providers.

debpaul commented 3 years ago

@deepreef this is all great news! @mdoering is ResourceRelationship indexed on GBIF?

jhpoelen commented 3 years ago

Many examples exist of GloBI indexing species interactions shared via ResourceRelationship extensions (e.g., Field Museum, Yale Peabody, Milwaukee Public Museum, Catalogue of the Rust Fungi of Belgium).

Do note, however, that existing collections use various other ways (e.g., associatedTaxa, associatedOccurrences) and may need some time to adapt to a new best way of capturing associations, if they have the means/skills to change current practices at all.

I hope we'll figure out some balance between respecting / supporting existing use while making it easier to share associations for new/active collections.

Perhaps introducing an automated way to translate/convert existing DwC-A to DwC-A + resource relations would help with that.
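One possible shape for such a converter, as a minimal Python sketch: it assumes associatedOccurrences values in the '"relationship":"identifier"' form with space vertical bar space separators that is proposed later in this issue, and emits one ResourceRelationship row per entry, with this Occurrence as resourceID and the associated Occurrence as relatedResourceID. The function name and the occurrenceID are made up; entries in any other format are skipped rather than guessed at.

```python
import csv
import io
import re

# One '"relationship":"identifier"' entry, in the form proposed in this issue (an assumption).
ENTRY = re.compile(r'"([^"]*)"\s*:\s*"([^"]*)"')
FIELDS = ["resourceID", "relationshipOfResource", "relatedResourceID"]


def to_resource_relationships(occurrences):
    """Yield one ResourceRelationship row per entry in each associatedOccurrences value."""
    for occurrence in occurrences:
        value = occurrence.get("associatedOccurrences") or ""
        for entry in value.split(" | "):
            match = ENTRY.fullmatch(entry.strip())
            if match is None:
                continue  # entries in other formats are skipped, not guessed at
            relationship, related_id = match.groups()
            yield {
                "resourceID": occurrence["occurrenceID"],  # subject: this Occurrence
                "relationshipOfResource": relationship,
                "relatedResourceID": related_id,  # object: the associated Occurrence
            }


core = [
    {
        "occurrenceID": "urn:example:occurrence:1",  # made-up identifier
        "associatedOccurrences": '"parasite collected from":'
        '"https://arctos.database.museum/guid/MSB:Mamm:215895?seid=950760"',
    },
]
rows = list(to_resource_relationships(core))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

A real archive would also need an id column tying each extension row to its core record, plus a meta.xml mapping; both are left out of this sketch.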

deepreef commented 3 years ago

I wouldn't expect the transition to happen overnight, but the more comfortable we can get content providers to structure information using terms from the ResourceRelationship class instead of the associatedXXXX terms of the various classes, the better. But I agree we don't want to abandon any of the associatedXXXX terms while people are still using them, when they are not in a position to represent the same information via a ResourceRelationship extension.

debpaul commented 3 years ago

@jhpoelen wrote:

Many examples exist of GloBI indexing species interactions shared via ResourceRelationship extensions (e.g., Field Museum, Yale Peabody, Milwaukee Public Museum, Catalogue of the Rust Fungi of Belgium).

Great, but how do you know this, Jorrit? Can a typical user of GBIF, iDigBio, VertNet, etc., discover these data through the front door? Indexed, and searchable via the UI? Or soon to be?

I'm really glad to have those examples you just provided. These help me and others when folks ask "how to use the extensions" and "who else is using them?"

debpaul commented 3 years ago

@jhpoelen writes:

making it easier to share associations for new/active collections.

Anything that makes it easier to use extensions to share richer data, yay!

jhpoelen commented 3 years ago

@debpaul wrote:

Great, but how do you know this, Jorrit? Can a typical user of GBIF, iDigBio, VertNet, etc., discover these data through the front door? Indexed, and searchable via the UI? Or soon to be?

I know this because I've worked with these rich datasets and because they are indexed by GloBI. You can find a list of sources at https://globalbioticinteractions.org/sources . And you can discover the data via https://globalbioticinteractions.org/data .

jhpoelen commented 3 years ago

@debpaul fyi there's an upcoming hands-on workshop organized via Terrestrial Parasite Tracker (https://parasitetracker.org) that you might want to attend to learn more about indexing / review tools that GloBI uses: https://www.idigbio.org/content/practical-exploration-biotic-interaction-data-management-and-information-retrieval-through .

debpaul commented 3 years ago

@jhpoelen how does one get invited to this hands-on workshop organized via Terrestrial Parasite Tracker?

jhpoelen commented 3 years ago

To register for the "A Practical Exploration of Biotic Interaction Data Management and Information Retrieval through TPT and GloBI" workshop:

  1. visit https://www.idigbio.org/content/practical-exploration-biotic-interaction-data-management-and-information-retrieval-through
  2. click on "register here"
  3. provide registration information

One of the ideas is to turn the workshop into Carpentries-style lessons, so that others can follow along.

EstebanMH-SiB commented 3 years ago

@tucotuco Thanks for the clarification. In my suggestion I was thinking about relationships between specific individuals of different species: something like "Eating: USalle:Plant:101" for the insect that was captured eating the plant, and "Food of: USalle:Insect:023" for the plant that was being eaten. So there is no need to change the definition, just the examples.

I am on board with pushing the widespread use of ResourceRelationship! And in cases where publishers can't or don't want to use it, we can use associatedTaxa to capture relationships between species and associatedOrganisms for relationships between specific individuals.
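Keeping the insect and plant records consistent is easier if one direction is derived from the other. A minimal Python sketch, assuming a hand-maintained table of inverse labels (the table and the `reciprocal` function are hypothetical; only the USalle identifiers and the "Eating"/"Food of" labels come from the example above):

```python
# Hypothetical table of inverse relationship labels; Darwin Core does not define these.
INVERSE = {
    "Eating": "Food of",
    "Food of": "Eating",
    "host of": "parasite of",
    "parasite of": "host of",
}


def reciprocal(subject_id, relationship, object_id):
    """Return the same association as it would be stated on the object's record."""
    try:
        inverse = INVERSE[relationship]
    except KeyError:
        raise ValueError(f"no inverse recorded for {relationship!r}") from None
    return object_id, inverse, subject_id


# The insect record states "Eating: USalle:Plant:101".
subject, relationship, obj = reciprocal("USalle:Insect:023", "Eating", "USalle:Plant:101")
print(f"{subject} -> {relationship}: {obj}")  # USalle:Plant:101 -> Food of: USalle:Insect:023
```

Raising an error for an unknown label, rather than guessing, keeps unmapped relationships visible to whoever maintains the table.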

nielsklazenga commented 3 years ago

I agree with @deepreef regarding the class placement:

I had always assumed that associatedOccurrences was not about organism associations, but actual occurrence associations....

I also think that associatedOccurrences are best delivered using ResourceRelationship (if you are using the IPT or similar and if the intended consumer can do something with it), but that does not mean associatedOccurrences should not be defined in Darwin Core (especially since it is already defined). Darwin Core is primarily a vocabulary and should be neutral on the exact form in which the data is exchanged.

A usage comment that the information is often better delivered using ResourceRelationships (when appropriate) would do much more to encourage people to use ResourceRelationships for associatedOccurrences than removing associatedOccurrences altogether.

ResourceRelationships might work well in the limited context of exchanging associated occurrences within Darwin Core Archives, but when you want to exchange in a more graph-like format, a ResourceRelationship object will quickly become very problematic and you are probably better off with associatedOccurrences.

I am totally not a fan of these broadly defined classes that can be used for everything, and I think it is always better to use more narrowly defined classes, or dedicated extensions, that actually define the content (data) rather than the container and make fewer assumptions about the consumer knowing, without being told, what the data means and how it should be processed. ResourceRelationship is more a piece of syntax, or maybe a utility class, than a class that should be defined in Darwin Core. There are better alternatives for that available outside Darwin Core now.

For certain purposes, a string is enough. Might be even better.

campmlc commented 3 years ago

In the Arctos museum collection management system and online public portal, we document explicit relationships between individuals (i.e., actual occurrence associations), and we link these records via reciprocal URLs with a relationship term such as "parasite of"/"host of", "collected with", "same individual", etc. When we export these relationships as Darwin Core, the relationships are mapped to associatedOccurrences. We could consider mapping to ResourceRelationship if that would enhance discoverability and clarify that the relationship specifies actual occurrence associations.


tucotuco commented 3 years ago

A review of the discussion up to this point (and of the actual definition of the term) suggests that organizing the term within the Organism class was an error. This is actually good news from the standard-maintenance perspective, because the attribute tdwgutility:organizedInClass is not normative, which makes it immensely simpler to change, at the discretion of the Maintenance Group.

However, discussions around changes to relationshipOfResource (https://github.com/tdwg/dwc/issues/194) and a new term relationshipOfResourceID (https://github.com/tdwg/dwc/issues/186, https://github.com/tdwg/dwc/issues/283) suggest that the associatedOccurrences definition should also be clarified. Specifically, the directionality of the relationship is expressed oddly. The definition should be something more like: "A list (concatenated and separated) of identifiers of other Occurrence records and the associations of this Occurrence to each of them."

The term would benefit from a usage note as well, something like: "This term can be used to provide a list of associations to other Occurrences. Note that the ResourceRelationship class is an alternative means of representing associations, and with more detail. Recommended best practice is to separate the values in a list with space vertical bar space ( | )."

An update to the examples is also in order. It would be good to add the type of association and to give another example. I suggest two examples, something like:

"parasite collected from":"https://arctos.database.museum/guid/MSB:Mamm:215895?seid=950760"

"encounter previous to":"http://arctos.database.museum/guid/MSB:Mamm:292063?seid=3175067" | "encounter previous to":"http://arctos.database.museum/guid/MSB:Mamm:292063?seid=3177393" | "encounter previous to":"http://arctos.database.museum/guid/MSB:Mamm:292063?seid=3177394" | "encounter previous to":"http://arctos.database.museum/guid/MSB:Mamm:292063?seid=3177392" | "encounter previous to":"http://arctos.database.museum/guid/MSB:Mamm:292063?seid=3609139"

Similar updates should also be made for other associatedX terms.
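The multi-entry example proposed above can be produced mechanically from (relationship, identifier) pairs. A minimal Python sketch, assuming the '"relationship":"identifier"' form and the recommended space vertical bar space separator; `format_associations` is a hypothetical name:

```python
def format_associations(pairs, separator=" | "):
    """Join (relationship, identifier) pairs into a single associatedOccurrences value."""
    return separator.join(f'"{relationship}":"{identifier}"' for relationship, identifier in pairs)


base = "http://arctos.database.museum/guid/MSB:Mamm:292063?seid="
pairs = [
    ("encounter previous to", base + seid)
    for seid in ("3175067", "3177393", "3177394", "3177392", "3609139")
]
value = format_associations(pairs)
print(value)
```

The sketch does no escaping, so a relationship label or identifier that itself contained a double quote or " | " would yield an ambiguous value; the proposal above does not say how such cases should be handled.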

baskaufs commented 3 years ago

dwc:associatedOccurrences does not have a dwciri: analog, so no revision is necessary in that namespace.

hollyel commented 3 years ago

There are additional important use cases for documenting information on paleo specimens/occurrences. In most cases, the information that needs to be provided with a term like associatedOccurrences (or with the ResourceRelationship extension) is more detailed than occurrences of taxa at the same locality and event (keeping in mind, too, that data about a paleo occurrence differ slightly from what was previously described in this thread for extant collecting). The primary examples we're thinking of are individual specimens occurring on the same slab, and specimens that were originally collected as a stratigraphic lot and separated later. Including an example of this type would be beneficial for adoption of this term within paleo data. We can pull those together if adding an additional example is acceptable.

Regarding the extension, the arguments for using the ResourceRelationship extension make a lot of sense. However, use of extensions is currently limited within paleo-related datasets, and it is often challenging for those providers to adopt new extensions. We can continue to work toward that, but it is important to maintain these types of terms (associatedXxx) to enable sharing of this information.

-Holly Little, Erica Krimmel (@ekrimmel), and Talia Karim (@tkarim), on behalf of the Paleo Data Working Group

tucotuco commented 3 years ago

@hollyel Additional examples are highly encouraged. If you can submit them before the release, that would be great, because it would avoid a small incremental update later.

EstebanMH-SiB commented 3 years ago

We endorse this proposal on behalf of @SiBColombia

ljwalker commented 3 years ago

I second the comment made by @hollyel @ekrimmel @tkarim.

In addition to their "slab" example (a very common scenario in paleontological collections), there is the case of trace fossils, whereby the occurrence of a trace maker (evidenced by, e.g., a predation-related scar or boring) is physically associated with another occurrence (e.g., a bivalve or gastropod).
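For the slab case, each specimen's associatedOccurrences value could be generated from whatever grouping the collection already records. A minimal Python sketch, assuming a hypothetical slab identifier per specimen and an illustrative relationship label "on same slab as" (neither is a Darwin Core term or vocabulary value; the occurrenceIDs are made up):

```python
from collections import defaultdict


def same_slab_associations(specimens, relationship="on same slab as"):
    """Map each occurrenceID to an associatedOccurrences value listing its slab-mates."""
    by_slab = defaultdict(list)
    for occurrence_id, slab_id in specimens:
        by_slab[slab_id].append(occurrence_id)
    values = {}
    for members in by_slab.values():
        for occurrence_id in members:
            others = (o for o in members if o != occurrence_id)
            values[occurrence_id] = " | ".join(f'"{relationship}":"{o}"' for o in others)
    return values


# Made-up (occurrenceID, slab identifier) pairs; the slab identifier is not a Darwin Core term.
specimens = [("occ-1", "slab-A"), ("occ-2", "slab-A"), ("occ-3", "slab-B")]
for occurrence_id, value in same_slab_associations(specimens).items():
    print(occurrence_id, value or "(no slab-mates)")
```

The same pattern would apply to a stratigraphic lot that was separated later, with the original lot number standing in for the slab identifier.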

tucotuco commented 3 years ago

Done.