tdwg / cd

Collection Descriptions
Creative Commons Attribution 4.0 International

Property:derivedCollection #327

Open mswoodburn opened 3 years ago

mswoodburn commented 3 years ago
Label: Derived Collection
Definition: A flag to indicate that the collection description has been generated by aggregating data from one or more underlying datasets of its individual objects.
Usage:
Existing property:
Existing class:
Existing property identifier:
Format: Text
Required: No
Repeatable: No
Constraints: Controlled vocabulary
Examples: Yes, No
Notes: For example, this would be set to Yes if the record is an automatically generated reconstruction of Darwin’s finch collection, which is now held at multiple institutions.

magpiedin commented 1 year ago

We need clearer examples for this term but are not sure what those are. Current candidates:

If we're not sure on those 2 points by March 1, 2023, we'll move this out of version 1

From 26-Jan notes:

derivedCollection: to indicate whether the record information can be used for accounting, e.g. inventory, or whether it will lead to double-counting

  • Recipe required: This is a hard one to explain in the notes/examples of the class - we need to make sure there are some fairly chunky examples in the docs.
  • What do we mean by this class: is it more of ‘this record is formed by the aggregation of object records’ OR ‘this record presents a collection as if it is a single entity, but it is actually held in lots of different places and this record is the product of combining those separate datasets together’?

mswoodburn commented 1 year ago

Rooting around a bit, I think this came from (or was inspired by) the original NCD term ncd:derivedCollection (description: A "derived" collection record. The record has been derived from a query on an item-level database e.g. all items from Australia.).

So, referencing the notes above, I think ‘this record is formed by the aggregation of object records’ was the original intention. If we mean it that way, it might be useful for implying a few things:

  1. that this LtC record is likely to have been generated programmatically, so care needs to be taken before manually editing it, as it may be overwritten or lose consistency with its source
  2. that there are, somewhere, object-level records that relate to this LtC record (and hopefully are accessible and linked to/from the LtC record)
  3. the LtC record data may represent only the portion of the physical collection to which it relates that has so far been digitised at object-level
  4. the LtC record is likely to change over time as object-level digitisation progresses and the LtC record is refreshed from the object-level source dataset(s)

In terms of examples, this might be used for LtC records that have been derived from clustering occurrence records in e.g. GBIF, to represent national collections, or thematic collections based on taxa, collectors, geographic origin, stratigraphy etc., or combinations thereof. There may be similar uses on CMS data within institutions. There might be an argument to say, 'why bother with an LtC record when the data can just be aggregated on the fly?', but it's potentially useful for scenarios where we want to share the summary collection data without having to provide all of the source data, or to reduce the compute overhead of carrying out on-the-fly aggregations from source each time a summary is needed. It also provides a record to attach group-level PIDs and metadata to.
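To make the 'aggregation of object records' reading a bit more concrete, here is a rough sketch of how a derived record might be built from object-level data, using the NCD-style "all items from Australia" query. Everything apart from the derivedCollection flag and its Yes/No values is an assumption for illustration: the input records, the `derive_collection_record` helper and the field names `collectionName`, `objectCount`, `families` and `sourceRecordIds` are hypothetical and not LtC terms.

```python
from collections import Counter

# Hypothetical object-level records, e.g. rows exported from an occurrence dataset.
occurrences = [
    {"id": "occ-1", "country": "Australia", "family": "Myrtaceae"},
    {"id": "occ-2", "country": "Australia", "family": "Proteaceae"},
    {"id": "occ-3", "country": "Ecuador", "family": "Thraupidae"},
]


def derive_collection_record(records, country):
    """Summarise all object records from one country into a single group-level record."""
    matched = [r for r in records if r["country"] == country]
    return {
        "collectionName": f"All items from {country}",  # illustrative field name
        "derivedCollection": "Yes",  # flag value from the proposed controlled vocabulary
        "objectCount": len(matched),
        "families": dict(Counter(r["family"] for r in matched)),
        # Keeping the source identifiers lets the summary link back to the object level.
        "sourceRecordIds": [r["id"] for r in matched],
    }


summary = derive_collection_record(occurrences, "Australia")
```

Because the summary is regenerated from the source each time, re-running the function after more objects are digitised would produce a different record, which is the behaviour that points 1 and 4 above warn about.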

Going back to the notes above, I'm not sure isDerivedCollection has a bearing on double-counting, as the isDistinctObjects property in the CollectionDescriptionScheme class is intended to handle that?

The ‘this record presents a collection as if it is a single entity, but it is actually held in lots of different places and this record is the product of combining those separate datasets together’ use case is also a bit different, I think. It can possibly be handled by linking multiple institutions (OrganisationalUnits) to the same ObjectGroup in an LtC record to show that it's not all physically co-located. There is also the option of having an ObjectGroup for each institution (e.g. 'NHM Darwin Finches', 'FM Darwin Finches', ...) and grouping them under the same CollectionDescriptionScheme; there could also be a top-level 'Darwin Finches' ObjectGroup which is related (e.g. derived_from) to the institutional ObjectGroups using the ResourceRelationship class.
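As a rough illustration of the second option, the structure could look something like the sketch below. The class names follow the ones mentioned above, but the attributes (`name`, `organisational_unit`, `subject`, `relationship`, `target`, `object_groups`, `relationships`) are simplified assumptions and not the actual LtC property names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectGroup:
    name: str
    organisational_unit: Optional[str] = None  # holding institution, if there is one


@dataclass
class ResourceRelationship:
    subject: str       # name of the ObjectGroup the relationship starts from
    relationship: str  # e.g. "derived_from"
    target: str        # name of the related ObjectGroup


@dataclass
class CollectionDescriptionScheme:
    name: str
    object_groups: List[ObjectGroup] = field(default_factory=list)
    relationships: List[ResourceRelationship] = field(default_factory=list)


# One ObjectGroup per holding institution.
institutional = [
    ObjectGroup("NHM Darwin Finches", organisational_unit="NHM"),
    ObjectGroup("FM Darwin Finches", organisational_unit="FM"),
]

# A top-level group with no single physical location.
top_level = ObjectGroup("Darwin Finches")

scheme = CollectionDescriptionScheme(
    name="Darwin Finches (hypothetical scheme)",
    object_groups=[top_level, *institutional],
    relationships=[
        ResourceRelationship(top_level.name, "derived_from", group.name)
        for group in institutional
    ],
)

# Which institutional groups is the top-level group derived from?
sources = [r.target for r in scheme.relationships if r.subject == top_level.name]
```

Modelled this way, the institutional groups stay countable on their own, and the top-level group only points at them instead of duplicating their contents.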