tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

siteID #126

Closed kbraak closed 3 years ago

kbraak commented 8 years ago

Justification

Required for identifying a specific sampling site at or within a location. This allows distinction from locationID, which is an identifier for the sampling location itself. Especially useful for describing permanent sampling sites that haven’t been georeferenced. This also enables sampling sites to be referenced across datasets.

Definition

An identifier for the sampling site at or within the Location.

Comment

May be a global unique identifier or an identifier specific to the dataset. Example: “tr011-s1” identifies a sampling site within a location (locationID= “tr011”).

Term group

Location

baskaufs commented 8 years ago

It isn't clear to me how this is actually different than dwc:locationID. The proposal seems to assume that the "site" is specified to a greater resolution than the "location". But there doesn't seem to be any particular resolution assumed for a location, or to be any reason why a more precisely specified location could not be nested within a broader location.

Also, is it assumed that the things being sampled at the sampling site are organisms? dwc:MaterialSample instances could include organisms as well as non-living materials.

dagendresen commented 8 years ago

Perhaps the issue addressed by siteID is more related to the problems of describing different types of locations with the commonly denormalised use of Darwin Core, e.g. with the core types in the GBIF IPT. Perhaps a better option might be to add a location extension to the IPT application profile registry, to allow for different types of locations to be described?

umeldt commented 8 years ago

In quite a few of the event based datasets we have published or worked with, we are provided with "human readable" names for the sampling location, and it's easy for these to "get lost" in the move towards globally unique identifiers. So yes, while I quite like the idea of a siteID, I'd like it even better if it were paired with something like a siteCode (similar to the institutionCode/institutionID or collectionCode/collectionID).

(I suppose ideally the human readable names (the "codes") would be retrievable through resolvable ids, but we're not quite there yet)

mdoering commented 8 years ago

I would also like to better understand why locationID and locality are not sufficient, as these can be of any precision and have been used for a long time to identify collecting/sampling "sites". Is there an example that illustrates why locationID and siteID should exist side by side?

dagendresen commented 8 years ago

If there is a need to nest locations (or sites) with finer resolution within a location, it could obviously also be possible to imagine a need for nesting "sites" with yet finer resolutions within these "sites". Perhaps dwc:higherGeographyID could be used to nest locations - however, I believe that this term might have another meaning? Perhaps a new term "parentLocationID" might be useful?

For sampling data reported in Norway I second Christian that these are often assigned a human readable and rather short code. We often map these location or sampling plot codes (that are reported with most sampling data sets) as dwc:fieldNumber. However, these named plots might more appropriately be understood as a "locationCode" and not as an "eventCode" (the same plot codes are reused across multiple sampling events in different weeks, months or years).

It might thus be possible to argue for the following new location terms?

parentLocationID (for building a hierarchy of locations within locations)
locationCode (human readable location identifiers not intended to resolve)
locationType (to be recommended to follow a controlled value vocabulary)

And that the location terms be organized in a new extension and a new star-schema-"leaf" for the Darwin Core archives publishable in IPT.
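The nesting idea above can be sketched as a small table of location records. This is only an illustration under stated assumptions: parentLocationID, locationCode and locationType are the terms proposed in this thread, not ratified Darwin Core terms. The identifiers "tr011" and "tr011-s1" come from the example in the original proposal, while "tr011-s1-q1", the codes, the type values and the `ancestors` helper are hypothetical.

```python
# Hypothetical location records. parentLocationID, locationCode and
# locationType are terms proposed in this thread, not ratified Darwin Core terms.
# The type values ("transect", "plot", "quadrat") are illustrative only.
locations = [
    {"locationID": "tr011", "parentLocationID": None,
     "locationCode": "TR011", "locationType": "transect"},
    {"locationID": "tr011-s1", "parentLocationID": "tr011",
     "locationCode": "S1", "locationType": "plot"},
    {"locationID": "tr011-s1-q1", "parentLocationID": "tr011-s1",
     "locationCode": "Q1", "locationType": "quadrat"},
]


def ancestors(location_id, records):
    """Return the chain of parent locationIDs, nearest parent first."""
    by_id = {record["locationID"]: record for record in records}
    chain = []
    seen = {location_id}
    parent = by_id[location_id]["parentLocationID"]
    while parent is not None:
        if parent in seen:
            # Guard against a cyclic hierarchy in malformed data.
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
        parent = by_id[parent]["parentLocationID"]
    return chain


print(ancestors("tr011-s1-q1", locations))
```

With a single parent pointer per record, any depth of "sites within sites" can be expressed without adding a new identifier term for each level.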

kbraak commented 8 years ago

Thanks to everyone for the feedback on this proposal.

I agree it will help to look at an example to illustrate the need for this term.

Consider the recent attempt to digitise legacy data from this study, which investigated the effects of grazing on vegetation using 6 paired 4 m² monitoring plots (permanently grazed versus ungrazed) arranged in 3 random blocks.

All that is known about the location of each monitoring plot (site), is that it is within the following sampling location:

Game reserve ‘Wildgehege Glauer Tal’, which is located on a former military training area 45 km SW of Berlin, NE Germany (13°8'47–13°10'07 E, 52°13'10–52°13'59 N, altitude 37–51 m)

The resolution of this sampling location can be defined using footprintWKT and locality, but doing so reserves locationID for this sampling location, not for the monitoring plots of a greater resolution. siteID would allow identifying the finer resolution location of the monitoring plot where sampling actually took place. If the monitoring plot was georeferenced, siteID and locationID would be the same. From my interpretation it is fine to store a human readable code in locationID as long as it is an "identifier specific to the data set". Of course it would be preferable to have a resolvable GUID for locationIDs, but this is most often not the case.

Indeed it would be cleaner to list all locations in the dataset in an extension. This would allow a hierarchy of locations to be built and allow the type of each location to be defined. Based on my interpretation, dwc:higherGeographyID can only be used to store the broader "geographic region within which the Location occurred". Therefore I agree that new terms "parentLocationID" and "locationType" would be useful.

tucotuco commented 4 years ago

A siteID separate from the locationID is not necessary. In an Event Core, the study site can have Events using the locationID and Location descriptors for the study site, while child Event records for each monitoring plot (site) can have their own locationIDs and refer to the study site through parentEventID.
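A minimal sketch of that Event Core arrangement, using the existing Darwin Core terms eventID, parentEventID, locationID and locality. All identifier values and the `study_site_location` helper are hypothetical, chosen only to mirror the grazing-plot example discussed earlier in this thread.

```python
# Event Core rows: one parent Event for the study site and one child Event
# per monitoring plot. All identifier values here are hypothetical.
events = [
    {"eventID": "site-1", "parentEventID": None,
     "locationID": "glauer-tal", "locality": "Wildgehege Glauer Tal"},
    {"eventID": "site-1:plot-1", "parentEventID": "site-1",
     "locationID": "glauer-tal-p1", "locality": "monitoring plot 1"},
    {"eventID": "site-1:plot-2", "parentEventID": "site-1",
     "locationID": "glauer-tal-p2", "locality": "monitoring plot 2"},
]


def study_site_location(event_id, records):
    """Follow parentEventID to the top-level Event and return its locationID."""
    by_id = {record["eventID"]: record for record in records}
    event = by_id[event_id]
    seen = set()
    while event["parentEventID"] is not None:
        if event["eventID"] in seen:
            # Guard against a cyclic parentEventID chain in malformed data.
            raise ValueError(f"cycle detected at {event['eventID']!r}")
        seen.add(event["eventID"])
        event = by_id[event["parentEventID"]]
    return event["locationID"]


print(study_site_location("site-1:plot-2", events))
```

Each plot keeps its own locationID, and the broader study site is recovered through the Event hierarchy rather than through a separate siteID.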

I propose to close this request.

tucotuco commented 3 years ago

The capability sought by the addition of this term is already available. No further demand has been demonstrated. Closing.