iobis / gbif-marine

Basis of record? #10

Open hosonot opened 8 years ago

hosonot commented 8 years ago

Why is this field required by GBIF? What is the purpose of this information?

dagendresen commented 8 years ago

Darwin Core Occurrence data is normally published in GBIF in a denormalized form. Each Occurrence can be seen as a type of evidence for a species occurrence in nature. Basis of record declares the type of evidence that the data record describes.

wardappeltans commented 8 years ago

Dear @dagendresen, is there any application or product that will break when basis of record is not filled in? I think that is what our data providers are wondering about.

kcopas commented 8 years ago

Basis of record is a required field in the IPT. You can see our recently updated page on data quality requirements for specific details and additional resources: http://www.gbif.org/publishing-data/quality

wardappeltans commented 8 years ago

ok, so the only application so far that would break is the IPT?

dagendresen commented 8 years ago

basisOfRecord is the only required field in a GBIF Darwin Core archive for core type occurrence, see: http://rs.gbif.org/core/dwc_occurrence_2015-07-02.xml#basisOfRecord

However, I believe that occurrence records with basisOfRecord missing would still be indexed and published in the GBIF portal...

See issues reported for basisOfRecord: http://dev.gbif.org/issues/browse/PF-1394?jql=text%20~%20%22basisOfRecord%22

wardappeltans commented 7 years ago

dear @dagendresen @kcopas

Is this a correct interpretation of the preferred terms to be used in basisOfRecord?

basisOfRecord (required term) specifies the nature of the record, i.e. whether the occurrence record is based on a collected specimen or an observation. If the specimen was collected, the options are PreservedSpecimen (permanently stored in a collection), FossilSpecimen (which is important to allow OBIS to distinguish between the date of collection and the time period in which the specimen is assumed to have been alive) or LivingSpecimen (e.g. a living specimen kept in an aquarium, a botanical garden or a bacteria collection). However, when the occurrence record is an observation, meaning a specimen was not collected or not preserved, the options are HumanObservation (e.g. a bird sighting, or a benthic sample discarded after counting) or MachineObservation (records from sensors or automated methods such as DNA, image recognition etc.).
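As a rough illustration of the grouping described above, here is a minimal Python sketch that sorts a raw basisOfRecord value into "specimen" or "observation". The function name `classify_basis_of_record` and the return labels are hypothetical, and the two sets only contain the five values named in this comment; the actual GBIF vocabulary may accept other terms.

```python
# Values taken from the comment above; this is not the complete GBIF vocabulary.
SPECIMEN_TYPES = {"PreservedSpecimen", "FossilSpecimen", "LivingSpecimen"}
OBSERVATION_TYPES = {"HumanObservation", "MachineObservation"}


def classify_basis_of_record(value):
    """Return 'specimen', 'observation', 'missing' or 'unknown' for a raw value.

    Hypothetical helper: matching is case-insensitive and ignores
    surrounding whitespace, which is an assumption made for this sketch.
    """
    if value is None or not value.strip():
        return "missing"
    normalized = value.strip().lower()
    if normalized in {v.lower() for v in SPECIMEN_TYPES}:
        return "specimen"
    if normalized in {v.lower() for v in OBSERVATION_TYPES}:
        return "observation"
    return "unknown"


for raw in ["PreservedSpecimen", " humanobservation ", "", "SomethingElse"]:
    print(repr(raw), "->", classify_basis_of_record(raw))
```

A data provider could run a check like this over a column before publishing, to spot empty or unexpected values.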

ahahn-gbif commented 7 years ago

Thanks, Ward, that indeed is an accurate description.

The term is required because it is essential to distinguish different types of evidence when serving users. It prevents confusion when it comes to documenting occurrences like lions in England (zoos, fossils) or plants outside of their natural distribution range (botanical gardens, plant breeding etc.). Having this information supplied in standardized form allows filters to be set on data downloads, and makes it possible to alert users to the fact that the data include some content that they may want to exclude for specific types of analyses. At GBIF, the goal is to reach as complete a coverage as possible in documenting a meaningful basis of record for all occurrence records.
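The kind of download filter mentioned above could look roughly like the following Python sketch. The records, the function name `split_by_basis` and the default choice of which values to set aside are assumptions for illustration only; they are not GBIF's actual filter implementation.

```python
def split_by_basis(records, excluded=("FossilSpecimen", "LivingSpecimen")):
    """Split records into (kept, flagged) based on basisOfRecord.

    Hypothetical helper: which values to exclude depends on the analysis;
    the default here only mirrors the 'lions in England' example.
    """
    kept, flagged = [], []
    for rec in records:
        if rec.get("basisOfRecord") in excluded:
            flagged.append(rec)
        else:
            kept.append(rec)
    return kept, flagged


# Made-up example records, not real GBIF data.
records = [
    {"scientificName": "Panthera leo", "country": "GB", "basisOfRecord": "LivingSpecimen"},
    {"scientificName": "Panthera leo", "country": "GB", "basisOfRecord": "FossilSpecimen"},
    {"scientificName": "Panthera leo", "country": "KE", "basisOfRecord": "HumanObservation"},
]

kept, flagged = split_by_basis(records)
print(len(kept), "kept;", len(flagged), "flagged for review")
```

Records without a basisOfRecord fall through to `kept` in this sketch, which is exactly the situation where a user cannot tell what type of evidence they are looking at.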

This said, the vocabulary for the basis of record will need to undergo some revision, as it has some known shortcomings (hierarchical groupings and terms coverage).

wardappeltans commented 7 years ago

Dear @ahahn-gbif Thank you! I didn't realize that 'LivingSpecimen' refers to a position outside of the natural distribution range. That is an interesting and useful case. So a living specimen (with the same materialSampleID) can have 2 records: the position of collection (BoR is HumanObservation) and the position of the zoo or resource institute (BoR is LivingSpecimen).

ahahn-gbif commented 7 years ago

Imprecision on my part - LivingSpecimen does not have to be outside the natural range, it just indicates living + ex situ (intentionally kept/cultivated). This might equally be in a botanical garden documenting the local flora. However, when interpreting coordinates / locations, it is usually important to know whether the organism was put/held in place there intentionally through human intervention (LivingSpecimen), or observed as a natural occurrence (HumanObservation).

wardappeltans commented 7 years ago

correct. thanks Andrea!

mdoering commented 7 years ago

The correct way of indicating a cultivated occurrence is using dwc:establishmentMeans. LivingSpecimen just documents that the specimen is still alive. For captive animals and plants growing in botanical gardens this is the case, so it is often used to filter out non-native records. But that does not have to be the case. For cultivated algae or bacteria this is clearly not the case. And even for a living tree in a botanical garden the given coordinates could represent the place where it was originally collected. It is a grey area and it would be good if the community made more use of the proper term establishmentMeans.

wardappeltans commented 7 years ago

dear @mdoering I think for marine records, LivingSpecimen is quite clear now, meaning living + ex-situ (somewhere on land: such as aquarium or culture collection for marine bacteria...). I think dwc:establishmentMeans is useful for species introductions that have established a population.

mdoering commented 7 years ago

Unfortunately LivingSpecimen does not mean ex situ, I am afraid. It often coincides, but it is not part of its definition. It is simply a living specimen as opposed to a PreservedSpecimen. Whether the given event, including its location, for such an occurrence is native or ex situ is open in both cases and should be clarified elsewhere - with establishmentMeans being the best option (even if it is often impossible to tell whether a species occurs natively and what that exactly means).

Whether you kill the fish or keep it alive in your collection does not tell you if the occurrence was ex- or in-situ. It all depends on the event metadata and its location - nothing prevents you from giving the native location for living specimens.

dagendresen commented 7 years ago

The geographic coordinates etc. for a (typical) dwc:Occurrence record describe the natural occurrence of the organism in nature - even if the basisOfRecord is given as LivingSpecimen. As Markus mentions, basisOfRecord = LivingSpecimen only declares that the organism is a collection item (specimen) that is kept alive.

The original occurrence of the organism would be represented by one "Occurrence" record, while (one or many) derived collection-items (specimens) would (each) be described by other "Occurrence" records. Normally, however, the original naturally occurring organism is not declared explicitly (e.g. as basisOfRecord = HumanObservation), but is inferred to exist from a specimen-type "Occurrence" record. Because of this denormalisation with different types of "Occurrences", the basisOfRecord attribute is important to enable the user to understand what type of "Occurrence" is described by the data record at hand.

On another but related topic, I am very interested in the new model proposed by OBIS to organize dwc:MeasurementOrFact data linked to dwc:Event using the new event-core model of the Darwin Core archive format. I am in particular interested in expanding the use of such dwc:Event cores to describe Events other than the natural occurrence of an organism. In agriculture, living specimens (living seed samples) are often tested ex situ in field trials for abiotic (frost, heat etc.) and biotic (disease resistance etc.) traits. The location and time of the field trial are important (in particular because of genotype by environment interactions). The location and time would normally be different from the original natural occurrence of the organism that (once upon a time) was sampled.

I understand, however, that the events for which OBIS proposes to record measurements for the Occurrences are normally (always?) at the location of the original naturally occurring organism? Or would OBIS perhaps also sometimes run experiments with collected specimens ex situ?

PS. In agrobiodiversity a "LivingSpecimen" can also be conserved (both ex situ and) in situ (e.g. a designated crop wild relative population naturally occurring in a "genetic reserve").

wardappeltans commented 7 years ago

alright Markus @mdoering

@dagendresen, that is why I thought it would be nice if the same materialSampleID could have different coordinates, one for the place of collection and one for the ex-situ experiments, so that you can track the specimen. In the case of an ex-situ experiment, the coordinate position would no longer be the original position of collection. I think all this is perfectly possible. The dwc:eventRemarks can include the event type info.

dagendresen commented 7 years ago

I think that including different (types of) coordinates for the "same" dwc:Occurrence (same dwc:occurrenceID) would cause (great) confusion. However, creating new and different dwc:Occurrence records for the same organism (specimen) and linking them together using the same dwc:organismID would be quite possible.
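To make the linking idea concrete, here is a small Python sketch in which several Occurrence records, each with its own occurrenceID, are grouped through a shared organismID. All identifiers and the helper name `group_by_organism` are made up for illustration.

```python
from collections import defaultdict


def group_by_organism(occurrences):
    """Map each organismID to the occurrenceIDs of the records that share it."""
    groups = defaultdict(list)
    for occ in occurrences:
        groups[occ["organismID"]].append(occ["occurrenceID"])
    return dict(groups)


# Hypothetical records: one organism documented in situ and later ex situ.
occurrences = [
    {"occurrenceID": "occ-1", "organismID": "org-A", "basisOfRecord": "HumanObservation"},
    {"occurrenceID": "occ-2", "organismID": "org-A", "basisOfRecord": "LivingSpecimen"},
    {"occurrenceID": "occ-3", "organismID": "org-B", "basisOfRecord": "PreservedSpecimen"},
]

for organism_id, occurrence_ids in group_by_organism(occurrences).items():
    print(organism_id, occurrence_ids)
```

Each record keeps a single set of coordinates, so the confusion of multiple locations under one occurrenceID is avoided while the organism can still be traced.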

However, I was thinking of another approach of declaring a new dwc:Event (for the measurement experiment) and linking the respective dwc:Occurrence records with basisOfRecord = LivingSpecimen to this event.

PS: I see dwc:Occurrence as the evidence of an occurrence - and if you have multiple dwc:MaterialSample specimens they would each normally demand different dwc:occurrenceID identifiers.

wardappeltans commented 7 years ago

dear @dagendresen

I was thinking along these lines:

EventCore

eventID | parentEventID | coordinates | eventDate | eventRemarks
0001 | | (in-situ) | 2015-05-05 | collection (in-situ)
0001:A | 0001 | (ex-situ) | 2017-08-30 | OA experiment (ex-situ)

Occurrence Extension

eventID | occurrenceID | scientificName | organismID | BoR
0001 | DDDD01 | Genus-species | A01 | LivingSpecimen

Extended MeasurementOrFact Extension

eventID | occurrenceID | MoFType | MoFValue | MoFUnit
0001 | DDDD01 | shell thickness | 2 | mm
0001 | | temperature of the water | 25 | degrees Celsius
0001:A | | shell thickness | 1.8 | mm
0001:A | | temperature of the water | 30 | degrees Celsius

Just not sure if it should be organismID or materialSampleID. Note I also did not include the MoFTypeID, MoFValueID and MoFUnitID in this example.
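For anyone who wants to experiment with this layout, the example above can be sketched in Python as plain dictionaries. The values are copied from the tables; the helper names `event_lineage` and `measurement_values` are hypothetical, and the dictionary layout is only one possible way to hold the data, not an OBIS or GBIF format.

```python
# Event core rows from the example, keyed by eventID.
events = {
    "0001": {"parentEventID": None, "eventDate": "2015-05-05",
             "eventRemarks": "collection (in-situ)"},
    "0001:A": {"parentEventID": "0001", "eventDate": "2017-08-30",
               "eventRemarks": "OA experiment (ex-situ)"},
}

# Extended MeasurementOrFact rows; occurrenceID is None where the table leaves it empty.
measurements = [
    {"eventID": "0001", "occurrenceID": "DDDD01", "measurementType": "shell thickness",
     "measurementValue": 2.0, "measurementUnit": "mm"},
    {"eventID": "0001", "occurrenceID": None, "measurementType": "temperature of the water",
     "measurementValue": 25.0, "measurementUnit": "degrees Celsius"},
    {"eventID": "0001:A", "occurrenceID": None, "measurementType": "shell thickness",
     "measurementValue": 1.8, "measurementUnit": "mm"},
    {"eventID": "0001:A", "occurrenceID": None, "measurementType": "temperature of the water",
     "measurementValue": 30.0, "measurementUnit": "degrees Celsius"},
]


def event_lineage(event_id, events):
    """Return event_id followed by its ancestors, following parentEventID links."""
    lineage = []
    while event_id is not None:
        lineage.append(event_id)
        event_id = events[event_id]["parentEventID"]
    return lineage


def measurement_values(event_id, measurement_type, measurements):
    """Return all values of one measurement type recorded for one event."""
    return [m["measurementValue"] for m in measurements
            if m["eventID"] == event_id and m["measurementType"] == measurement_type]


print(event_lineage("0001:A", events))
print(measurement_values("0001:A", "shell thickness", measurements))
```

Walking the parentEventID chain is what ties the ex-situ experiment back to the in-situ collection event; the ex-situ shell thickness row has no occurrenceID, so it can only be related to the organism through that chain.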

dagendresen commented 7 years ago

The Event is the time and place -- where one (zero?) or many species-occurrences (Occurrence) happen. The Event-core does not include parameters for the organism such as occurrenceID, organismID or materialSampleID. These are described in the Occurrence-extension (or in another Darwin Core archive, or somewhere else with an occurrenceID that resolves).

I believe I was thinking along the same lines as your Event with eventID = 0002, of an Event for an experiment with a LivingSpecimen with measurements made ex situ after the organism has been sampled (and included in a collection).

wardappeltans commented 7 years ago

@dagendresen you are right, I updated the example and moved BoR and organismID to the Occ extension. I also added parentEventID to link the ex-situ event to the in-situ one.

I'm stuck on the ex-situ biometric measurements in eMoF, and whether they should be linked with the in-situ OccID or not. Let's think about it further, testing with some concrete examples... (later).

dagendresen commented 7 years ago

See also: https://github.com/tdwg/dwc-qa/issues/61