tdwg / dwc-qa

Public question and answer site for discussions about Darwin Core
Apache License 2.0

Different dynamicProperties fields in event.txt and occurrence.txt - what happens when the tables are joined? #201

Open Mesibov opened 1 year ago

Mesibov commented 1 year ago

A data compiler wants to use dynamicProperties both in event.txt and occurrence.txt. The field in event.txt will have key:value data for events, with each eventID potentially having different data. The field in occurrence.txt will have completely different key:value data, with each occurrenceID potentially having different data. When these tables are joined (e.g. by GBIF, to build occurrences), there is a collision of dynamicProperties fields. What happens? Is there any way to avoid a collision?

tucotuco commented 1 year ago

Hi @Mesibov. I can't speak for how GBIF interprets the two terms (@timrobertson100), but I can say that a more explicit way to encode the information going into those dynamicProperties fields is to use the Extended Measurement Or Fact (eMoF) extension. With it, the publisher can state explicitly whether the information pertains to the Event or to the Occurrence, and can also provide more richness than a key:value pair allows.

timrobertson100 commented 1 year ago

In GBIF processing today, the data is pivoted to occurrences such that fields on the event are only used where they are null on the occurrence records. In this instance, because each occurrence already has its own dynamicProperties value, the event dynamicProperties would be dropped. There is exploratory work on an event index in which both fields would be kept, but that is some way off.
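To make the precedence rule concrete, here is a minimal Python sketch of "occurrence wins, event fills the gaps". It is not GBIF's pipeline code: the function `pivot_to_occurrence`, the plain-dict record layout and the sample values are hypothetical, and "null" is assumed to mean a missing, `None` or empty-string value.

```python
# Hypothetical sketch of the pivot rule described above: an event field is
# copied onto the occurrence only when the occurrence has no value for that
# term. Not GBIF's actual implementation.

def pivot_to_occurrence(event: dict, occurrence: dict) -> dict:
    """Return a flattened occurrence record built from an event and an occurrence."""
    merged = dict(occurrence)
    for term, value in event.items():
        # Treat missing, None and empty string as "null" (an assumption).
        if merged.get(term) in (None, ""):
            merged[term] = value
    return merged


event = {
    "eventID": "E1",
    "eventDate": "2023-04-01",
    "dynamicProperties": '{"cloudCover": "50%"}',
}
occurrence = {
    "occurrenceID": "O1",
    "eventID": "E1",
    "scientificName": "Tachyglossus aculeatus",
    "dynamicProperties": '{"lifeStage": "adult"}',
}

flat = pivot_to_occurrence(event, occurrence)
print(flat["eventDate"])          # filled in from the event: 2023-04-01
print(flat["dynamicProperties"])  # occurrence value kept; the event's cloudCover is lost
```

The last line shows the collision from the original question: both records carry dynamicProperties, so the occurrence value survives and the event value silently disappears.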

I think John provides the better option for current use though.

Mesibov commented 1 year ago

Many thanks @tucotuco and @timrobertson100. So the (single) eMOF could contain records with "eventID" for the event properties and "occurrenceID" for the occurrence properties, which sounds like it would work...
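One way to picture that single eMoF file is to convert each dynamicProperties string into rows. The sketch below is hedged on several assumptions: dynamicProperties is JSON-encoded (as the Darwin Core examples show), the archive has an Event core so every row carries the core eventID, and occurrence-level rows are distinguished by a filled occurrenceID (the convention used in OBIS-style event-core archives). The helper `to_emof_rows`, the chosen subset of columns and the sample values are hypothetical.

```python
import csv
import io
import json

# Hypothetical subset of eMoF columns; a real extension file would usually
# also carry measurementID, measurementUnit, etc.
EMOF_COLUMNS = ["eventID", "occurrenceID", "measurementType", "measurementValue"]


def to_emof_rows(dynamic_properties: str, event_id: str, occurrence_id: str = "") -> list[dict]:
    """Turn a JSON dynamicProperties string into eMoF-style rows.

    Leave occurrence_id empty for event-level properties and fill it in
    for occurrence-level properties.
    """
    props = json.loads(dynamic_properties)
    return [
        {
            "eventID": event_id,
            "occurrenceID": occurrence_id,
            "measurementType": key,
            "measurementValue": str(value),
        }
        for key, value in props.items()
    ]


# Event-level properties (no occurrenceID) and occurrence-level properties.
rows = to_emof_rows('{"cloudCover": "50%", "windSpeed": 12}', "E1")
rows += to_emof_rows('{"lifeStage": "adult"}', "E1", "O1")

# Write a tab-delimited file body, as used in Darwin Core Archives.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=EMOF_COLUMNS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Because each row says which record it describes, the event and occurrence properties no longer compete for a single field when the tables are joined.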

debpaul commented 1 year ago

@Mesibov great question -- thanks for asking it. @mjy please take a look. Thanks @timrobertson100 @tucotuco for explaining the possible choices and results of those choices in this ticket.

dbloom commented 1 year ago

It is a good question @Mesibov, but it raises a related question for me (and I apologize if everyone else already knows this): when an eMOF is used, do those data actually make it into the GBIF index so that they are searchable? Perhaps more importantly, are these data included in downloads from GBIF or from hosted portals? This is not always the case with data published via extensions. Perhaps @timrobertson100 has the magic answer.

MattBlissett commented 1 year ago

Some Darwin Core Archive (DwC-A) extensions are included in GBIF, although in most cases their content is not directly searchable. It is possible to search for records that have a given extension, e.g. https://www.gbif.org/occurrence/search?advanced=1&dwca_extension=http:~2F~2Frs.iobis.org~2Fobis~2Fterms~2FExtendedMeasurementOrFact

Data downloads including verbatim extension data are coming soon.