tdwg / dwc-for-biologging

Darwin Core recommendations for biologging data
Creative Commons Attribution 4.0 International

Do not use parentEventID if not necessary #8

Open peterdesmet opened 5 years ago

peterdesmet commented 5 years ago

In "Mahoney-data-DwC-A-test-2" I noticed that parentEventID is populated with the animalID:

F53
IdCoy_P3_1

I would not do this:

  1. Records with that ID cannot be found in the event core (I don't think we should create them either)
  2. Events are already clearly grouped in how their eventID is written, e.g. F53:capture1
  3. I would really avoid using a hierarchical structure of events if we can help it

pieterprovoost commented 5 years ago

I think there needs to be a way to group events per animal which does not involve parsing character-delimited strings. Also, I think encoding hierarchy in strings makes it harder to check referential integrity. An eventID of F35:capture1 (did you notice the typo?) will not trigger any alarms, but the fact that parentEventID F35 refers to a non-existent eventID might.

Of course any parentEventID needs to have a matching eventID.
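
To make the referential integrity point concrete, here is a minimal sketch of such a check. It assumes the event core has been loaded as a list of dicts keyed by Darwin Core term names; the function name `find_orphan_parents` and the row `F53:deployment1` are hypothetical and only serve the illustration.

```python
def find_orphan_parents(events):
    """Return parentEventID values that have no matching eventID."""
    event_ids = {row["eventID"] for row in events}
    orphans = {
        row["parentEventID"]
        for row in events
        # Skip rows without a parent; flag parents missing from the core.
        if row.get("parentEventID") and row["parentEventID"] not in event_ids
    }
    return sorted(orphans)


# Hypothetical event core: the last row carries the typo F35 instead of F53.
events = [
    {"eventID": "F53", "parentEventID": ""},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F53:deployment1", "parentEventID": "F35"},
]

print(find_orphan_parents(events))  # ['F35']
```

The same typo inside an eventID string such as F35:capture1 would pass unnoticed, because nothing else in the data refers to the part before the colon.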

peggynewman commented 5 years ago

@peterdesmet I'm interested to understand why you'd like to avoid hierarchical events? They seem to offer a lot of flexibility. I get that they might be difficult to interpret/ingest from system to system.

sarahcd commented 4 years ago

I deleted parentEventID from my event table. So now, the only place where unique animals are identified is organismName in the occurrence table. But I agree with @pieterprovoost's point: I think something is missing here and the event table (the closest to a 'summary' table) needs to define unique individuals somewhere. Otherwise it is more difficult to check the data or compile it into a data frame, and there is a good chance of confusion, e.g. that deployments get confused with individuals, or that the user doesn't notice that multiple records are about the same individual. I have a similar concern with FOM records that don't include a unique animal identifier and have no associated occurrenceID (measurements not taken at the same time as a GPS fix). However, I don't see any other good place to define individuals in the event table. Happy for any ideas!

peggynewman commented 4 years ago

Using string parsing to define relations gives me ER nightmares. But complicated hierarchies won't be universally handled well. I think we should use parentEventID to tie together all of the events and occurrence records, but make a recommendation to use it for a simple parent-child relationship with no further levels. The only other way to tie @sarahcd's acceleration-x MoF back to the organism is to pick an occurrence against the same deployment event. Then the selection is tricky - random? max? min?

sarahcd commented 4 years ago

IMO matching measurements to occurrences is not a good solution. (1) There are bio-logging datasets with no occurrences at all except the capture events (e.g. datasets of light level, conductivity and temperature). (2) I really doubt there is one good method for doing this; it will depend on sensor sampling schedules, species/habitat and the analysis question. We would be processing the data in ways that are unnecessary, not necessarily biologically meaningful, and that might confuse interpretation.

For now I'll add parentEventID back in.

albenson-usgs commented 4 years ago

I'm wondering if looking at this from different user points of view might help? Can we think of some different users and work back from there / make sure the data will be presented to them in a way that's most useful?

User 1: General GBIF/OBIS user. Just wants to know where individuals of a species occur in space. Most important that they understand all of these occurrences are the same individual.

User 2: Interested in assessing animal movements. Needs to be able to parse deployments from individuals.

What else?

Are there other users we can think of? Who will be using the acceleration data?

pieterprovoost commented 4 years ago

Would this be a suitable structure? I assume that if anyone needs the acceleration data they can do the matching themselves? I still think a single parent event per organism makes it easier to select all data for a single organism (given that the database at hand is set up properly).

[image: biologging]
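
As a rough illustration of why a single parent event per organism makes selection easy, the sketch below gathers every record for one animal without parsing any ID strings. It assumes a one-level parent-child hierarchy and tables loaded as lists of dicts; the function name `records_for_organism`, the deployment and measurement rows, and the `type` key are hypothetical, and the structure in the image above may differ.

```python
def records_for_organism(parent_id, events, records):
    """Return all records attached to a parent event or its direct children."""
    # Collect the parent plus every event that points to it.
    event_ids = {e["eventID"] for e in events if e.get("parentEventID") == parent_id}
    event_ids.add(parent_id)
    return [r for r in records if r["eventID"] in event_ids]


# Hypothetical event core: one parent event per animal, one level of children.
events = [
    {"eventID": "F53", "parentEventID": ""},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F53:deployment1", "parentEventID": "F53"},
    {"eventID": "IdCoy_P3_1", "parentEventID": ""},
    {"eventID": "IdCoy_P3_1:capture1", "parentEventID": "IdCoy_P3_1"},
]

# Hypothetical occurrence and measurement rows, each linked by eventID only.
records = [
    {"eventID": "F53:capture1", "type": "occurrence"},
    {"eventID": "F53:deployment1", "type": "acceleration-x"},
    {"eventID": "IdCoy_P3_1:capture1", "type": "occurrence"},
]

for row in records_for_organism("F53", events, records):
    print(row["eventID"], row["type"])
```

Note that the acceleration measurement is reached through its deployment event, so it does not need to be matched to an occurrence first.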

peterdesmet commented 4 years ago

What @pieterprovoost suggests is probably the best way, but it feels like trying to fit a square peg in a round hole. "Organism" is a concept in Darwin Core; it's just not a "core" file now. Using the Event Core concept has the advantage that GBIF/OBIS can currently handle that data, but it might be good to do the exercise of working out how we would express biologging data - reusing Darwin Core terms - if we had more freedom in how to structure it.

/cc @timrobertson100