Open peterdesmet opened 6 years ago
I think there needs to be a way to group events per animal which does not involve parsing character-delimited strings. Also, I think encoding hierarchy in strings makes it harder to check referential integrity. `eventID` `F35:capture1` (did you notice the typo?) will not trigger any alarms, but the fact that `parentEventID` `F35` refers to a non-existing `eventID` might.
Of course any `parentEventID` needs to have a matching `eventID`.
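A minimal sketch of the referential integrity check described above, assuming the event table is loaded as a list of rows with `eventID` and `parentEventID` keys. The rows and the `find_orphans` helper are hypothetical, made up for illustration; only the identifiers `F53`, `F35` and `F53:capture1`/`F35:capture1` come from this thread.

```python
# Hypothetical event rows; not taken from the actual dataset.
events = [
    {"eventID": "F53", "parentEventID": None},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F35:capture1", "parentEventID": "F35"},  # typo: F35 instead of F53
]


def find_orphans(rows):
    """Return the eventIDs whose parentEventID matches no eventID in rows."""
    known = {row["eventID"] for row in rows}
    return [
        row["eventID"]
        for row in rows
        if row["parentEventID"] is not None and row["parentEventID"] not in known
    ]


orphans = find_orphans(events)
print(orphans)
```

With an explicit `parentEventID` column the typo surfaces as an orphaned row, whereas a hierarchy encoded only inside the `eventID` string gives a check like this nothing to compare against.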
@peterdesmet I'm interested to understand why you'd like to avoid hierarchical events? They seem to offer a lot of flexibility. I get that they might be difficult to interpret/ingest from system to system.
I deleted parentEventID from my event table. So now, the only place where unique animals are identified is organismName in the occurrence table. But I agree with @pieterprovoost's point: I think something is missing here and the event table (the closest to a 'summary' table) needs to define unique individuals somewhere. Otherwise it is more difficult to check the data or compile it into a data frame, and there is a good chance of confusion, e.g. that deployments get confused with individuals, or the user doesn't notice that multiple records are about the same individual. I have a similar concern with FOM records that don't include a unique animal identifier and have no associated occurrenceID (measurements not taken at the same time as a GPS fix). However, I don't see any other good place to define individuals in the event table. Happy for any ideas!
Using string parsing to define relations gives me ER nightmares. But complicated hierarchies won't be universally handled well. I think we should use parentEventID to tie together all of the events and occurrence records, but make a recommendation to use it for a simple parent-child relationship with no further levels. The only other way to tie @sarahcd's acceleration-x MoF back to the organism is to pick an occurrence against the same deployment event. Then the selection is tricky: random? max? min?
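If the recommendation is a single parent-child level, a dataset could be checked for deeper nesting with something like the sketch below. This is only an illustration under assumed data: the rows, the `F53:deployment1:fix1` identifier and the `too_deep` helper are hypothetical.

```python
# Hypothetical event rows; the third row is a grandchild of F53.
events = [
    {"eventID": "F53", "parentEventID": None},
    {"eventID": "F53:deployment1", "parentEventID": "F53"},
    {"eventID": "F53:deployment1:fix1", "parentEventID": "F53:deployment1"},
]


def too_deep(rows):
    """Return eventIDs whose parent itself has a parent (more than one level)."""
    parent_of = {row["eventID"]: row["parentEventID"] for row in rows}
    return [
        event_id
        for event_id, parent in parent_of.items()
        if parent is not None and parent_of.get(parent) is not None
    ]


deep = too_deep(events)
print(deep)
```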
IMO matching measurements to occurrences is not a good solution. (1) There are bio-logging datasets with no occurrences at all except the capture events (e.g. datasets of light level, conductivity and temperature). (2) I really doubt there is one good method for doing this; it will depend on sensor sampling schedules, species/habitat and analysis question. We would be unnecessarily processing the data in ways that are not necessarily biologically meaningful and might confuse interpretation.
For now I'll add parentEventID back in.
I'm wondering if looking at this from different user points of view might help? Can we think of some different users and work back from there / make sure the data will be presented to them in a way that's most useful?
User 1: General GBIF/OBIS user. Just wants to know where individuals of a species occur in space. Most important that they understand all of these occurrences are the same individual.
User 2: Interested in assessing animal movements. Needs to be able to parse deployments from individuals.
What else?
Are there other users we can think of? Who will be using the acceleration data?
Would this be a suitable structure? I assume that if anyone needs the acceleration data they can do the matching themselves? I still think a single parent event per organism makes it easier to select all data for a single organism (given that the database at hand is set up properly).
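To illustrate the point about selecting all data for a single organism: a rough sketch, assuming one parent event per organism and records that link to child events through `eventID`. All rows, the `M12` and `deployment1` identifiers, the `type` field and the `records_by_organism` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical event table: one parent event per organism, children one level down.
events = [
    {"eventID": "F53", "parentEventID": None},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F53:deployment1", "parentEventID": "F53"},
    {"eventID": "M12", "parentEventID": None},
    {"eventID": "M12:deployment1", "parentEventID": "M12"},
]

# Hypothetical occurrence and measurement records, each linked to an event.
records = [
    {"eventID": "F53:capture1", "type": "occurrence"},
    {"eventID": "F53:deployment1", "type": "acceleration-x"},
    {"eventID": "M12:deployment1", "type": "occurrence"},
]


def records_by_organism(event_rows, record_rows):
    """Group records under the parent event of the event they belong to."""
    # A top-level event (no parent) stands for the organism itself.
    organism_of = {
        e["eventID"]: e["parentEventID"] or e["eventID"] for e in event_rows
    }
    grouped = defaultdict(list)
    for record in record_rows:
        grouped[organism_of[record["eventID"]]].append(record)
    return dict(grouped)


grouped = records_by_organism(events, records)
print(sorted(grouped))
```

The acceleration record ends up under `F53` without being matched to any occurrence, and no identifier string is split to get there.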
What @pieterprovoost suggests is probably the best way, but it feels like trying to fit a square peg in a round hole. "Organism" is a concept in Darwin Core, it's just not a "core" file now. Using the Event Core concept has the advantage that GBIF/OBIS can currently handle that data, but it might be good to do the exercise of working out how we would express biologging data, reusing Darwin Core terms, if we had more freedom in how to structure it.
/cc @timrobertson100
In "Mahoney-data-DwC-A-test-2" I noticed that `parentEventID` is populated with the animalID: I would not do this: `eventID` is written, e.g. `F53:capture1`