lifewatch / etn-otn-exchange

European Tracking Network (ETN) and Ocean Tracking Network (OTN) data exchange issues

Define the process for handling project metadata for projects held remotely #24

Open jdpye opened 1 year ago

jdpye commented 1 year ago

A remote project is one that is housed and updated on another database or system, but whose local copy is refreshed periodically during metadata harvesting from either ETN or OTN. We need to harvest, store, and express this metadata in such a way that:

There are a few philosophy/process level steps to take first.

jdpye commented 12 months ago

OTN data policy: https://members.oceantrack.org/data/policies/otn-data-policy-2018.pdf

ETN data policy: https://www.lifewatch.be/etn/assets/docs/ETN-DataPolicy.pdf

In preparation for a discussion and document drafting, we will check whether there are any blockers to treating ETN in a similar way to a Node. That would mean ETN can be interfaced with as if it were a standard 'Primary' database for Eurocentric projects homed at ETN, and that we could push data to it as a 'client' database for projects homed at OTN. If there are no policy blockers, we can go on to draw the distinction between what happens to an OTN-as-primary project and an ETN-as-primary project, and what can be harvested between them.

jdpye commented 10 months ago

From my review of the ETN data policy with an eye to interoperation with OTN and Nodes, our language around Restricted and Unrestricted data is mostly congruent.

Some questions persist, so here's what I have so far for @CLAUMEMO @jreubens to give me some more clarity on how things might operate.

Formal Data Owners: this term is only referenced once, to define 'Data Owners'. If OTN shares data with ETN that is held and updated by the researchers themselves in the OTN database, who are the 'Formal Data Owners' in that case? Who are the Data Owners?

If Data Owners withdraw detection data from the ETN database that has been matched to another project's tags, what happens to that detection data?

For reporting data to OBIS/GBIF, what schema/mechanism are you using to translate raw detection data into DwC Event or Occurrence Core? Ideally it's the one recently built into the etn R package by @jonasmortelmansvliz and @peterdesmet ( https://github.com/inbo/etn/blob/main/R/write_dwc.R ). OTN co-operated in the development of the archive-building standard (currently hosted in the etn R package) and would like to ensure harmony between archives we make and archives built by ETN. These archives are tag-oriented, so a project is nicely self-contained but can still contribute detections to other projects incidentally, as animals tagged by other projects enter the project's study area.
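To make the question concrete, here is a rough Python sketch of the kind of translation I mean: one raw detection row becoming one Darwin Core Occurrence record. The DwC term names (`occurrenceID`, `basisOfRecord`, `eventDate`, and so on) are real Darwin Core terms, but the input field names and the ID scheme are hypothetical, and this is not a copy of the mapping in `write_dwc.R`.

```python
from datetime import datetime, timezone


def detection_to_dwc_occurrence(det: dict) -> dict:
    """Map one raw detection (hypothetical field names) to DwC Occurrence terms."""
    return {
        # Assumed ID scheme: project code plus the detection's local ID.
        "occurrenceID": f"{det['project_code']}:{det['detection_id']}",
        "basisOfRecord": "MachineObservation",
        # DwC expects ISO 8601; normalise to UTC first.
        "eventDate": det["detected_at"]
        .astimezone(timezone.utc)
        .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "decimalLatitude": det["receiver_lat"],
        "decimalLongitude": det["receiver_lon"],
        "organismID": det["animal_id"],
        "scientificName": det["scientific_name"],
    }


example = {
    "project_code": "PROJ_A",  # placeholder project code
    "detection_id": "det-001",
    "detected_at": datetime(2021, 6, 1, 12, 30, tzinfo=timezone.utc),
    "receiver_lat": 51.35,
    "receiver_lon": 3.12,
    "animal_id": "animal-42",
    "scientific_name": "Anguilla anguilla",
}
row = detection_to_dwc_occurrence(example)
```

If both networks agree on this kind of term-level mapping, archives built on either side should line up when aggregated.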

It isn't clearly stated either way in the ETN Data Policy, but we have discussed it, and we will also need to formally agree between our networks that, as is the case with OTN and its family of Nodes, only the Primary Database (i.e. where the data is updated and extended directly by the data collectors) should initiate the publishing of archives to OBIS. This is to prevent duplication and ensure updates and corrections can flow quickly into the public record. Whether they flow through the national/regional OBIS nodes or through OTN's thematic node matters far less to me than that they are congruent with one another, unduplicated, and intelligible to an end-user who aggregates them together.
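As a minimal sketch of that rule, assuming each project record carries a field naming its Primary Database (the `primary_db` field and the project codes here are hypothetical, not existing OTN or ETN schema), the publishing check could look like this in Python:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    code: str
    primary_db: str  # hypothetical label for where the data is maintained, e.g. "OTN" or "ETN"


def may_publish_to_obis(project: Project, local_db: str) -> bool:
    """Only the database that is Primary for a project initiates publishing."""
    return project.primary_db == local_db


projects = [Project("PROJ_A", "OTN"), Project("PROJ_B", "ETN")]

# Running at ETN: only ETN-as-primary projects are eligible for archive publishing.
publishable_at_etn = [p.code for p in projects if may_publish_to_obis(p, "ETN")]
```

A harvested copy of an OTN-as-primary project held at ETN would be skipped by this check, which is what keeps the same archive from reaching OBIS twice.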

This leaves us to work out what happens when we match detections across the networks. Currently, because we have overlapping reporting requirements for some of our projects, we may arrive at a place where detections are matched on one network and those matches need to be propagated to the other network, so that it does not attempt to match them again. This will be a harder problem to outline, but it is still tractable since we are in such good contact and understand how to express data to one another on the technical level.
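One possible shape for that propagation, sketched in Python under the assumption that both networks can refer to a detection by a shared identifier (the record layout, field names, and function names below are all hypothetical):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Detection:
    detection_id: str
    tag_id: str
    matched_project: Optional[str] = None  # project the tag was matched to, if any
    matched_by: Optional[str] = None       # network that made the match


def apply_remote_matches(
    local: List[Detection], remote: Dict[str, Tuple[str, str]]
) -> int:
    """Copy matches made on the other network onto still-unmatched local detections.

    `remote` maps detection_id -> (matched_project, matching_network).
    Existing local matches are left untouched. Returns the number applied.
    """
    applied = 0
    for det in local:
        if det.matched_project is None and det.detection_id in remote:
            det.matched_project, det.matched_by = remote[det.detection_id]
            applied += 1
    return applied


def still_unmatched(local: List[Detection]) -> List[Detection]:
    """Detections that still need to go through local matching."""
    return [d for d in local if d.matched_project is None]


local = [
    Detection("d1", "TAG-1"),
    Detection("d2", "TAG-2"),
    Detection("d3", "TAG-3", matched_project="PROJ_A", matched_by="OTN"),
]
remote = {"d1": ("PROJ_B", "ETN"), "d3": ("PROJ_Z", "ETN")}
applied = apply_remote_matches(local, remote)
todo = [d.detection_id for d in still_unmatched(local)]
```

Here `d1` picks up the remote match, `d3` keeps its existing local match, and only `d2` is left for matching. How to handle a conflict like the one on `d3` is exactly the kind of policy question we would need to settle.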