Define the process for handling project metadata for projects held remotely

jdpye commented 1 year ago

A remote project is one that is housed and updated on another database or system, but whose metadata will be updated periodically during harvesting from either ETN or OTN. We need to harvest, store, and express this metadata in such a way that:

[ ] its source node or database is obvious to an end-user
[ ] it does not confuse the harvester into treating it as a locally held project and re-harvesting it into its own host node
[ ] information reported by researchers as different projects to multiple different nodes or databases can be identified and disambiguated
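To make the first two requirements concrete, here is a minimal sketch, assuming every harvested project record carries an explicit home-node field. The names `ProjectRecord`, `home_node`, `should_harvest`, and `display_label` are hypothetical and are not part of any existing ETN or OTN schema or harvester. The third requirement (disambiguating double-reported projects) depends on the policy work described in this issue and is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProjectRecord:
    # All field names are illustrative assumptions, not an existing schema.
    project_code: str
    title: str
    home_node: str  # node or database where the project is updated, e.g. "ETN" or "OTN"


def should_harvest(record: ProjectRecord, local_node: str) -> bool:
    """Skip projects whose home is the local node, so a node never re-harvests itself."""
    return record.home_node != local_node


def harvest(feed: Iterable[ProjectRecord], local_node: str) -> List[ProjectRecord]:
    """Keep only the remote projects from a metadata feed."""
    return [r for r in feed if should_harvest(r, local_node)]


def display_label(record: ProjectRecord) -> str:
    """Make the source node obvious wherever the project is shown to an end-user."""
    return f"{record.title} (source: {record.home_node})"
```

Storing the home node on the record itself, rather than inferring it from where the record was fetched, is what lets the same check work no matter how many intermediate nodes the metadata has passed through.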

There are a few philosophy/process level steps to take first.

[ ] Codify our shared management of multiple-membership researchers and projects and define a process for disambiguation of cross-reported or double-reported activities. Establish and publish a co-authored document that states the policy and the workflow for disambiguation, so that a primary and secondary source can be established for each double-reported activity. (This may help alleviate some of the questions that pop up around reporting.)

jdpye commented 12 months ago

OTN data policy: https://members.oceantrack.org/data/policies/otn-data-policy-2018.pdf

ETN data policy: https://www.lifewatch.be/etn/assets/docs/ETN-DataPolicy.pdf

In preparation for a discussion / document drafting, we will see if there are any blockers to treating ETN in a similar way to a Node. That would mean ETN can be interfaced with as if it were a standard 'Primary' database for Eurocentric projects homed at ETN, and that we could push data to it as a 'client' database for projects homed at OTN. If there are no policy blockers, we can go on to draw a distinction between what happens to an OTN-as-primary project and an ETN-as-primary project, and what can be harvested between them.

jdpye commented 10 months ago

From my review of the ETN data policy with an eye to interoperation with OTN and Nodes, our language around Restricted and Unrestricted data is mostly congruent.

Some questions persist, so here's what I have so far for @CLAUMEMO @jreubens to give me some more clarity on how things might operate.

Formal Data Owners: this term is only referenced once, to define 'Data Owners'. If OTN shares data with ETN that researchers hold and update in the OTN database, who are the 'Formal Data Owners' in that case? Who are the Data Owners?

If Data Owners withdraw detection data from the ETN database that has been matched to another project's tags, what happens to that detection data?

For reporting data to OBIS/GBIF, what schema/mechanism are you using to translate raw detection data into DwC Event or Occurrence Core? Ideally it's the one recently built into the etn R package by @jonasmortelmansvliz and @peterdesmet: https://github.com/inbo/etn/blob/main/R/write_dwc.R. OTN co-operated in the development of the archive-building standard (currently hosted in the etn R package) and would like to ensure harmony between archives we make and archives built by ETN. These archives are tag-oriented, so a project is nicely self-contained but can still contribute detections to other projects incidentally, as animals tagged by other projects enter the project's study area.

The ETN Data Policy doesn't clearly state this either way, but we have discussed it, and our networks will also need to formally agree that, as is the case with OTN and its family of Nodes, only the Primary Database (i.e. where the data is updated and extended directly by the data collectors) should initiate the publishing of archives to OBIS. This is to prevent duplication and ensure updates and corrections can flow quickly into the public record. Whether they flow through the national/regional OBIS nodes or through OTN's thematic node matters far less to me than that they are congruent with one another, unduplicated, and that an end-user can make sense of them when aggregating them together.

This leaves us to work out what happens when we match detections across the networks. Because we currently have overlapping reporting requirements for some of our projects, we may arrive at a place where detections are matched on one network and those matches need to be propagated to the other network, to preclude attempting to match them again. This will be a harder problem to outline, but it is still possible since we are in such good contact and understand how to express data to one another on the technical level.
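As a starting point for that outline, here is a rough sketch of how propagated matches could keep a network from re-matching the same detections. It assumes each detection has an identifier that both networks share, which is itself something we would need to agree on. `Match`, `import_remote_matches`, and `unmatched` are hypothetical names, not existing ETN or OTN code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Match:
    # Hypothetical fields; a shared detection identifier is an assumption.
    detection_id: str
    tag_id: str
    matched_on: str  # network that made the match, e.g. "ETN" or "OTN"


def import_remote_matches(local: Dict[str, Match], remote: Iterable[Match]) -> None:
    """Record matches made on the other network, keeping any match we already hold."""
    for match in remote:
        local.setdefault(match.detection_id, match)


def unmatched(detection_ids: Iterable[str], local: Dict[str, Match]) -> List[str]:
    """Return only the detections that still need a matching attempt."""
    return [d for d in detection_ids if d not in local]
```

Keeping `matched_on` with each match preserves where the match was made. This sketch simply keeps the locally held match when both networks report one for the same detection; whether that is the right precedence is a policy question for the two networks.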

lifewatch / etn-otn-exchange

Define the process for handling project metadata for projects held remotely #24