How do biologging repositories ingest DarwinCore aligned data?

albenson-usgs commented 6 years ago

How will Movebank, OTN, iOBIS, LivingAtlases, etc. ingest data that follows the guidelines we are recommending?

jdpye commented 6 years ago

The wish we expressed at the pre-meeting:

Movebank accepting a DwC archive in this format for ingestion as-is.

OTN intends to make analysis toolboxes like glatos capable of ingesting these archives as data input, and to produce them for acceptance by iOBIS via the (soon-to-come) OTN IPT.

sarahcd commented 6 years ago

This is a good question. From Movebank's end, I'll be working on this more this fall. Some initial impressions:

- I see the initial demand as getting data FROM Movebank's format TO DwC, rather than the other way around. Our first aim is to make Movebank's publicly archived datasets (datarepository.movebank.org) discoverable in GBIF. Currently, our users have and want to work with tabular CSV files that combine the measurements for each timestamp in one row, and few are familiar with DwC.
- Automated data import to Movebank typically relies on tabular text data files that always include the same attributes, formats, and units in the same order (https://www.movebank.org/node/10). As I understand it, the OBIS-ENV format is considerably more flexible, and I'm not sure how we would read values and map or convert them to Movebank attributes in any automated way. More realistically, an R script could convert an OBIS-ENV DwCA to a tabular file that the user could import into Movebank as a custom CSV.
- I'll know more once I see more examples of the OBIS-ENV format, but I'm concerned that the data volume in this format will be extremely high for typical Movebank datasets, which could make web-based upload and import difficult.
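The conversion described in the second point could look roughly like the sketch below. It is written in Python rather than R purely for illustration, and it rests on assumptions that are not from this thread: the occurrence and eMoF tables are tab-delimited text, every eMoF row carries an `occurrenceID`, and the function names `dwca_to_wide_rows` and `write_csv` are hypothetical. Each `measurementType` becomes its own column, so the result has one row per occurrence.

```python
import csv
import io
from collections import defaultdict


def dwca_to_wide_rows(occurrence_tsv: str, emof_tsv: str) -> list[dict]:
    """Join long-format eMoF measurements onto occurrence records (one row each)."""
    # Assumption: every eMoF row has an occurrenceID linking it to an occurrence.
    # Result shape: {occurrenceID: {measurementType: measurementValue}}
    measurements = defaultdict(dict)
    for rec in csv.DictReader(io.StringIO(emof_tsv), delimiter="\t"):
        # measurementUnit is ignored in this sketch; a real converter should keep it.
        measurements[rec["occurrenceID"]][rec["measurementType"]] = rec["measurementValue"]

    rows = []
    for occ in csv.DictReader(io.StringIO(occurrence_tsv), delimiter="\t"):
        row = dict(occ)
        row.update(measurements.get(occ["occurrenceID"], {}))
        rows.append(row)
    return rows


def write_csv(rows: list[dict], out) -> None:
    """Write rows as CSV; occurrences lacking a measurement get an empty cell."""
    # Union of all column names, in first-seen order.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

Because the set of columns depends on which measurement types appear in a given archive, the output layout is not fixed, which is the same flexibility that makes fully automated import difficult.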

I'd be happy to have a discussion about this!

jdpye commented 6 years ago

I think that the occurrence-core subset of the OBIS-ENV should be able to vocab-map directly to a nice tabular dataset that Movebank will be happy with. This was the brainchild of our April workshop on the subject, where Peter expressed the desire that the occurrence data be able to stand on its own as a minimal description of animal presence.

Data volume for satellite data will be high, but if we sidebar the non-occurrence datasets the volume should be manageable. It's when we drag in the oceanography, the accelerometry, all the in situ measurements, that we start to spiral out of control volume-wise. And happily, under this format, those go off into EMoF and stay clear of the Occurrence data.
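As a rough sketch of what such a vocab-mapping could look like, the Python snippet below renames a handful of Darwin Core occurrence terms to Movebank-style column names and leaves the eMoF data out entirely. The term pairing in `DWC_TO_MOVEBANK` and the function name `occurrences_to_movebank_csv` are illustrative assumptions, not an agreed crosswalk, and the input is assumed to be a tab-delimited occurrence table.

```python
import csv
import io

# Hypothetical crosswalk for illustration only; the real mapping would need
# to be agreed between the Darwin Core and Movebank vocabularies.
DWC_TO_MOVEBANK = {
    "eventDate": "timestamp",
    "decimalLatitude": "location-lat",
    "decimalLongitude": "location-long",
    "organismID": "individual-local-identifier",
    "scientificName": "individual-taxon-canonical-name",
}


def occurrences_to_movebank_csv(occurrence_tsv: str) -> str:
    """Rename mapped occurrence columns and return a CSV string in a fixed column order."""
    reader = csv.DictReader(io.StringIO(occurrence_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(DWC_TO_MOVEBANK.values()), lineterminator="\n"
    )
    writer.writeheader()
    for rec in reader:
        # Unmapped Darwin Core columns are dropped; missing ones become empty cells.
        writer.writerow({mb: rec.get(dwc) or "" for dwc, mb in DWC_TO_MOVEBANK.items()})
    return out.getvalue()
```

The output always has the same columns in the same order regardless of what else the occurrence table contains. Values are copied unchanged, so any timestamp or unit conversion a target system needs would be a separate step.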

Let me know if there's a chat imminent, I'd love to participate!

albenson-usgs commented 6 years ago

@sarahcd This question came up because @peterdesmet would like his seabird tracking data to go into Movebank and is wondering whether Movebank will be able to (relatively) seamlessly pull in data that's in Darwin Core. Based on your second bullet above, it sounds like this may not be possible, or at least would take some work to figure out. Ideally, if we can get Movebank, GBIF, and OBIS all speaking the same language (i.e., Darwin Core), then all pieces of this type of data become more easily interoperable (I hope!). I wonder if a good next step would be for me to work with you, Sarah, on getting a Movebank dataset into OBIS-ENV-DATA, just so you have a clearer idea of what that looks like.

sarahcd commented 6 years ago

I say we plan a meeting where we can screenshare and look at this, after I have had some time to look through the feedback I've already received on my "draft" OBIS-ENV format dataset from the very helpful @albenson-usgs :). I'll send an email to schedule. If anyone else sees this and would like to join, let me know.

Antonarctica commented 6 years ago

@sarahcd Happy to join the meeting.

sarahcd commented 6 years ago

@Antonarctica can you tell me who you are? ;) I can email you with the specs.

albenson-usgs commented 6 years ago

After the discussion today, we decided that, for the time being, there is no use case in which an individual data provider would want to align their biologging data to Darwin Core and have it harvested by OBIS/GBIF/Movebank/OTN/Zoatrack. Instead, individual data providers will work with a biologging data aggregator (Movebank/OTN/Zoatrack), and that aggregator will be the one to align the data to Darwin Core and share it with OBIS/GBIF.

sarahcd commented 6 years ago

Maybe a more general, related issue to add is how to get multiple DwC archives into a data frame for analysis. This is the end goal for many users and gets at the db-ingestion question but with a more broadly relevant use case.
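To make that use case concrete, here is a minimal Python sketch that reads the core table from several Darwin Core Archives and stacks the rows into one list of records, which could then be handed to a data-frame library. It assumes each archive is a zip with a `meta.xml` naming the core file, and that the core file is tab-delimited UTF-8 with a header row; a fuller reader would honour the delimiter and header settings declared in `meta.xml`. The function names and the `source_archive` column are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the Darwin Core text (meta.xml) descriptor.
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def read_core_rows(archive_bytes: bytes, source: str) -> list[dict]:
    """Read the core data file of one archive into a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
        # The core file name sits under <core><files><location>.
        location = meta.find("dwc:core/dwc:files/dwc:location", DWC_TEXT_NS).text
        text = zf.read(location).decode("utf-8")
    # Assumption: tab-delimited with one header line.
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        # Record which archive each row came from.
        row["source_archive"] = source
    return rows


def combine_archives(archives: dict[str, bytes]) -> list[dict]:
    """Stack the core rows of several archives, in the order given."""
    combined = []
    for name, data in archives.items():
        combined.extend(read_core_rows(data, name))
    return combined
```

Archives whose core tables use different columns will produce records with different keys, so a downstream step would need to decide how to align them.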

peggynewman commented 6 years ago

Sounds like an R package to me once we have this nailed!

albenson-usgs commented 5 years ago

@sarahcd If OBIS does indeed integrate biologging data into the system then you should be able to do this using the OBIS API or the robis package. Do I have that right @pieterprovoost?

pieterprovoost commented 5 years ago

@albenson-usgs @sarahcd @peggynewman The robis package will indeed provide access to integrated datasets, but if you want to combine multiple archives without going through the OBIS system you can use https://github.com/ropensci/finch (for reading archives) and https://github.com/iobis/obistools (for merging event trees and occurrences).

tdwg / dwc-for-biologging
