iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.oceaninfohub.org/

How to handle record shared through multiple partners #372

Open jmckenna opened 7 months ago

jmckenna commented 7 months ago

related to https://github.com/iodepo/odis-arch/issues/257

pbuttigieg commented 7 months ago

xref https://github.com/iodepo/oih-ui/issues/54

pbuttigieg commented 7 months ago

We'd keep them both, as they wouldn't be completely identical (different providers), and then group them in the front-end as near duplicates. Do they use the same identifiers?

JoBeja commented 7 months ago

Hi all, do note that this should be the exact same dataset: the provider is different, but the originator is always the same (CEFAS). The dataset in OBIS is also in the EMODnet catalogue, as it is published by EMODnet Biology and harvested by OBIS via the EurOBIS IPT instance.

The data in EMODnet Biology and OBIS are in Darwin Core (DwC). Through MEDIN there is no data link; the metadata record redirects users to the CEFAS data portal, where the original data can be downloaded as CSV or shapefile.

If you're harvesting the EMODnet catalogue, how is it that this dataset doesn't also appear listed as being provided by EMODnet (Biology)? (I'm curious why it doesn't show up.) The record in the EMODnet catalogue can be found at https://emodnet.ec.europa.eu/geonetwork/srv/eng/catalog.search#/metadata/2aae49ae-516c-4b26-a5bf-163a12806505

pbuttigieg commented 7 months ago

Hi @JoBeja - thanks

> Do note that this should be the exact same dataset: the provider is different, but the originator is always the same (CEFAS). The dataset in OBIS is also in the EMODnet catalogue, as it is published by EMODnet Biology and harvested by OBIS via the EurOBIS IPT instance.

Yes - we expect that there will be cross-listings. The identifier value should be the same to know this unambiguously.
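As a rough illustration of the grouping idea discussed here (keep every provider's copy, but group copies that share an identifier), a minimal Python sketch follows. The record layout (`identifier`, `provider`, `distribution` keys), the sample values, and the case-insensitive comparison are all assumptions made for the example; they are not taken from the ODIS or OIH code.

```python
from collections import defaultdict


def group_cross_listings(records):
    """Return identifier -> records, for identifiers listed by more than one provider."""
    groups = defaultdict(list)
    for record in records:
        identifier = record.get("identifier")
        if not identifier:
            # Without an identifier the match cannot be made unambiguously.
            continue
        # Assumption: identifiers are compared case-insensitively, ignoring outer whitespace.
        groups[identifier.strip().lower()].append(record)
    # Keep every copy, but only report groups that span several providers.
    return {
        identifier: group
        for identifier, group in groups.items()
        if len({r.get("provider") for r in group}) > 1
    }


# Made-up sample records; the identifier values are placeholders.
records = [
    {"identifier": "example-id-001", "provider": "OBIS", "distribution": "DwC archive"},
    {"identifier": "EXAMPLE-ID-001", "provider": "MEDIN", "distribution": "link to originator portal"},
    {"identifier": "example-id-002", "provider": "OBIS", "distribution": "DwC archive"},
]

cross_listed = group_cross_listings(records)
for identifier, group in cross_listed.items():
    print(identifier, sorted(r["provider"] for r in group))
```

Nothing is discarded in this sketch: each group still holds both records, so a front-end could show them together while keeping the different distributions visible.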

> The data in EMODnet Biology and OBIS are in Darwin Core (DwC). Through MEDIN there is no data link; the metadata record redirects users to the CEFAS data portal, where the original data can be downloaded as CSV or shapefile.

Interesting, so the records are not exact, but near matches that link off to different distributions. This is valuable.

> If you're harvesting the EMODnet catalogue, how is it that this dataset doesn't also appear listed as being provided by EMODnet (Biology)? (I'm curious why it doesn't show up.) The record in the EMODnet catalogue can be found at https://emodnet.ec.europa.eu/geonetwork/srv/eng/catalog.search#/metadata/2aae49ae-516c-4b26-a5bf-163a12806505

That's a very good question. @fils? Perhaps @snaggyD knows?

JoBeja commented 7 months ago

Hi, I can't speak for this particular record (i.e. the data) without actually looking at the data from CEFAS and at what we have published, but generally speaking there could be slight differences in the data itself. For example, CEFAS (or any other originator that publishes data on its own portal) could hold data containing information that our QC checks later found to be incorrect and that was never corrected in their own system. This example might not apply to this case. Another difference is that the data from the CEFAS portal don't appear to be in DwC format, whereas the data available in EMODnet Biology (and therefore OBIS) will be.

Ah, I also see that if I choose EMODnet as the provider on the Ocean InfoHub page, only 709 records are available, while the EMODnet catalogue currently has >2000 tagged as datasets, so maybe a sync is needed? I would highly recommend it, as you will find (today) about 1300 duplicates with the OBIS catalogue alone.
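To make the sync question concrete, one possible check is to compare identifier sets: what the source catalogue lists, what has actually been harvested, and what another catalogue also lists. The sketch below assumes the identifiers have already been extracted into plain lists; the function name `compare_harvest` and the sample values are hypothetical.

```python
def compare_harvest(source_ids, harvested_ids, other_ids):
    """Report records missing from a harvest and records shared with another catalogue."""
    source, harvested, other = set(source_ids), set(harvested_ids), set(other_ids)
    return {
        # In the source catalogue but not yet harvested: candidates for a re-sync.
        "missing_from_harvest": sorted(source - harvested),
        # In both catalogues: expected cross-listings.
        "shared_with_other": sorted(source & other),
    }


# Placeholder identifiers, only to show the shape of the result.
report = compare_harvest(
    source_ids=["a", "b", "c", "d"],
    harvested_ids=["a", "b"],
    other_ids=["b", "c", "x"],
)
print(report)
```

A large `missing_from_harvest` list would point to a stale harvest, while `shared_with_other` gives a rough count of the cross-listings to expect once the sync is done.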