nfdi-de / section-metadata-wg-onto

This repository is used to document the work of the NFDI Section (Meta)Data Working Group on Ontology Harmonization and Mapping.
https://www.nfdi.de/section-metadata/

knowledge sharing approach: CDIF (Cross Domain Interoperability Framework) #14

Open nbetancort opened 8 months ago

nbetancort commented 8 months ago

This issue is associated with the charter epic #4 and #7

Which mapping tool or framework do you want to discuss?

https://worldfair-project.eu/cross-domain-interoperability-framework/
Overview: https://doi.org/10.5446/66247#t=20:44 or https://vimeo.com/991198957 (Practical Guidelines for FAIR Interoperability, 2024-07-25)
Deliverable Report: https://doi.org/10.5281/zenodo.11236871

Why do you think this mapping tool or framework is relevant in the context of our working group?

They work in the context of the WorldFAIR project, which sets out to produce recommendations, interoperability frameworks and guidelines for FAIR data assessment. Their approach is not to create mappings from every existing domain to every other, because that is an endless task, but to create a lingua franca that we can all agree on and communicate with.

What further steps are needed to be taken or discussed by/in our WG regarding this issue?

  1. [x] Start evaluation when more concrete specifications, recommendations or tutorials have been published: Gregory, A. et al. (2024). WorldFAIR (D2.3) Cross-Domain Interoperability Framework (CDIF) (Report Synthesising Recommendations for Disciplines and Cross-Disciplinary Research Areas) (Version 1). Zenodo. https://doi.org/10.5281/zenodo.11236871
  2. [ ] Check and agree internally which parts of CDIF we want to implement and how, i.e. which areas CDIF can or cannot cover and how deeply it can be applied.
  3. [ ] In case of positive agreement, discuss ways of cooperating with the WorldFAIR team (for example, someone from our WG also being part of the CDIF WG).

StroemPhi commented 8 months ago

I believe evaluating CDIF is also part of epic #4.

Although I do agree that the CDIF approach makes total sense, given that it is not yet mature enough I suggest postponing its evaluation until more concrete specifications, recommendations or tutorials/how-tos are published. Currently, I feel like: "Yes, it all sounds nice in theory, but how does it look/work in reality?"

So the best case would be to have someone from our WG also be part of the CDIF WG, who can regularly brief us on updates.

hgoerzig commented 6 months ago

CDIF is a related set of guidelines for providing metadata in a domain-neutral manner. CDIF will provide detailed recommendations for the use of specific standards (Schema.org, DCAT, ODRL, DDI-CDI, SKOS/XKOS, SSSOM, etc.). The standards are used, for example, to:

  * understand data structure (DDI-CDI)
  * understand semantics (SKOS/XKOS, OWL, SSSOM)
  * determine origination/context (PROV-O, I-ADOPT/O&M)
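To make that division of labour concrete, here is a minimal, hypothetical Turtle sketch combining the discovery, semantics and provenance layers for one dataset. All example.org IRIs are placeholders I made up for illustration, not CDIF-prescribed identifiers:

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

# Placeholder IRIs, for illustration only
<https://example.org/dataset/d1>
    a dcat:Dataset ;
    dcterms:title "Example survey dataset"@en ;
    dcat:theme <https://example.org/concepts/air-quality> ;   # semantics layer: SKOS concept
    prov:wasGeneratedBy <https://example.org/activity/a1> .   # context layer: PROV-O

<https://example.org/concepts/air-quality> a skos:Concept ;
    skos:prefLabel "air quality"@en .

<https://example.org/activity/a1> a prov:Activity .
```

The point of the sketch is only that each standard covers a distinct slice of the description; the structural layer (DDI-CDI) would sit alongside this and is omitted here.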

It has been applied for the integration of climate data from Copernicus ERA5 and air quality data from the European Environmental Agency (EEA) with data from the European Social Survey (ESS) Integrated Data in ESS Lab. There might be more fields where it has been applied but I don't know about them.

Anyway, the CDIF approach has not yet been tested thoroughly enough to be recommended; volunteers to do so are welcome. This means that it needs to be applied in different science domains. For example, X-ray Absorption Spectroscopy (XAS) has developed its own de-facto standards, which are not used outside of that community and differ across levels of the processing pipeline. Does CDIF work in this area, i.e. does it provide a clear mechanism to integrate and use data from different sources with different standards, to make XAS data easier to discover, and to enable its exchange for processing and integration?

In general we need to check which areas CDIF can or cannot cover and how deeply it can be applied.

nbetancort commented 6 months ago

> Anyway, the CDIF approach has not yet been tested thoroughly enough to be recommended; volunteers to do so are welcome. This means that it needs to be applied in different science domains. For example, X-ray Absorption Spectroscopy (XAS) has developed its own de-facto standards, which are not used outside of that community and differ across levels of the processing pipeline. Does CDIF work in this area, i.e. does it provide a clear mechanism to integrate and use data from different sources with different standards, to make XAS data easier to discover, and to enable its exchange for processing and integration?
>
> In general we need to check which areas CDIF can or cannot cover and how deeply it can be applied.

That's very interesting - another example where it could be helpful is in the case of qualitative data (of different types, from different domains).

I think that testing and creating practical guidelines on how to use domain-agnostic standards is very important for interoperability and data exchange. Powerful and flexible standards like DDI-CDI (which we are testing for qualitative data: https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/3083862017/Qualitative+Data+Subgroup) can be applied in so many ways and for so many use cases that people need guidelines and support on how to apply them. This leads to harmonized use, which in turn leads to interoperability.

So I agree that we need volunteers to bring in different use cases and applications!

StroemPhi commented 6 months ago

In the context of our 8-5-2024 call where we talked about using DCAT/-AP, I would be interested in finding out how CDIF is planning on "extending" DCAT to attach domain-specific RDF. (cc @hgoerzig)

StroemPhi commented 5 months ago

CDIF was presented in our regular call on June 12th 2024. The slides, notes and recording are linked in its agenda.

nbetancort commented 3 months ago

A webinar took place on Thu 25 July and covered the following:

Recording and presentations: https://vimeo.com/991198957?share=copy

Book: https://cross-domain-interoperability-framework.github.io/cdifbook/introduction.html

dalito commented 3 months ago

> In the context of our 8-5-2024 call where we talked about using DCAT/-AP, I would be interested in finding out how CDIF is planning on "extending" DCAT to attach domain-specific RDF.

I did not see an answer to this in the DCAT chapter of the book. Did they address this in the webinar? I could only attend the first 45 min.

nbetancort commented 3 months ago

> I did not see an answer to this in the DCAT chapter of the book. Did they address this in the webinar? I could only attend the first 45 min.

No, it wasn't addressed; it was more a presentation of general implementation examples and approaches (discovery and integration), referring to the documentation, for example the book, where the profiles are explained and metadata examples are linked (https://cross-domain-interoperability-framework.github.io/cdifbook/examples/index.html), and the specific profiles, which will be published in the near future (https://worldfair-project.eu/cross-domain-interoperability-framework/).

Maybe I got the question from the 8-5-2024 call wrong because I wasn't there that day, but don't the implementation table and the profiles address that?

Some notes I took from the examples about more domain-specific information to be addressed by the profiles (in the case of DCAT, rather to facilitate discovery on the web, not to describe the data at its most granular level; for that the Data Integration Profile should be used): the Geologic Time Scale example says in its abstract:

> The Geological Timescale Model is aligned with the W3C OWL-Time ontology https://www.w3.org/TR/owl-time/ for the temporal topology, with OGC GeoSPARQL http://www.opengeospatial.org/standards/geosparql for location data, and with the W3C SOSA/SSN ontology for samples. The content of the vocabulary matches the 2017-02 International Chronostratigraphic Chart.

Then this relationship is made explicit by:

  dcat:qualifiedRelation [
      rdf:type dcat:Relationship ;
      dcterms:relation <http://resource.geosciml.org/classifier/ics/ischart/> ;
      dcterms:title "OWL ontology defining the boundaries and intervals in the geologic time scale" ;
      dcat:hadRole <http://id.loc.gov/vocabulary/relationship/datasource> ;
    ] ;

I would also add the W3C OWL-Time ontology and OGC GeoSPARQL mentioned in the abstract via a dcterms:conformsTo property.
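A sketch of how that could look in Turtle; the dataset IRI is a placeholder I made up, while the two object IRIs are the spec URLs quoted from the abstract:

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Placeholder dataset IRI, for illustration only
<https://example.org/dataset/geologic-time-scale>
    a dcat:Dataset ;
    # Standards named in the abstract, made machine-readable:
    dcterms:conformsTo <https://www.w3.org/TR/owl-time/> ,
                       <http://www.opengeospatial.org/standards/geosparql> .
```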

StroemPhi commented 3 months ago

I've skimmed through the respective CDIF HTML book sections and looked at most of the examples. From this I understand that, currently, we can only use dcat:theme and dcat:keyword to very loosely attach metadata that describes a dataset in terms of its content, and we have the DDI-CDI approach, which is limited to CSV. The latter seems to me a bit more complicated than using some RDF/OWL object property to link to a harmonized content-metadata knowledge graph.
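For illustration, the "loose" attachment via dcat:theme and dcat:keyword might look like this (all example.org IRIs are placeholders, not from the CDIF examples):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Placeholder IRIs, for illustration only
<https://example.org/dataset/air-quality>
    a dcat:Dataset ;
    dcat:keyword "air quality"@en , "nitrogen dioxide"@en ;
    dcat:theme <https://example.org/themes/environment> .

# The theme is just a pointer to a SKOS concept; nothing links the
# dataset's actual variables or values to that concept.
<https://example.org/themes/environment> a skos:Concept ;
    skos:prefLabel "Environment"@en .
```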

nbetancort commented 3 months ago

DDI-CDI can also express wide, long, multidimensional, and key-value data, so it deals with the structure of the data, but also with its semantics. So it is bound to be more complicated, but it is also more powerful and allows the expression of domain semantics. This is because its purpose (and the CDIF functionality it is intended to serve), integration, needs to cover many aspects to make data understandable for integration into other systems or with other data. I guess they will also provide tools to describe data more easily. We are considering this for non-numerical data as well.

From the CDIF book I quote:

> The concept definitions that specify the semantics must be separated from the structural description of the data for a useful cross-domain data description scheme, along with an indication of where the semantics for both the field and the values come from. CDIF recommends a subset of the classes in the DDI-CDI specification for data description.

This is also where I see the difference with I-ADOPT (if I understand I-ADOPT correctly): it only describes the measured characteristic or variable, but not the values, and neither structure nor provenance.

In CDIF, DCAT is meant to be used for discovery, so I agree with you that the semantics there are loosely handled by the theme and keyword properties.

nbetancort commented 3 months ago

In our 2024-07-23 call we agreed we should provide use cases from different consortia and present them in CDIF.

There are no templates for this, but the documentation: Gregory, A., Bell, D., Brickley, D., Buttigieg, P. L., Cox, S., Edwards, M., Doug, F., Gonzalez Morales, L. G., Heus, P., Hodson, S., Kanjala, C., Le Franc, Y., Maxwell, L., Molloy, L., Richard, S., Rizzolo, F., Winstanley, P., Wyborn, L., & Burton, A. (2024). WorldFAIR (D2.3) Cross-Domain Interoperability Framework (CDIF) (Report Synthesising Recommendations for Disciplines and Cross-Disciplinary Research Areas) (Version 1). Zenodo. https://doi.org/10.5281/zenodo.11236871

and the book (see the examples) could be followed to describe our use cases.

Since this will probably be done manually, I suggest working on a spreadsheet with the set of fields suggested in each profile?

But first we should discuss in one of our next calls whether we should focus on specific profiles (those related to our Mapping/Harmonization topic):

> Users only need adopt those profiles that are useful to them. There is no requirement for the adoption of optional profile content. For example, it is possible to describe data to make it ‘integration ready’ at a detailed level, but not to support profiles for data discovery or access, to give but one example. CDIF profiles are intended to be a toolkit for implementation, with the needed functions being addressed in any specific setting according to implementer priorities.

dalito commented 3 months ago

I just checked whether the IDs resolve (e.g. https://ddialliance.org/Specification/DDI-CDI/1.0/RDF/Activity/), but they don't. (This ID is from https://bitbucket.org/ddi-cdi-resources/ddi-cdi/src/master/build/encoding/ontology/Process.onto.ttl.) It may just be an oversight, but for me it raises questions about the TRL (technology readiness level).

nbetancort commented 3 months ago

Not yet officially released

dalito commented 3 months ago

Ah, thanks for the info! I may report the issue later in their repo, but I have no access to my Bitbucket account now.

StroemPhi commented 3 months ago

Thank you @nbetancort, I put this on the agenda for next week to be discussed.

dalito commented 2 months ago

It seems that the DDI Alliance is slowly moving everything to GitHub. There is also https://ddialliance.github.io/ddimodel-web/ where I could find different representations of the DDI-CDI model. The model is formulated in XMI (a UML exchange format) as the primary representation. In a GitHub Actions pipeline it is automatically converted to JSON Schema and various RDF outputs (OWL, SHACL, ShEx), see e.g. https://github.com/ddialliance/ddimodel/actions/runs/10402271583 (this link will expire since GitHub deletes job runs after a while).

The IDs were also resolving when I checked again last week. Progress! :smile:

@nbetancort - I agree with your understanding of I-ADOPT vs. DDI-CDI above. The variable model in I-ADOPT is very flexible which may lead to different decisions on how to represent the same variable (see also this comment). DDI-CDI is providing a somewhat tighter framework with their "variable cascade".
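For readers unfamiliar with the "variable cascade": it layers variable definitions from abstract concept down to a concrete column. A rough Turtle sketch of the idea; the three class names come from the DDI-CDI model, but the namespace form, the instance IRIs and the ex:specializes linking property are my assumptions for illustration, not the normative DDI-CDI properties:

```turtle
@prefix cdi: <https://ddialliance.org/Specification/DDI-CDI/1.0/RDF/> .
@prefix ex:  <https://example.org/vars/> .

# Abstract concept: "air temperature", independent of units or encoding
ex:temperatureConcept a cdi:ConceptualVariable .

# The concept plus a value domain (e.g. decimal degrees Celsius)
ex:temperatureCelsius a cdi:RepresentedVariable ;
    ex:specializes ex:temperatureConcept .   # hypothetical linking property

# The concrete variable as it appears in one specific dataset/column
ex:temperatureColumn a cdi:InstanceVariable ;
    ex:specializes ex:temperatureCelsius .   # hypothetical linking property
```

Each layer narrows the one above it, which is why the cascade constrains representation choices more tightly than I-ADOPT's free-form variable decomposition.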

dalito commented 1 day ago

FYI, slides & notes from the DDI-alliance meeting in Oct-2024 Dagstuhl are available https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/3125674192/2024+Aligning+Technology+Architectures+with+Cross-Domain+Metadata+Models (see also sub-pages of that page)

dalito commented 1 hour ago

DDI-CDI moved from RC3 state to final (2024-11-22): https://ddialliance.org/announcement/ddi-cdi-version-10-vote-results-approved - However, the version 1.0 build artifacts are not yet available on the DDI-CDI specification page.