mapping-commons / sssom

Simple Standard for Sharing Ontology Mappings
https://mapping-commons.github.io/sssom/
BSD 3-Clause "New" or "Revised" License

Create better analysis of related work #26

Open matentzn opened 4 years ago

matentzn commented 4 years ago

This issue is just a place to dump some vaguely related work:

Chris-Evelo commented 4 years ago

The BridgeDb project does related work on database ID mappings and mapping services. There, too, we need mappings, their provenance, and mapping tools.

cmungall commented 4 years ago

Thanks @Chris-Evelo! Do you have a link to the schema used for mappings in BridgeDb, or to the serialization format? cc @realmarcin

matentzn commented 4 years ago

Another important source, shared by @AlasdairGray: http://www.openphacts.org/specs/2013/WD-datadesc-20130912/

@cmungall, you can probably derive the BridgeDb schema from Section A:1.

matentzn commented 4 years ago

https://www.omg.org/spec/CTS2

jhpoelen commented 4 years ago

Re: the email thread on contextual mappings with @matentzn and @cmungall: Preston (https://github.com/bio-guoda/preston), a biodiversity dataset tracker (disclaimer: I am a contributor), uses PROV-O (https://www.w3.org/TR/prov-o/) to describe the context in which datasets are tracked. See also https://doi.org/10.1016/j.ecoinf.2020.101132. Preston maps URLs to unique content IDs (hashes) to track published dataset versions.
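To make the URL-to-content-hash idea concrete, here is a minimal sketch. It assumes SHA-256 digests wrapped in a `hash://sha256/` URI, and the `content_id`/`track` helpers and the in-memory `versions` log are hypothetical illustrations, not Preston's actual implementation. The point is that a new version is recorded only when the bytes behind a URL change.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Content-based identifier (assumption: SHA-256 hex digest in a hash://sha256/ URI)."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()

# Hypothetical version log: URL -> ordered list of observed content IDs.
versions: dict[str, list[str]] = {}

def track(url: str, data: bytes) -> str:
    """Record the content ID of the bytes retrieved from `url`."""
    cid = content_id(data)
    history = versions.setdefault(url, [])
    if not history or history[-1] != cid:
        history.append(cid)  # only a change in content produces a new version
    return cid

url = "https://example.org/dataset.csv"  # made-up URL for illustration
track(url, b"id,name\n1,a\n")
track(url, b"id,name\n1,a\n")            # same bytes: no new version
track(url, b"id,name\n1,a\n2,b\n")       # changed bytes: new version
print(len(versions[url]))                # 2
```

Because the identifier depends only on the bytes, two trackers that fetch the same content independently will agree on its ID, which is what makes it useful as a provenance anchor.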

matentzn commented 4 years ago

Thanks @jhpoelen, this also relates to https://github.com/OBOFoundry/SSSOM/issues/3

We will make alignment with PROV-O a priority once we have the basic syntactic/scope questions finalised! Thanks for sharing the references!

Chris-Evelo commented 4 years ago

Yes, @cmungall and @matentzn, I think the Open PHACTS IMS version of BridgeDb did the most work on including provenance information (in the VoID header), and it allowed changes of predicates in the linksets. However, in the Scientific Lenses approach (see this PDF and these slides), we simply exchanged complete linksets, so for that purpose having the provenance and the type of mapping relationship at the set level is not bad at all. For the traditional version of BridgeDb and the API we use Derby databases. But I think it would be nice to have a transport format that could be based on the IMS linksets and what we discuss here.

matentzn commented 2 years ago

https://www.omg.org/spec/DOL/1.0/PDF

Just mentioned by someone in @cmungall's keynote.

matentzn commented 2 years ago

SKOS Play, mentioned by @graybeal:

Analogous to the frontmatter format, I keep being drawn to the SKOS Play format as an alternate (but I think TTL-compatible) format for the SSSOM content. How bad would that be? (I can create a ticket.)

matentzn commented 2 years ago

Look at HL7 implementation profiles as a possible approach to this complex mapping challenge.

matentzn commented 2 years ago

Look at RELMA and how LOINC does its mappings: https://loinc.org/mappings/

matentzn commented 2 years ago

Also check out http://build.fhir.org/conceptmap2.html

matentzn commented 2 years ago

Also check SILK and its "Link Specification Language": https://app.assembla.com/wiki/show/silk/Link_Specification_Language

matentzn commented 1 year ago

https://inter-iot.readthedocs.io/projects/ipsm/en/latest/Configuration/Alignment-format/IPSM-alignment-format/

matentzn commented 1 year ago

see also #250

matentzn commented 1 year ago

Another one from @nichtich, see https://github.com/mapping-commons/sssom/discussions/197#discussioncomment-4694842: https://reconciliation-api.github.io/specs/latest/

nichtich commented 1 year ago

Are there any other tabular formats with a header section? I know Markdown with a YAML header (using --- to separate the YAML header from the content). The DC Tabular Application Profile group also discussed using a header in tabular data, but decided to put additional information such as namespace prefixes in separate documents (DCTAP has more similarities to SSSOM Tables, but a different application). I've also seen CSV with multi-line header rows, but there the number of header lines is known in advance. Do other tabular formats use # as a header indicator?
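For comparison with the formats mentioned above, the convention under discussion puts the metadata block in `#`-prefixed YAML lines directly above the TSV table, so the header length does not need to be known in advance. The following stdlib-only sketch shows one way to split such a file; the function name `split_commented_header` is hypothetical, and the YAML is returned as plain text rather than parsed (a real reader would hand it to a YAML library).

```python
import csv
import io

def split_commented_header(text: str) -> tuple[str, list[dict[str, str]]]:
    """Split a TSV document with a '#'-prefixed header into (header_yaml, rows).

    Leading lines starting with '#' form the header; the '#' is stripped so the
    remaining text (indentation included) is valid YAML. Everything after the
    first non-'#' line is parsed as a tab-separated table with a column row.
    """
    lines = text.splitlines()
    i = 0
    while i < len(lines) and lines[i].startswith("#"):
        i += 1
    header = "\n".join(line[1:] for line in lines[:i])
    body = "\n".join(lines[i:])
    rows = list(csv.DictReader(io.StringIO(body), delimiter="\t"))
    return header, rows

# Made-up example document; prefixes and IDs are placeholders.
doc = (
    "#curie_map:\n"
    "#  EX: http://example.org/ex/\n"
    "#mapping_set_id: http://example.org/set.sssom.tsv\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "EX:1\tskos:exactMatch\tEX:9\n"
)
header, rows = split_commented_header(doc)
print(header.splitlines()[0])    # curie_map:
print(rows[0]["predicate_id"])   # skos:exactMatch
```

One consequence of this design is that a generic TSV reader only needs a "skip comment lines" option to read the table, while the header stays attached to the data in a single file.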

graybeal commented 1 year ago

Oh yeah! The Excel-to-SKOS converter format in SKOS Play is outstanding, and it can also convert Excel to OWL if you do things right. A colleague and I have built a fully automated GitHub-based pipeline, excel2rdf, and a companion one based on Google Docs (sheet2rdf). I cannot praise the quality of the SKOS Play software highly enough; I consider it a beautiful piece of work (comparable to SSSOM, if I may be so bold!).

Here is an example Google Sheet (slightly over-colored!) illustrating a detailed, self-documenting layout.

(I've tried to convince SSSOM to make their format SKOSPlay-compatible, it's 95% there and it would make transformation to RDF trivial. But for some reason no one besides me gets excited about that. :-( )

matentzn commented 1 year ago

I think it is because no one in our world uses SKOS Play :) Building a transformer should be easy enough!
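As a rough indication of how small the core of such a transformer can be, here is a sketch that expands the CURIEs in each row with a prefix map and emits one N-Triples statement per mapping. The helper names `expand` and `to_ntriples` and the example prefixes are hypothetical. This only covers the bare subject/predicate/object triple: a faithful SSSOM-to-RDF conversion would also have to carry each mapping's metadata (justification, confidence, provenance), which this sketch drops.

```python
def expand(curie: str, prefixes: dict[str, str]) -> str:
    """Expand a CURIE such as 'EX:1' into an N-Triples IRI using `prefixes`."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"cannot expand CURIE: {curie!r}")
    return f"<{prefixes[prefix]}{local}>"

def to_ntriples(rows: list[dict[str, str]], prefixes: dict[str, str]) -> str:
    """One 'subject predicate object .' line per mapping row; other columns are ignored."""
    return "\n".join(
        " ".join(expand(r[col], prefixes)
                 for col in ("subject_id", "predicate_id", "object_id")) + " ."
        for r in rows
    )

# Placeholder prefixes and rows for illustration.
prefixes = {
    "EX": "http://example.org/ex/",
    "EX2": "http://example.org/ex2/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
rows = [{"subject_id": "EX:1", "predicate_id": "skos:exactMatch", "object_id": "EX2:9"}]
print(to_ntriples(rows, prefixes))
```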

graybeal commented 1 year ago

Spreadsheet is now viewable, sorry!

matentzn commented 1 year ago

I can see the appeal of such a format for human reviewers/curators, but it's not easily machine-processable by standard data science tools. However, it should not be hard to write a converter. In this case, I prefer the current design in SSSOM, despite the obvious advantages for human readability of the table you shared!

Chris-Evelo commented 1 year ago

So maybe a converter in the other direction would be a good thing? In that way, you would always have the SSSOM design which is nicely machine-readable, but for inspection, you could create a more human-friendly format.
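A converter in that direction could be as simple as rendering the machine-readable rows as a review table. The sketch below is one hypothetical option: `to_review_table` and the default column selection are illustrative choices, not an agreed SSSOM review format. It sorts rows by `subject_id` so each subject's mappings appear together, fills missing columns with blanks, and escapes `|` so cell values cannot break the Markdown table.

```python
def to_review_table(
    rows: list[dict[str, str]],
    columns: tuple[str, ...] = ("subject_id", "subject_label", "predicate_id",
                                "object_id", "object_label"),
) -> str:
    """Render mapping rows as a Markdown table for human review (hypothetical layout)."""
    def cell(value: str) -> str:
        return value.replace("|", "\\|")  # keep literal pipes from splitting cells

    header = "| " + " | ".join(columns) + " |"
    rule = "|" + "|".join(" --- " for _ in columns) + "|"
    body = [
        "| " + " | ".join(cell(r.get(c, "")) for c in columns) + " |"
        for r in sorted(rows, key=lambda r: r.get("subject_id", ""))
    ]
    return "\n".join([header, rule, *body])

# Placeholder rows; labels are optional and left blank when absent.
rows = [
    {"subject_id": "EX:2", "predicate_id": "skos:closeMatch", "object_id": "EX2:7"},
    {"subject_id": "EX:1", "predicate_id": "skos:exactMatch", "object_id": "EX2:9"},
]
print(to_review_table(rows, columns=("subject_id", "predicate_id", "object_id")))
```

Keeping SSSOM as the source of truth and generating such views on demand means curators' edits still have to flow back into the TSV, which is the trade-off this direction accepts.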