matentzn opened 4 years ago
The BridgeDb project does related work on database ID mappings and mapping services. There, too, we need mappings, their provenance, and mapping tools.
Thanks @Chris-Evelo. Do you have a link to the schema used for mappings in BridgeDb, or to its serialization format? cc @realmarcin
Another important source, shared by @AlasdairGray: http://www.openphacts.org/specs/2013/WD-datadesc-20130912/
@cmungall you can probably derive the BridgeDb schema from Section A:1.
Re: the email thread with @matentzn and @cmungall about contextual mappings: Preston (https://github.com/bio-guoda/preston), a biodiversity dataset tracker (disclaimer: I am a contributor), uses PROV-O (https://www.w3.org/TR/prov-o/) to describe the context in which datasets are tracked. See also https://doi.org/10.1016/j.ecoinf.2020.101132. Preston maps URLs to unique content IDs (hashes) to track published dataset versions.
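To make the URL-to-content-ID idea concrete, here is a minimal Python sketch. It is not Preston's code: the `hash://sha256/` identifier form, the function names `content_id` and `track`, and the record keys are all hypothetical. The point is only that identical bytes always get the same ID, so two versions published at the same URL can be told apart.

```python
import hashlib
from datetime import datetime, timezone


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (SHA-256; hypothetical hash URI form)."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def track(url: str, data: bytes) -> dict:
    """Record which content was retrieved from which URL, and when.

    The keys are hypothetical; a real PROV-O description would model the
    download as an activity that used the URL and generated the content.
    """
    return {
        "source_url": url,
        "content_id": content_id(data),
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }


# Same URL, two different payloads -> two different content IDs.
v1 = track("https://example.org/dataset.zip", b"version 1")
v2 = track("https://example.org/dataset.zip", b"version 2")
print(v1["content_id"] != v2["content_id"])  # True
```

Because the ID depends only on the bytes, re-downloading an unchanged dataset yields the same ID, while any change to the published file yields a new one.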
Thanks @jhpoelen, this also relates to https://github.com/OBOFoundry/SSSOM/issues/3
We will make alignment with PROV-O a priority once we have the basic syntactic/scope questions finalised! Thanks for sharing the references!
Yes, @cmungall and @matentzn, I think the Open PHACTS IMS version of BridgeDb did indeed do the most work on including provenance information (in the VoID header), and it allowed changes of predicates in the linksets. However, in the scientific lenses approach (see this pdf and these slides) we simply exchanged complete linksets, so for that purpose having the provenance and the type of mapping relationship at the set level is actually not bad. For the traditional version of BridgeDb and its API we use Derby databases. But I think it would be nice to have a transport format based on the IMS linksets and on what we discuss here.
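Set-level metadata can still be turned into per-mapping metadata when a consumer needs it. The sketch below copies chosen fields from a linkset header onto each mapping and lets individual rows override them. The field names follow SSSOM-style columns (`mapping_justification`, `mapping_set_id`), but the `propagate` helper and the choice of which fields to copy are hypothetical, for illustration only.

```python
def propagate(set_metadata: dict, mappings: list, slots: set) -> list:
    """Copy selected set-level fields onto every mapping.

    Values already present on an individual mapping take precedence,
    so a set-level default can be overridden row by row.
    The input mappings are not modified.
    """
    result = []
    for mapping in mappings:
        row = {k: v for k, v in set_metadata.items() if k in slots}
        row.update(mapping)  # mapping-level values win over set-level defaults
        result.append(row)
    return result


# Hypothetical linkset: one justification for the whole set, one row overriding it.
linkset_meta = {
    "mapping_justification": "semapv:ManualMappingCuration",
    "mapping_set_id": "https://example.org/linkset-1",
}
rows = [
    {"subject_id": "A:1", "predicate_id": "skos:exactMatch", "object_id": "B:1"},
    {"subject_id": "A:2", "predicate_id": "skos:closeMatch", "object_id": "B:2",
     "mapping_justification": "semapv:LexicalMatching"},
]
expanded = propagate(linkset_meta, rows, {"mapping_justification"})
for r in expanded:
    print(r["subject_id"], r["mapping_justification"])
# A:1 semapv:ManualMappingCuration
# A:2 semapv:LexicalMatching
```

Exchanging whole linksets then costs nothing in expressiveness: the set-level header is the default, and per-mapping values only need to be written where they differ.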
https://www.omg.org/spec/DOL/1.0/PDF
Just mentioned by someone in @cmungall's keynote.
SKOS Play, mentioned by @graybeal:
Analogous to the front matter format, I keep being drawn to the SKOS Play format as an alternative (but, I think, TTL-compatible) format for SSSOM content. How bad would that be? (I can create a ticket.)
Look at HL7 implementation profiles as one way to approach this complex mapping challenge.
Look at RELMA and how LOINC does its mappings: https://loinc.org/mappings/
Also check out http://build.fhir.org/conceptmap2.html
Check also SILK and its "Link Specification Language": https://app.assembla.com/wiki/show/silk/Link_Specification_Language
see also #250
Another one from @nichtich, see https://github.com/mapping-commons/sssom/discussions/197#discussioncomment-4694842: https://reconciliation-api.github.io/specs/latest/
Are there any other tabular formats with a header section? I know Markdown with a YAML header (using `---` to separate the YAML header from the content), and the DC Tabular Application Profile group discussed using a header in tabular data but decided to put additional information, such as namespace prefixes, in separate documents (DCTAP has more in common with SSSOM Tables, but a different application). I've also seen CSV with multi-line header rows, but there the number of header lines is known in advance. Do other tabular formats use `#` as a header indicator?
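For reference, SSSOM/TSV puts its YAML metadata block at the top of the file, with each line prefixed by `#`. A minimal stdlib-only reader for that layout might look like this. `split_sssom_tsv` is a hypothetical helper; it returns the header as plain YAML text instead of parsing it (a YAML library such as PyYAML would handle that step).

```python
import csv
import io


def split_sssom_tsv(text: str) -> tuple:
    """Split an SSSOM/TSV-style document into (yaml_header_text, data_rows).

    Leading lines starting with '#' form the metadata block. Only the '#'
    itself is stripped, so YAML indentation after it is preserved.
    """
    lines = text.splitlines()
    header = []
    i = 0
    while i < len(lines) and lines[i].startswith("#"):
        header.append(lines[i][1:])
        i += 1
    body = "\n".join(lines[i:])
    rows = list(csv.DictReader(io.StringIO(body), delimiter="\t"))
    return "\n".join(header), rows


doc = (
    "#curie_map:\n"
    "#  HP: http://purl.obolibrary.org/obo/HP_\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "HP:0000001\tskos:exactMatch\tMP:0000001\n"
)
yaml_text, rows = split_sssom_tsv(doc)
print(yaml_text)
print(rows)
```

Unlike CSV with a fixed number of header rows, the reader does not need to know the header length in advance; it stops at the first line without a leading `#`.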
Oh yeah! The Excel-to-SKOS converter format in SKOS Play is outstanding, and it can also convert Excel to OWL if you do things right. A colleague and I have created a whole automated GitHub-based pipeline, excel2rdf, and a companion one based on Google Docs (sheet2rdf). I cannot promote the quality of the SKOS Play software highly enough; I consider it a beautiful piece of work (comparable to SSSOM, if I may be so bold!).
Here is an example Google sheet (slightly overcolored!) to illustrate a detailed self-documenting example.
(I've tried to convince SSSOM to make their format SKOS Play-compatible; it's 95% there, and it would make transformation to RDF trivial. But for some reason no one besides me gets excited about that. :-( )
I think it is because no one in our world uses SKOS Play :) Building a transformer should be easy enough!
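On the RDF side, a basic conversion could start as small as this: expand the `subject_id`, `predicate_id` and `object_id` CURIEs with the set's `curie_map` and emit one N-Triples statement per row. This is a hypothetical sketch, not the sssom-py converter; it drops every other column (justification, confidence, labels), which a faithful RDF serialization would need to keep.

```python
def expand(curie: str, curie_map: dict) -> str:
    """Expand a CURIE like 'HP:0000001' using a prefix -> IRI-base map."""
    prefix, _, local = curie.partition(":")
    if prefix not in curie_map:
        raise KeyError(f"no expansion for prefix {prefix!r}")
    return curie_map[prefix] + local


def to_ntriples(rows: list, curie_map: dict) -> str:
    """Emit one N-Triples line per mapping row (subject, predicate, object only)."""
    lines = []
    for r in rows:
        s = expand(r["subject_id"], curie_map)
        p = expand(r["predicate_id"], curie_map)
        o = expand(r["object_id"], curie_map)
        lines.append(f"<{s}> <{p}> <{o}> .")
    return "\n".join(lines)


curie_map = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MP": "http://purl.obolibrary.org/obo/MP_",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
rows = [{"subject_id": "HP:0000001", "predicate_id": "skos:exactMatch", "object_id": "MP:0000001"}]
print(to_ntriples(rows, curie_map))
```

Failing loudly on an unknown prefix is a deliberate choice here: silently emitting an unexpanded CURIE would produce invalid IRIs in the output.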
Spreadsheet is now viewable, sorry!
I can see the appeal of such a format for human reviewers/curators, but it's not easily machine-processable by standard data science tools. However, it should not be hard to write a converter. In this case I prefer the current SSSOM design, despite the obvious advantages in human readability of the table you shared!
So maybe a converter in the other direction would be a good thing? That way you would always have the SSSOM design, which is nicely machine-readable, but for inspection you could generate a more human-friendly format.
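As one possible reading of the reverse-converter idea, the sketch below renders SSSOM-style rows as a Markdown table sorted by `subject_id`, purely for human inspection. It does not produce the SKOS Play format; Markdown is a stand-in chosen here, and `to_markdown` and its column selection are hypothetical.

```python
def to_markdown(rows: list, columns: list) -> str:
    """Render mapping rows as a Markdown table for human review."""

    def cell(value) -> str:
        # Escape pipes so cell values cannot break the table layout.
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for m in sorted(rows, key=lambda row: row.get("subject_id", "")):
        lines.append("| " + " | ".join(cell(m.get(c, "")) for c in columns) + " |")
    return "\n".join(lines)


rows = [
    {"subject_id": "B:1", "subject_label": "beta", "object_id": "Y:1"},
    {"subject_id": "A:1", "subject_label": "alpha", "object_id": "X:1"},
]
print(to_markdown(rows, ["subject_id", "subject_label", "object_id"]))
```

The SSSOM table stays the source of truth; the human-friendly view is derived from it and can be regenerated at any time, so the two never drift apart.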
This issue is just a place to dump some vaguely related work.