Closed matentzn closed 3 months ago
Being addressed here #235
@udp is currently in a mad push on OLS, but maybe James, when you get a chance, can you tell me what else needs doing on that PR to call it "good enough" for v1?
cc @rsgoncalves, would also like to hear your input on the PR if you don't mind. Happy to answer any questions if you don't understand what exactly it does!
The PR looks good to me. I think we can swap out our (simple) mapping format with SSSOM straightforwardly. I could use your help clarifying a few things, further below.
For context: In the mapping tool we've been developing, the output is a simple table like so: `(subject_id, subject_label, object_id, object_label, mapping_score)`. For example: _("bbj-a-113", "Endometrial cancer", "http://www.ebi.ac.uk/efo/EFO_1001512", "endometrial carcinoma", 0.977010419)_. The tool doesn't output a predicate, but it is assumed to be exactMatch — through UIs, curators can change the predicate to 'broad' or 'narrow'.
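For illustration, that 5-tuple could be translated to SSSOM columns like this (a minimal Python sketch; the `to_sssom_row` helper is invented, and the `skos:exactMatch` default reflects the implicit-predicate behaviour described above):

```python
# Sketch: convert the tool's 5-column output into SSSOM slots.
# The tool emits no predicate; per the description above, it is
# implicitly skos:exactMatch (curators may later change it).
def to_sssom_row(row):
    subject_id, subject_label, object_id, object_label, score = row
    return {
        "subject_id": subject_id,
        "subject_label": subject_label,
        "predicate_id": "skos:exactMatch",  # implicit default
        "object_id": object_id,
        "object_label": object_label,
        "confidence": score,  # or similarity_score; see the question below
    }

example = ("bbj-a-113", "Endometrial cancer",
           "http://www.ebi.ac.uk/efo/EFO_1001512",
           "endometrial carcinoma", 0.977010419)
row = to_sssom_row(example)
print(row["predicate_id"])  # skos:exactMatch
```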
Now to questions:
In mapping some literals, like trait descriptions in OpenGWAS database records, we want to keep track of the (internal) record identifier associated with each literal. Would it be possible to keep some subject metadata as optional? I understand these identifiers are not of general use, but they're used in internal pipelines and this way we wouldn't have to maintain 2 tables (an internal identifier table + a SSSOM table).
The description of `literal_source` states "URI of ontology source for the literal". Seems there's an assumption these literals come from ontologies — is this intended? Do values need to resolve? If not, I imagine this field could be (ab)used to store internal identifiers, for example, but not sure that's the appropriate use.
What is the design difference between `confidence` and `similarity_score` (when to use one vs the other)? We output a mapping "confidence" score, which is a similarity score computed by one of multiple supported similarity metrics.
So in your use case, you always have an internal identifier? In this case, you could simply be using normal SSSOM rather than the literal profile?
The description of `literal_source` states "URI of ontology source for the literal". Seems there's an assumption these literals come from ontologies — is this intended? Do values need to resolve? If not, I imagine this field could be (ab)used to store internal identifiers, for example, but not sure that's the appropriate use.
Seems wrong, @udp.
What is the design difference between `confidence` and `similarity_score` (when to use one vs the other)? We output a mapping "confidence" score, which is a similarity score computed by one of multiple supported similarity metrics.
Great question, can you open a new issue about that? I will try my best to document the difference, but it is true that these two metrics will often coincide.
No, not always. So far only in a couple of datasets have we had to maintain internal identifiers. I think the literal profile is still the route, with some optional field to specify such identifiers. Could that be `literal_source`, perhaps? The label seems to suit the intent at least.
I think `literal_source` may be confused with being the context (text, database, etc.) from which the `literal` originated, but it may be an option - I am 50/50 here. However, to avoid marketing issues moving forward: can you articulate a bit more how we can communicate the difference between:
| subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification |
|---|---|---|---|---|---|
| A:1 | label 1 | skos:exactMatch | A:2 | label 2 | semapv:LexicalMatching |

| literal_source | literal | predicate_id | object_id | object_label | mapping_justification |
|---|---|---|---|---|---|
| A:1 | label 1 | skos:exactMatch | A:2 | label 2 | semapv:LexicalMatching |
I see I approved #235 but I actually have a number of problems to raise
- Will `mappings` be extended to be a union of `mapping` and `literal mapping`? Or will these be a separate `LiteralMappingSet`?
- don't use spaces in class names
I see many models that do this, like https://github.com/biolink/biolink-model/blob/1698cf997785490304a617123d5e3a242c6b2bc0/biolink-model.yaml#L6128. Where can I find docs about this?
I thought this was going to be a separate profile, not go in the main schema
Is there something to read about modular schema development best practices?
Currently this is causing some issues, as we are reusing the same slot_uri for different meanings in literal and non-literal mappings (some of these can be fixed in LinkML, but you will run into issues with JSON-LD).
That was an honest mistake, now fixed. Technically literal mappings are not yet connected to the spec, we just wanted to have the docs out there to be able to use it, even if there is no tool support.
Technically literal mappings are not yet connected to the spec, we just wanted to have the docs out there to be able to use it, even if there is no tool support.
But how is the “literal profile” even supposed to be used? All we have is a `literal mapping` class which cannot be contained in a `mapping set` (a mapping set can only contain `mapping`, not `literal mapping`).
I second @cmungall’s questions:
- How are these to be used, exchanged, serialized? How are they collected?
- Will a `MappingSet` be extended to allow separate fields for literal mappings and mappings (and how would this work, e.g., with CSV)?
- Perhaps the range of `mappings` will be extended to be a union of `mapping` and `literal mapping`? Or will these be a separate `LiteralMappingSet`?
Those questions should get answered before we make a SSSOM 1.0, or the “literal profile” should be removed from the 1.0 version in my opinion.
Right now, the “literal profile” is in effect impossible to implement in code.
I think a separate literal mapping set would be fine? It was never the intention that they would be in the same file.
The use case for this is to publish all of the manually asserted string to term mappings we have collected in ZOOMA, see https://github.com/EBISPOT/zooma2sssom/tree/master/mappings
I don’t see why we even need a separate “profile” or a separate class for literal mappings for such a use case.
Why not simply put the literal in the `subject_label`? Along with a new `EntityType` value for `subject_type` that indicates that the subject is a literal and that, therefore, for this particular mapping it is the `subject_label`, not the `subject_id`, that matters (the `subject_id` can even be absent).
| subject_label | predicate_id | object_id | mapping_justification | mapping_provider | subject_type |
|---|---|---|---|---|---|
| uterus | http://www.w3.org/2000/01/rdf-schema#label | http://purl.obolibrary.org/obo/BTO_0001424 | https://w3id.org/semapv/vocab/ManualMappingCuration | https://www.ebi.ac.uk/vg/faang | literal |
| sperm | http://www.w3.org/2000/01/rdf-schema#label | http://purl.obolibrary.org/obo/CL_0000019 | https://w3id.org/semapv/vocab/ManualMappingCuration | https://www.ebi.ac.uk/vg/faang | literal |
| kidney | http://www.w3.org/2000/01/rdf-schema#label | http://purl.obolibrary.org/obo/BTO_0000671 | https://w3id.org/semapv/vocab/ManualMappingCuration | https://www.ebi.ac.uk/vg/faang | literal |
Yes, I think this approach would work if subject_id is made optional.
I think it's helpful to have an SSSOM-like approach for literals and I agree it fits well and doesn't necessarily need a separate "profile", but I wonder if it would lead to significant scope creep. Could SSSOM become a TSV format for annotating information about any kind of subject-predicate-object relationship? The more slots become optional, the more trouble developers will have implementing tools, and the more trouble users will have finding a tool that does what they are looking for.
Why does the `literal` slot replace the `subject_id` slot instead of the `object_id` slot? Would literals ever be able to use oboInOwl synonym predicates? I can't see how `oio:hasNarrowSynonym` makes sense with a URI as the object. I imagine having the literal as subject works fine with existing mapping predicates (e.g. skos).
I think it's helpful to have an SSSOM-like approach for literals and I agree it fits well and doesn't necessarily need a separate "profile" but I wonder if it would lead to significant scope creep.
Maybe, but it seems there is clear interest in being able to represent such “literal mappings”. So the options are:
A likely outcome of this option is that people who need to handle this case will, in effect, “fork” SSSOM to create their own variant that can represent literal mappings. If several people do that, we will end up in the same situation as we were for general mappings before SSSOM: everyone will represent literal mappings with their own custom format, which will all be slightly incompatible with each other.
Two problems with that approach.
First, for now it is incomplete. The “literal profile” defines a `literal mapping` object, but as @matentzn said, it is currently “not yet connected” to the rest of the spec. As such, it is of little value, since developers cannot create tools to deal with such mappings.
Second, even once it is complete, the “literal profile” will be a mess to implement, at least in non-duck-typed languages. There is no relation between `mapping` and `literal mapping`, so polymorphism won’t help. The “best” solution will be to create a corresponding `literal mapping set` class. This will result in a lot of duplicated code, for (in my opinion) very little benefit.
3. Add a value to the `EntityType` enum to indicate that one of the mapped entities is a “literal” rather than an “entity with an identifier”, plus a paragraph somewhere in the spec explaining that when the `subject_type` of a mapping is a literal, then `subject_label` is mandatory and must contain the literal that is being mapped.

I see no obvious drawbacks to that approach, and only benefits. Notably:
a. This allows for either side of the mapping (subject or object) to be the literal. If `subject_type` is set to `literal`, then the subject is the literal, and the literal value is to be found in `subject_label`. If `object_type` is set to `literal`, then the object is the literal, and the literal value is to be found in `object_label`.
b. Consequently, this allows inversion of mappings according to SSSOM’s standard rules (contrary to the profile proposed in #235, where the literal can only be on the subject side).
c. As a side-effect, this even allows for literal-to-literal mappings, should anyone ever need to do that.
d. This allows mixing literal and non-literal mappings, should anyone ever need to do that. Not saying this is necessarily a good idea, but the approach automatically makes it possible without anything special to do. By contrast, the separate fork/profile route would never allow that unless we explicitly plan for this possibility.
e. Implementation-wise, this should be a breeze.
The more slots become optional and optional slots exist, the more developers will have trouble implementing tools and users trouble finding a tool that does what they are looking for.
Apart from `subject_id`, `predicate_id`, `object_id`, and `mapping_justification`, all slots are optional. The only thing my proposition would change is that, when checking whether a mapping has a `subject_id` (resp. `object_id`), an implementation should first check the value of `subject_type` (resp. `object_type`) – if the value is present and is `literal`, then it is `subject_label` (resp. `object_label`) that should be checked for existence.
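The check described here could be sketched like this (a hypothetical illustration with a heavily trimmed `Mapping` record and an invented `subject_value` helper; this is not any actual SSSOM library API, and the exact spelling of the literal type value is still part of the discussion):

```python
from dataclasses import dataclass
from typing import Optional

# Assumed enum spelling; the thread has not settled on "literal" vs "rdfs literal".
LITERAL = "rdfs literal"

@dataclass
class Mapping:
    # Hypothetical, heavily trimmed record; real SSSOM mappings have many more slots.
    subject_id: Optional[str] = None
    subject_label: Optional[str] = None
    subject_type: Optional[str] = None

def subject_value(m: Mapping) -> str:
    """Return the mapping's subject: the literal held in subject_label when
    subject_type says the subject is a literal, otherwise the subject_id."""
    if m.subject_type == LITERAL:
        if m.subject_label is None:
            raise ValueError("a literal subject requires subject_label")
        return m.subject_label
    if m.subject_id is None:
        raise ValueError("a non-literal subject requires subject_id")
    return m.subject_id

print(subject_value(Mapping(subject_label="uterus", subject_type=LITERAL)))  # uterus
print(subject_value(Mapping(subject_id="MESH:C535731")))  # MESH:C535731
```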
I do agree that the fact that most slots are optional can complicate the use of SSSOM, though. This, in fact, is where the notion of “profile” would be interesting, but it would be different from the type of “profile” that has been proposed in #235.
A “profile” could simply be a list of slots that, within the profile, should be considered mandatory. The spec could define a few such profiles, and users could be free to define their own.
The idea being that, once you have declared a set to adhere to a given profile (and the parser has verified that the set is indeed compliant with the indicated profile), you no longer have to worry about which slots are present or not because you already know that all slots mandated by your profile are present (if they were not, the parser would have rejected the set outright).
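As a sketch of that idea (the profile names and their slot lists below are purely illustrative, not part of any spec):

```python
# Invented profile names and slot lists, purely to illustrate the idea of
# "profile = the set of slots that are mandatory within that profile".
PROFILES = {
    "standard": {"subject_id", "predicate_id", "object_id", "mapping_justification"},
    "literal": {"subject_label", "predicate_id", "object_id", "mapping_justification"},
}

def check_profile(mappings, profile_name):
    """Reject the whole set if any mapping lacks a slot required by the profile."""
    required = PROFILES[profile_name]
    for i, mapping in enumerate(mappings):
        present = {slot for slot, value in mapping.items() if value}
        missing = required - present
        if missing:
            raise ValueError(
                f"mapping {i} violates profile {profile_name!r}: "
                f"missing {sorted(missing)}")

rows = [{"subject_id": "A:1", "predicate_id": "skos:exactMatch",
         "object_id": "A:2", "mapping_justification": "semapv:LexicalMatching"}]
check_profile(rows, "standard")  # passes: all required slots are present
```

Once a set passes `check_profile`, downstream code can rely on the profile's slots being present instead of checking each one individually.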
Okay, I understand why an official "fork" for literals is desirable and your argument in 3 for adapting the model by adding `literal` to the list of possible values for `subject_type` and `object_type`.
It looks to me like `rdfs:Literal` is already an option for these type slots (https://mapping-commons.github.io/sssom/EntityTypeEnum/). Correct me if I read the LinkML documentation wrong.
Has anyone tried mixing different types in the same mappingSet?
On the one hand, it would be really convenient for me to curate both a mapping between entities, and a mapping between an entity and literal in a single TSV file like this (top = literal to owl:Class, bottom = owl:Class to owl:Class):
| subject_id | subject_label | predicate_id | predicate_modifier | object_id | object_label | mapping_justification | subject_type | object_type |
|---|---|---|---|---|---|---|---|---|
| MESH:C535731 | Chmrq1 | oboInOwl:hasExactSynonym | Not | DOID:0070556 | CAMRQ1 | semapv:ManualMappingCuration | literal | linkml:Uriorcurie |
| MESH:C535731 | Dysequilibrium syndrome | skos:exactMatch | | DOID:0070556 | CAMRQ1 | semapv:ManualMappingCuration | linkml:Uriorcurie | linkml:Uriorcurie |
_I assume `subject_id` would be equivalent to `literal_source` in https://github.com/mapping-commons/sssom/pull/235._
I can fairly easily tell which is which with just these two mappings, primarily by the predicate I chose to use. But if both mappings used `skos:exactMatch` as predicate, which I assume would be in spec, it would require a curator to look at the `subject_type` and `object_type` slots for every mapping to make sure they are curating the right slots. Having to look that up for every mapping would be much less simple than it is currently, especially when labels become longer and looking at types adds the need for lots of horizontal scrolling.
A “profile” could simply be a list of slots that, within the profile, should be considered mandatory. The spec could define a few of such profiles, and users could be free of defining their own.
I like this idea for a "profile".
It seems fairly straightforward to say that in the standard mappings "profile" `subject_id`, `predicate_id`, `object_id`, and `mapping_justification` are required, while for the literal "profile" `subject_label` would become required and `subject_id` optional, with no other changes to slots.
Profiles like this could be defined in the mappingSet metadata. Curators could be alerted that everything in a set is of a particular type (or set of allowed types), preventing the confusion I mentioned above. It does lose some of the convenience of creating mappings between very different types in the same file. I suppose you could always define a super "profile" that allows anything from the other defined profiles and then create tools to merge or split profiles.
It looks to me like rdfs:Literal is already an option for these type slots (https://mapping-commons.github.io/sssom/EntityTypeEnum/).
Yes. However it’s unclear to me whether it is suitable here (the poor documentation of the model doesn’t help). Can it be used outside of an RDF context? If I have a list of, say, scRNAseq cell cluster names and I want to map them to Cell Ontology IDs, would it be correct to use `rdfs literal` as the `subject_type` even though the subjects are just entries in a flat list and are not part of any RDF graph at all?
Maybe it would be fine, maybe not. I just don’t know. Whoever came up with the values for the `EntityType` enum would need to clarify.
Has anyone tried mixing different types in the same mappingSet?
Do you mean mixing mappings with different `subject_type` (or `object_type`) values? I never had to do that (all the mappings I have to deal with are mappings between OWL classes), but that’s a completely supported situation. I don’t know about SSSOM-Py, but SSSOM-Java would have no problem whatsoever dealing with such mixed mappings.
Or did you mean mixing (normal) `mapping`s with `literal mapping`s (as represented by the new profile)? Then no, right now it’s completely impossible to do that.
On the one hand, it would be really convenient for me to curate both a mapping between entities, and a mapping between an entity and literal in a single TSV file like this (top = literal to owl:Class, bottom = owl:Class to owl:Class):
| subject_id | subject_label | predicate_id | predicate_modifier | object_id | object_label | mapping_justification | subject_type | object_type |
|---|---|---|---|---|---|---|---|---|
| MESH:C535731 | Chmrq1 | oboInOwl:hasExactSynonym | Not | DOID:0070556 | CAMRQ1 | semapv:ManualMappingCuration | literal | linkml:Uriorcurie |
| MESH:C535731 | Dysequilibrium syndrome | skos:exactMatch | | DOID:0070556 | CAMRQ1 | semapv:ManualMappingCuration | linkml:Uriorcurie | linkml:Uriorcurie |
I am sorry but I don’t understand your example at all.
The second mapping states that DOID:0070556 is an exact match to MESH:C535731; the first one seems to state that DOID:0070556 is not an exact synonym to MESH:C535731. I don’t understand what that is supposed to mean. Why does MESH:C535731 have a different label in the two mappings? Why is the `subject_type` of the first mapping a `literal`, while it clearly refers to an entity?
(Besides, `linkml:Uriorcurie` is not a valid value for `subject_type` or `object_type`.)
I can fairly easily tell which is which with just these two mappings, primarily by the predicate I chose to use. But if both mappings used skos:exactMatch as predicate, which I assume would be in spec
Again, I don’t understand what you mean here. The spec does not and will not mandate which predicate to use (at most it can recommend that some predicates be used or conversely discourage the use of some others, but that’s it). Just because “literal mappings” would become an officially supported type of mapping does not mean that the spec would force you to use `skos:exactMatch` for those mappings.
It seems fairly straightforward to say in the standard mappings "profile" subject_id, predicate_id, object_id, and mapping_justification are required, while for the literal "profile" subject_label would become required and subject_id optional with no other changes to slots.
Something like that, yes.
Profiles like this could be defined in the mappingSet metadata.
I would not envision allowing the definition of a profile in a mapping set’s metadata. Instead, profiles should be defined externally, and a mapping set would simply declare that they use a specific profile. Allowing each mapping set to define its own profile seems like a needless complication to me.
Curators could be alerted that everything in a set is of a particular type
You don’t need profiles to do that. `subject_type` and `object_type` are propagatable slots, which means that if all mappings in your set have the same `subject_type` (resp. `object_type`), you can set the `subject_type` (resp. `object_type`) once and for all in the mapping set’s metadata and the value you set there will apply to all mappings in the set.
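A minimal sketch of that propagation behaviour (the `PROPAGATABLE` set below is a partial, assumed list; the spec defines the authoritative one):

```python
# Assumed partial list of propagatable slots, for illustration only;
# the SSSOM spec defines the authoritative list.
PROPAGATABLE = {"subject_type", "object_type", "subject_source", "object_source"}

def propagate(set_metadata, mappings):
    """Apply set-level values of propagatable slots to every mapping
    that does not already set them itself."""
    defaults = {k: v for k, v in set_metadata.items() if k in PROPAGATABLE}
    return [{**defaults, **m} for m in mappings]

metadata = {"mapping_set_id": "https://example.org/myset",
            "subject_type": "rdfs literal"}
rows = [{"subject_label": "uterus", "object_id": "BTO:0001424"}]
print(propagate(metadata, rows)[0]["subject_type"])  # rdfs literal
```

Note that a mapping-level value, when present, overrides the set-level default rather than being overwritten by it.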
I suppose you could always define a super "profile" that allows anything from the other defined profiles
Or you can just not use profiles, if you need to merge several sets that are compliant with different profiles. Profiles, if we ever create them, would not be a mandatory feature – mapping sets would not have to have a profile.
Sorry for barging in here, I don't have time to comment here much. Here is my very short take:
Literal mappings and semantic entity mappings are very different things and should not be confused
OK.
We have had lots of meetings about the literal profile and its motivation, discussed it at a workshop, had a PR open for months (or years)
Where is the trace of those “lots of meetings”?
All I know about is:
the discussion in #197, which has not been a particularly active one and which provides very few insights about how you came to the current “design” – in fact, according to that discussion most of the decisions were taken in what seems to have been a private discussion between you and @udp, the minutes of which are not available anywhere;
this #234 issue, which was open at a time the decision to go for a separate “literal profile” had already been taken;
the “2nd Mapping Commons Workshop on SSSOM”, which included a presentation by James about the literal profile – again, at a time the decision to use a separate profile for literal mappings had already been taken; the correctness of that decision does not seem to have been discussed following James’ presentation, at least not in the recording that is available. If there has been further discussion outside of the workshop itself, where are the minutes?
it is done
We have a very different opinion of what can be considered “done”. I say it again: the literal profile is right now unusable. There are way too many questions left open about how it can/should be used.
The only thing the literal profile does for now is causing confusion, by leading people to believe they can use SSSOM to represent literal mappings, which is absolutely not the case.
The EBI has already started to publish “literal mapping sets” (see James’ message above) in .sssom.tsv
files. Anyone could legitimately conclude that those files are bona fide SSSOM/TSV files, and therefore would expect to be able to use them with the existing SSSOM/TSV tools. But those files are not SSSOM/TSV files – no SSSOM tools can deal with them! And no SSSOM tools will ever be able to deal with them.
Aren’t we supposed to care about “interoperability”?
I am very against mixing the concepts of synonyms and mappings.
OK.
I do not think we should make our work of maintaining the SSSOM core model harder by considering the future status of the profile(s). Let's just label them as "unofficial", not part of the main standard, for now.
Hard disagree. You’re taking the easy path now without consideration for how hard you will make things in the future. That may be fine for software development in general (“move fast and break things”, as the tech bros of Silicon Valley say), but when designing a (hopefully) long-term standard, you want to move slow and fix things.
Right now, the “literal profile” is a half-assed design that no one knows how to use (not even you apparently). Leaving it like that and kicking the can down the road can only come back to hit us hard in the future.
@gouttegd, the confusion caused by the example table I shared is exactly the point I was trying to make about mixing literal and entity mappings. The source of both mappings is the same MESH entity, but the top mapping is between the literal in the `subject_label` (which was a synonym linked to that MESH term) and states that that literal should NOT be mapped to the specified DOID because it is wrong, while the bottom mapping is between the MESH and DOID entities and states they are exact matches. I could have left out the `subject_id` for the top mapping because it is optional, but I wanted to be able to tell in the future where this literal came from.
@matentzn, barging in... hahaha. Like you said, you put in work earlier when you had time. I'm sorry I couldn't contribute more at an earlier stage, but I'm fine with leaving literals as unofficial. Can I ask why `rdfs:Literal` is an option for `subject_type`, `predicate_type` and `object_type` in SSSOM?
A somewhat naive comment (it's hard to keep all these arguments clear without spending many hours, I thank you who have devoted that time):
With that in mind, I still think these basic thoughts could apply:
I agree that if literals are not crisply specified in this standard, the chance of divergence and even competing standards is high. But if you think literal-included triples are not really mappings for SSSOM, then that's the principled decision on which you should stand, and that other thing is not a profile, it's a different standard.
Implementing literal mappings, to me, is taking the final step to make SSSOM a duplicate of RDF (and even of RDF-star, as we can say things about the triple). I don't think we need another RDF. This is exactly what the example just above shows.
You can drop the Simple, change Ontology to Resource, and you get SSRM.
In the interest of driving SSSOM 1.0 home in the coming weeks, and given the enormous amount of things to unpack in all the comments here, I am OK with yanking the literal profile from the standard, for now (not happy, but I can read a room 😂). I can move it to another repo and develop it independently as a non-standard, and make sure we communicate use cases clearly for this. One day in the far future we can move this "profile" or "standard for something else" back here and have a vote.
Please voice your objections to this approach until 1st August; I will be responsible for the move!
I am ok with yanking the literal profile from the standard, for now […] One day in the far future we can move this "profile" or "standard for something else" back here
This is just another version of “kicking the can down the road”, only in a different repo.
If the intention is that at some point the “literal mapping” becomes a part of SSSOM, we should think about how this will be done right now.
Adding in the future a new class of mappings is a completely different beast than adding or removing a slot in the existing `mapping` class.
For now, all the code dealing with SSSOM (in Python, Java, or any other language) can be built around the assumption that there is only one class of mapping. This is not something that will be easy to change, and the longer that assumption stays around, the harder it will be to change.
So if you already know that at some point you want the standard (and its implementations) to deal with several types of mappings (e.g. `mapping` and `literal mapping`), this is something that must be decided ASAP, not “sometime in the future”.
and have a vote
If you do this (make SSSOM 1.0 with no room for more than one class of mappings, then come back later with a proposition for another class as if it was an afterthought), I can already tell you what my vote will be: No. Absolutely not.
Hi, typing from my phone as I’m away camping without access to a computer. For us (biocurators at EBI) mapping from term to term or string to term are both classes of the same problem. We often have datasets that require both types of mappings to get to the types of identifiers we want. For example I am currently working with a dataset that has a mix of chemical names and CAS numbers. I want to map the CAS number where available (obviously a use case for core SSSOM) and otherwise map the chemical name (literal mapping). This is perhaps not the best example but I can dig out unlimited more when I get back to the office.
At EBI we use two tools for these term and literal mappings respectively: OXO and ZOOMA. So far we have maintained the databases for these tools internally which is not in the FAIR spirit of our community. We are therefore opening up OXO using SSSOM and hope to do the same with ZOOMA.
Just like term mappings, literal mappings are context dependent (we maintain different literal mapping sets per project in ZOOMA for this reason) and have associated metadata, e.g. lexical match or manual curation, a mapping author, a date, etc. I don’t think solving these problems twice by making a new SSSLiteralOM, complete with website, issue tracker and so on, is the best way to spend our time when we already have the community mindshare (or so I thought) and infrastructure here to support it.
In fact this is extremely unlikely to happen with the resources we have, so ZOOMA’s data would stay loosely specified and difficult to use - but I thought we left this kind of thing in the past and moved towards trying to agree on things to enable interoperability.
@jamesamcl I am not against representing literal mappings in SSSOM.
I do share a bit of @jonquet ’s concerns about re-inventing RDF, but from what I’ve seen in the wild I am afraid that horse has left the barn anyway: people have already started to use SSSOM/TSV to serialise arbitrary RDF triples and not only triples that represent “mappings”. (This is a concern that has already been mentioned in #324). This is not what SSSOM is intended for, but I don’t think there’s much we can do about it. Once you put a tool in people’s hands, they will use it in any way they like. A kitchen knife is not supposed to be used to turn screws, but people will use one for that purpose if they don’t have a screwdriver. So what? I don’t think we should prevent SSSOM from being useful to manipulate mappings just because people find it useful to do other things with it (including things they shouldn’t do).
But if we are to allow literal mappings to be represented in SSSOM, we should do so correctly, and I am sorry but #235 is not a correct solution in its current state.
I see two ways of representing literal mappings in SSSOM:
A) Having a separate `literal mapping` class. That’s what #235 is about, but it does it in a way that leaves way too many questions open.
If we want to go that route, I will insist that these questions must be addressed ASAP, before SSSOM 1.0 is published, because as I have stated above, this route breaks the assumption that there is only one class of mapping. That assumption has been there since the beginning of SSSOM, and is still present everywhere in the current form of the standard even after #235 has been merged.
In particular, the `MappingSet` class (which is the basis for the SSSOM/TSV format, since a SSSOM/TSV file is basically a serialisation of a `MappingSet` object) can only contain `Mapping` objects, not `LiteralMapping` objects.
So if we now have to deal with more classes of mappings than just `Mapping`, how are we going to do that?

- Keep `MappingSet` as containing only `Mapping` objects, and have a separate `LiteralMappingSet` class to contain `LiteralMapping` objects?
- Add a `literal_mappings` slot to `MappingSet`?
- Make `LiteralMapping` a subclass of `Mapping`, so we can keep `MappingSet` unchanged?
- Have the `mappings` slot of `MappingSet` accept indifferently a list of `Mapping` or a list of `LiteralMapping`?

Whatever method we choose is going to have huge implications on SSSOM implementations (especially implementations in statically typed languages), so I am flatly opposed to postponing any decision on this to after 1.0. I don’t care if this means that 1.0 is going to be delayed by 10 months until we figure out how to do it.
B) Shoehorn literal mappings into the existing `Mapping` class. That is basically what I proposed in this comment. We don’t create a new class, but we define a way to use the existing `Mapping` class to represent literal mappings.
That would be a much less invasive change, with far fewer implications for implementations, because the assumption that there will only ever be one class of mapping would stand. For that reason: (1) I tend to favor that route; (2) if we want to go that route, we can easily postpone it to after 1.0.
Alright, here we go.
I do understand that there are various opposing views on the need for a "literal" profile, but I think this super minimal intervention will satisfy both sides. In essence, we do not have a literal profile; we have a convention that allows us to represent an "entity" by its label (`subject_label`) rather than by a semantic identifier. This means we do not need specialised tooling and documentation (nor training).
Huge thanks to @gouttegd 🙏 who managed to steer this massive carrier ship after it had left the harbor. This is rarely successful and needed a huge amount of thought, testing, and patience (mostly with me and my constant questions), and I am supremely happy we managed to make it!
🎉 THAT WAS IT FOLKS - the last issue before SSSOM 1.0 (#189). Thanks to all of you who helped and contributed; now the carrier ship has sailed off the horizon, hopefully, to connect the isolated shores of our data islands!
For those who had started to create pseudo-SSSOM/TSV files using the “literal profile” (even though this has never been officially feasible since that profile had never been connected to the rest of the spec), SSSOM-Java will support reading such files and converting them to the new proposed convention.
That is, given an input file like this:
```
#curie_map:
#  some: https://example.org/my_source_of_literals
#  BTO: http://purl.obolibrary.org/obo/BTO_
#  CL: http://purl.obolibrary.org/obo/CL_
#mapping_set_id: https://example.org/myset
literal  literal_source  predicate_id     object_id    mapping_justification
uterus   some:source     skos:exactMatch  BTO:0001424  semapv:ManualMappingCuration
sperm    some:source     skos:exactMatch  CL:0000019   semapv:ManualMappingCuration
kidney   some:source     skos:exactMatch  BTO:0000671  semapv:ManualMappingCuration
```
SSSOM-CLI will silently convert it into:
```
#curie_map:
#  BTO: http://purl.obolibrary.org/obo/BTO_
#  CL: http://purl.obolibrary.org/obo/CL_
#  some: https://example.org/my_source_of_literals
#mapping_set_id: https://example.org/myset
#subject_type: rdfs literal
#subject_source: some:source
subject_label  predicate_id     object_id    mapping_justification
kidney         skos:exactMatch  BTO:0000671  semapv:ManualMappingCuration
sperm          skos:exactMatch  CL:0000019   semapv:ManualMappingCuration
uterus         skos:exactMatch  BTO:0001424  semapv:ManualMappingCuration
```
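The row-level part of that conversion could be sketched like so (a simplified Python illustration of the column mapping only; the actual converter is SSSOM-Java, and hoisting shared values such as `subject_source` and `subject_type` into the set-level metadata, as shown above, is left out):

```python
# Simplified, per-row sketch of the conversion: literal-profile columns in,
# standard-convention columns out. The real converter is in SSSOM-Java; this
# Python version does not hoist shared values into set-level metadata.
def convert_literal_row(row):
    return {
        "subject_label": row["literal"],
        "subject_source": row.get("literal_source"),
        "subject_type": "rdfs literal",
        "predicate_id": row["predicate_id"],
        "object_id": row["object_id"],
        "mapping_justification": row["mapping_justification"],
    }

row = {"literal": "uterus", "literal_source": "some:source",
       "predicate_id": "skos:exactMatch", "object_id": "BTO:0001424",
       "mapping_justification": "semapv:ManualMappingCuration"}
print(convert_literal_row(row)["subject_type"])  # rdfs literal
```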
As discussed in #197, we are now going to provide a basic spec for a literal mapping. This is the suggestion: