mapping-commons / sssom

Simple Standard for Sharing Ontology Mappings
https://mapping-commons.github.io/sssom/
BSD 3-Clause "New" or "Revised" License

`other` field: (i) change to JSON? (ii) update docs #149

Open joeflack4 opened 2 years ago

joeflack4 commented 2 years ago

Thoughts

1. Change to JSON?

I feel like embedding JSON would be useful because it is a known standard. I'm not sure, though, whether including a large amount of JSON inside a single cell might cause issues.

2. Update docs

Currently says:

Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data.

I propose (changes: added "equals-sign delimited" and the example):

Pipe separated list of equals-sign delimited key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. Example: myKey1=myVal1|myKey2=myVal2
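To make the proposed wording concrete, here is a minimal sketch of how a consumer might read and write such a value. The function names `parse_other` and `format_other` are made up for illustration and are not part of any SSSOM tooling; the sketch also assumes keys and values never contain a literal `|`.

```python
from typing import Dict


def parse_other(value: str) -> Dict[str, str]:
    """Parse 'k1=v1|k2=v2' into a dict, splitting each pair on its first '='."""
    pairs: Dict[str, str] = {}
    for item in value.split("|"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing pipe
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        pairs[key.strip()] = val.strip()
    return pairs


def format_other(pairs: Dict[str, str]) -> str:
    """Inverse of parse_other: join key=value pairs with pipes."""
    return "|".join(f"{key}={val}" for key, val in pairs.items())


parsed = parse_other("myKey1=myVal1|myKey2=myVal2")
round_trip = format_other(parsed)
```

Splitting on the first `=` only means a value may itself contain an equals sign, which the current docs wording leaves unspecified.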

Additional info

Current docs page: https://mapping-commons.github.io/sssom/other/

matentzn commented 2 years ago

I moved this issue here, @joeflack4, because this is a question for the spec. Spec changes must be very well motivated, so "I feel like embedding JSON would be useful" is not enough all by itself. I remember a lot of back and forth on this during our workshop (see notes here: https://mapping-commons.github.io/sssom/mc2021/), so I know others like @ShahimEssaid share your thoughts. For those who would like to open the "other" field natively to arbitrary JSON, please provide a set of arguments. Remember that SSSOM will treat arbitrary JSON as a string. There won't be any serialisation to RDF at all, other than an AP (annotation property) with a literal value, so technically speaking, you can already just write this in the "other" column:

json: "{ .. some json..}" 

(i.e. json blob is a string). If you want LinkML validation (I don't even know if that is possible, requires a complex datatype "json"), then yes, you would have to push your suggestion here forward.
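As a rough illustration of "the JSON blob is a string", the sketch below writes a JSON document into a single TSV cell and reads it back using only the Python standard library. It assumes the cell holds nothing but the blob; the column layout and the `extra` payload are invented for the example, and nothing here validates the JSON against a schema.

```python
import csv
import io
import json

# Invented payload standing in for "some json".
extra = {"contact": {"name": "Nico Mat", "city": "Athens"}, "reviewed": True}
blob = json.dumps(extra, sort_keys=True)

# Write one mapping row; the csv module quotes the cell because it contains '"'.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["subject_id", "other"])
writer.writerow(["MONDO:123", blob])

# A reader that knows nothing about JSON just sees an opaque string ...
buffer.seek(0)
rows = list(csv.DictReader(buffer, delimiter="\t"))
cell = rows[0]["other"]

# ... and a reader that does know can decode it again.
restored = json.loads(cell)
```

Note that a blob containing `|` would clash with the pipe-separated convention if it were mixed with other key value pairs in the same cell.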

(As an aside, dear horrified Semantic Web people: there is a clear reason: we need some way to be able to transport additional metadata not currently allowed in the spec.) @udp also made some suggestions on how to do this better with QNames, see ticket above.

At this point, I think we need some clear set of arguments before making any spec changes. Once all the arguments are together, we call a vote.

joeflack4 commented 2 years ago

Sounds good. Thanks for the feedback.

I am on vacation this upcoming Monday, so I think our next 1:1 is on the 21st. I'll be taking half-day vacations during that week as well, but I think it will be no problem for me to meet. Fingers crossed though. It's possible that my mom might have some appointment for me to take her to; she frequently has me help w/ that when I'm visiting. If that happens and I need to reschedule, I'll let you know.

matentzn commented 2 years ago

I am wondering now if the "other" element should be modelled in TSV completely differently:

| subject_id | predicate_id | object_id | mapping_justification | ext:contactInfo | ext:fundingSource |
| --- | --- | --- | --- | --- | --- |
| MONDO:123 | skos:exactMatch | ICD11:123 | semapv:ManualMappingCuration | Nico Mat, Street 1, 1234 Athens, Greece | granteome:123 |

What this means, basically, is that all columns must be SSSOM elements unless they are CURIEs. If they are CURIEs, they are considered custom extensions, which can be treated like any other element. This allows us to be maximally flexible in terms of metadata, while still ensuring we can provide a meaningful RDF representation.
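The column rule described above could be checked roughly as follows. This is only a sketch: `SSSOM_SLOTS` is a small illustrative subset of slot names rather than the full spec, `is_curie` is a deliberately loose check, and both function names are hypothetical.

```python
import csv
import io

# Illustrative subset of SSSOM slot names, not the full spec.
SSSOM_SLOTS = {"subject_id", "predicate_id", "object_id", "mapping_justification"}


def is_curie(name: str) -> bool:
    """Loose check: non-empty prefix and local part around the first colon, no whitespace."""
    prefix, sep, local = name.partition(":")
    return bool(sep and prefix and local) and not any(c.isspace() for c in name)


def classify_columns(header):
    """Split a header into (core SSSOM slots, CURIE extensions, unrecognised columns)."""
    core, extensions, unknown = [], [], []
    for col in header:
        if col in SSSOM_SLOTS:
            core.append(col)
        elif is_curie(col):
            extensions.append(col)
        else:
            unknown.append(col)
    return core, extensions, unknown


HEADER = ["subject_id", "predicate_id", "object_id", "mapping_justification",
          "ext:contactInfo", "ext:fundingSource"]
ROW = ["MONDO:123", "skos:exactMatch", "ICD11:123", "semapv:ManualMappingCuration",
       "Nico Mat, Street 1, 1234 Athens, Greece", "granteome:123"]
TSV = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"

reader = csv.DictReader(io.StringIO(TSV), delimiter="\t")
core, extensions, unknown = classify_columns(reader.fieldnames)
rows = list(reader)
```

Under this rule a validator would reject any column that lands in `unknown`, while columns in `extensions` pass through as custom properties.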

A nice upside is that if the community eventually agrees to move ext:contactInfo into sssom (contact_info), we can simply add the mapping ext:contactInfo (and any other variants) to the LinkML model and have the mappings interoperate, at least on the semantic level.
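The interoperability idea can be pictured with a plain alias table. This is not how LinkML expresses mappings; it is only a sketch of the effect, and both `ALIASES` and `contact_info` are hypothetical, since no such slot exists in the spec according to this thread.

```python
# Hypothetical alias table: extension CURIE -> (not yet existing) core slot name.
ALIASES = {"ext:contactInfo": "contact_info"}


def normalize_row(row):
    """Rename aliased extension columns to their core slot name."""
    out = {}
    for column, value in row.items():
        target = ALIASES.get(column, column)
        if target in out and out[target] != value:
            raise ValueError(f"conflicting values for {target!r}")
        out[target] = value
    return out


# A file written before the promotion and one written after it ...
old_style = {"subject_id": "MONDO:123", "ext:contactInfo": "Nico Mat"}
new_style = {"subject_id": "MONDO:123", "contact_info": "Nico Mat"}

# ... end up with the same keys once normalised.
same = normalize_row(old_style) == normalize_row(new_style)
```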

joeflack4 commented 2 years ago

very cool. i think this is a great idea. still will want to have a general way to include other information if no applicable CURIE for the field exists.

matentzn commented 2 years ago

You can always invent a property! joe:comment!