Closed joeflack4 closed 6 months ago
@gouttegd has already documented this here:
https://mapping-commons.github.io/sssom/spec/#tsv
The YAML metadata block MUST contain a curie map that allows the unambiguous interpretation of CURIES. A curie map is supplied after a curie_map: parameter in the yaml file. The value is a dictionary of CURIE->URLPREFIX pairs. Note that the following prefixes are built-in and (1) MUST NOT be changed from their SSSOM default interpretation and (2) MAY be omitted from the curie map: "sssom", "owl", "rdf", "rdfs", "skos", "semapv".
So, you are right: they are optional, and sssom-py is a bit obsessive about adding these optional prefixes.
I plan to clarify that even further in my upcoming¹ overhaul of the spec by defining a canonical TSV serialisation.
The canonical serialisation is the serialisation that SSSOM/TSV writers will be recommended to adopt in order to minimise serialisation differences across implementations, to avoid possibly huge and meaningless diffs when a SSSOM/TSV file is modified by different tools. It’s a generalisation of the logic by which we are already recommending that TSV columns should be written in a spec-defined order.
Regarding the prefix map, the “canonical” guideline will be that SSSOM/TSV writers should write the minimal effective prefix map – that is, the prefix map should only contain prefix names that (1) are not already built-in, and (2) are effectively used somewhere in the mapping set.
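To make the "minimal effective prefix map" rule concrete, here is a rough Python sketch. It is not sssom-py code: the function name `minimal_prefix_map` and the idea of passing in a flat list of CURIEs are assumptions made for illustration, and it treats everything before the first colon as the prefix name.

```python
# Built-in prefix names, as listed in the spec excerpt quoted above.
BUILTIN_PREFIXES = {"sssom", "owl", "rdf", "rdfs", "skos", "semapv"}


def minimal_prefix_map(prefix_map, curies):
    """Return only the entries a writer would need to emit (hypothetical helper).

    prefix_map: dict of prefix name -> URL prefix, as declared in the metadata.
    curies: iterable of every CURIE that appears in the mapping set.
    """
    # Collect the prefix names that are effectively used somewhere in the set.
    used = {curie.split(":", 1)[0] for curie in curies if ":" in curie}
    # Keep a declaration only if it is used and is not a built-in.
    return {
        name: url
        for name, url in prefix_map.items()
        if name in used and name not in BUILTIN_PREFIXES
    }
```

For example, a map declaring `HP`, `MP`, `UBERON` and `skos` for a set that only uses `HP:`, `MP:` and `skos:` CURIEs would be reduced to the `HP` and `MP` entries.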
But that’s only a recommendation for writers. For SSSOM/TSV readers, all that matters is that (1) there are no prefix names in the set that are not declared in the prefix map, with the exception of the built-in prefix names, and (2) if the built-in prefix names are declared, they must point to the same prefixes as in the spec.
So it is never wrong for a set to have a prefix map that contains superfluous prefix names (names that are built-in and/or that are never used in the set), and readers must never reject a set because of that. It is simply recommended that writers avoid generating such maps.
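The reader-side rules can be sketched the same way. Again, this is only an illustration under assumptions: `check_prefix_map` is a hypothetical helper, not an existing API, and the URL prefixes given for the built-ins are the ones I believe the spec defines, so they should be checked against the spec before being relied on.

```python
# Built-in prefix names with their expected expansions.
# Assumption: these URL prefixes match the SSSOM spec; verify before use.
BUILTIN_PREFIX_MAP = {
    "sssom": "https://w3id.org/sssom/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "semapv": "https://w3id.org/semapv/vocab/",
}


def check_prefix_map(prefix_map, curies):
    """Return a list of problems; an empty list means the set is acceptable."""
    problems = []
    # Rule 2: a built-in that is declared must keep its spec-defined expansion.
    for name, expected in BUILTIN_PREFIX_MAP.items():
        declared = prefix_map.get(name)
        if declared is not None and declared != expected:
            problems.append(f"built-in prefix '{name}' is redefined as '{declared}'")
    # Rule 1: every prefix name used in the set must be declared or built-in.
    known = set(prefix_map) | set(BUILTIN_PREFIX_MAP)
    used = {curie.split(":", 1)[0] for curie in curies if ":" in curie}
    for name in sorted(used - known):
        problems.append(f"prefix '{name}' is used but not declared")
    # Superfluous entries (unused or built-in) are deliberately not reported.
    return problems
```

Note that the sketch never complains about extra declarations, which mirrors the point above: superfluous prefix names are something for writers to avoid, not something for readers to reject.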
-- ¹ Yes, it is coming up. At some point. Eventually.
Thanks for the helpful responses! This is already done, then! Closing.
Overview
Just a request to have documentation on this question: Are built-in prefixes in metadata required, or recommended?
Additional info
Thoughts: I prefer to be minimalist, to the point that I want prefixes in my metadata if and only if they appear in the mapping set (that is, in any of the rows in my TSV). But I do see the UX benefits of including common built-ins.