mapping-commons / sssom

Simple Standard for Sharing Ontology Mappings
https://mapping-commons.github.io/sssom/
BSD 3-Clause "New" or "Revised" License

Define a standard sort order for SSSOM files #247

Closed: cthoyt closed this issue 1 year ago

cthoyt commented 1 year ago

In order to reduce diffs, we should create a standard for sorting SSSOM files. This can be pretty simple, e.g., sort by subject, then predicate, then object, then something else. We should also have a tool that "canonicalizes" a SSSOM file by applying this sort order, and include a check of the sort order in validation.
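
The proposed order can be sketched in Python. This is a minimal illustration, not the sssom-py implementation: it assumes the TSV table is preceded only by "#"-prefixed metadata lines, sorts on the SSSOM slots subject_id, predicate_id and object_id, and uses the remaining columns (in header order) as a stand-in for "something else". The helper names canonicalize and is_canonically_sorted are hypothetical.

```python
import csv
import io

# Assumed primary sort columns: the SSSOM slot names are real, but this
# ordering is only the proposal from this issue, not a published standard.
PRIMARY_COLUMNS = ["subject_id", "predicate_id", "object_id"]


def split_metadata(text: str) -> tuple[str, str]:
    """Separate leading '#'-prefixed metadata lines from the TSV table."""
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines) and lines[i].startswith("#"):
        i += 1
    return "".join(lines[:i]), "".join(lines[i:])


def canonical_key(row: dict, columns: list[str]) -> tuple[str, ...]:
    """Primary columns first, then every other column in header order as tie-breakers."""
    rest = [c for c in columns if c not in PRIMARY_COLUMNS]
    # DictReader fills missing cells with None; treat them as empty strings.
    return tuple(row.get(c) or "" for c in PRIMARY_COLUMNS + rest)


def read_table(table: str) -> tuple[list[str], list[dict]]:
    """Parse the tab-separated table into (header columns, row dicts)."""
    reader = csv.DictReader(io.StringIO(table), delimiter="\t")
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def canonicalize(text: str) -> str:
    """Rewrite an SSSOM-like TSV with its rows in canonical order; metadata is kept as-is."""
    metadata, table = split_metadata(text)
    columns, rows = read_table(table)
    if not columns:
        return text
    rows.sort(key=lambda r: canonical_key(r, columns))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return metadata + out.getvalue()


def is_canonically_sorted(text: str) -> bool:
    """Validation-style check: are the rows already in canonical order?"""
    _, table = split_metadata(text)
    columns, rows = read_table(table)
    keys = [canonical_key(r, columns) for r in rows]
    return keys == sorted(keys)
```

Because every remaining column takes part in tie-breaking, two files containing the same rows serialize identically, which is what keeps diffs small; a check like is_canonically_sorted is the kind of thing a validator could run.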

matentzn commented 1 year ago

Redundant with #39, and also solved: https://mapping-commons.github.io/sssom-py/cli_usage.html#sssom-sort

I think the rest of the discussion should be about documenting it better. Summary:

If we want to change the current behaviour, let's make a more fine-grained ticket with a proposal.

cthoyt commented 1 year ago

@matentzn No, that sounds pretty great to me! Sorry I missed this (but yes, it is the case that the documentation is really sparse).

graybeal commented 1 year ago

I'm not sure whether you are expecting all files to follow this order, or just declaring what order to use when a file is sorted in a particular case. (If the latter, skip the last two paragraphs of my comment.)

The solution needs to take into account the need to support user-friendly organizations of the mappings. It is often (I'd argue usually) the case that there is a coherent order or structure that is not sorted. And while sorting might make sense within that structure, it doesn't make sense across the whole file.

This raises the question: why do we need a sort order to reduce diffs? It's a file; nothing will change its order. Either it isn't produced automatically, or, if it is, the thing that produced it can always produce it in the same order it wants to maintain. I'm not sure it will get the same level of reprocessing-with-remodeling that ontologies get, and if it does, it's likely to produce an entirely different artifact. (Think of source code or data files: when they get reprocessed, the processing occurs sequentially and the outputs are likewise sequential.)

cthoyt commented 1 year ago

I'm thinking precisely of the case where files are produced or maintained in an automated or semi-automated way. Biomappings is a perfect example of a project where this is crucial for coherently reconciling different people working on different, potentially overlapping parts at the same time, and for facilitating merging afterward.
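
To illustrate why a canonical order helps when several curators edit the same file, here is a hedged Python sketch; it is not how Biomappings actually merges curations. It represents each mapping as a hypothetical (subject_id, predicate_id, object_id, author_id) tuple, merges contributions as a de-duplicated union, sorts the result, and shows that the diff against the shared base contains only the added lines, regardless of where each curator inserted them.

```python
import difflib

# Hypothetical row shape: (subject_id, predicate_id, object_id, author_id).
Row = tuple[str, str, str, str]


def to_lines(rows: list[Row]) -> list[str]:
    """Serialize rows as tab-separated lines, as a TSV body would be."""
    return ["\t".join(r) + "\n" for r in rows]


def merge(*contributions: list[Row]) -> list[Row]:
    """Union all contributions, drop exact duplicates, and sort canonically.

    Tuples compare element-wise, so sorted() orders by subject, then
    predicate, then object, then author.
    """
    merged: set[Row] = set()
    for rows in contributions:
        merged.update(rows)
    return sorted(merged)


def diff(old: list[Row], new: list[Row]) -> list[str]:
    """Only the added/removed lines of a unified diff (file headers and hunk markers dropped)."""
    return [
        line
        for line in difflib.unified_diff(to_lines(old), to_lines(new), n=0)
        if line[0] in "+-" and not line.startswith(("+++", "---"))
    ]
```

The merged file is the same whichever order the contributions are merged in. The trade-off is the one raised above: the union discards whatever hand-made order each contributor used.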

matentzn commented 1 year ago

> The solution needs to take into account the need to support user-friendly organizations of the mappings. It is often (I'd argue usually) the case that there is a coherent order or structure that is not sorted.

@graybeal can you give an example where this is the case?

graybeal commented 1 year ago

Example 1: I'm responsible for branch 1, Joe for branch 2, Mary for branch 3. I want all my mappings to be collocated under the Branch 1 group.

Example 2: My mappings are organized in order of decreasing confidence. As that order changes, I want to be able to track the original order in my diffs, and I want the confidence order maintained.

Example 3: All my mappings to ontology 1 are in the first group, all my mappings to ontology 2 are in the second group.

Example 4: I insert lines between my mappings to indicate additional info about each mapping.

Example 5: I sort my ontology mappings by property value.

The interesting thing is that many of these, but not all, can be handled by sorting on a particular column. (So there might even be options in tools to sort by various columns on output.) And the file, as previously noted, is essentially a tabular (TSV) file. This makes me think that sorting is going to be a much more common practice than with ontologies, which (A) maintain their meaning by their order (so you can't just go sort a section) and (B) often get reformatted into other syntaxes (which could happen with SSSOM, but I predict won't often).
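
Several of these organizations can be expressed as "sort by a chosen column, then fall back to a default order". A minimal Python sketch under that assumption follows; the function names and sample values are hypothetical, and confidence is used only because it is an SSSOM slot matching example 2. It relies on Python's sort being stable, so rows that tie on the chosen key stay in subject/predicate/object order.

```python
from operator import itemgetter
from typing import Callable, Optional

# Default tie-break order: subject, then predicate, then object.
CANONICAL = itemgetter("subject_id", "predicate_id", "object_id")


def sort_mappings(
    rows: list[dict],
    by: Optional[Callable[[dict], object]] = None,
    descending: bool = False,
) -> list[dict]:
    """Order rows by a user-chosen key, breaking ties canonically.

    Python's sort is stable (also with reverse=True), so sorting canonically
    first and then by the chosen key keeps canonical order within each group.
    """
    ordered = sorted(rows, key=CANONICAL)
    if by is None:
        return ordered
    return sorted(ordered, key=by, reverse=descending)


def object_prefix(row: dict) -> str:
    """Group key for 'all my mappings to ontology 1 come first' (example 3)."""
    return str(row["object_id"]).split(":", 1)[0]
```

For example, sort_mappings(rows, by=itemgetter("confidence"), descending=True) covers example 2 and sort_mappings(rows, by=object_prefix) covers example 3. Example 4 (interleaved annotation lines) has no column to sort on, consistent with the point that not all of these cases can be handled this way.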

matentzn commented 1 year ago

@graybeal this makes total sense.

The default ordering proposal will never be prescriptive; it is just a convenient way to produce a default order with the sssom sort command. Your curation use case makes 100% sense, and in fact it will be exactly the same for many of our projects as well. Thanks for sharing the examples!