owlcollab / oboformat

Automatically exported from code.google.com/p/oboformat

1.6: make tag ordering part of normative specification, not informative #138

Open cmungall opened 1 year ago

cmungall commented 1 year ago

Currently the 1.4 spec points to the 1.4 guide for advice on tag ordering. (Aside: the link to the guide is a broken Google Code link, should be)

The tag ordering should be normative. (Of course, it is still valid to emit tags in any order, but writers SHOULD follow the normative tag ordering.)

Also, we should base the normative order on what the owlapi does, which doesn't strictly follow the guide. E.g. the guide has

However, the owlapi inverts this. Rather than cause churn by insisting on the guide, we should obsolete the ordering in the guide and create a new normative standard based on what the owlapi does.

Clause ordering also needs to be clarified. The owlapi seems to implement an odd variant of the original OE rules.

cmungall commented 1 year ago

This is what the owlapi considers canonical ordering.

format-version: 1.2
synonymtypedef: x "test synonym type"
synonymtypedef: Y "test synonym type"
synonymtypedef: z "test synonym type"

...

id: X:1
name: synonym order test
synonym: "A" BROAD []
synonym: "A" EXACT []
synonym: "A" NARROW []
synonym: "A" RELATED []
synonym: "a" BROAD Y []
synonym: "a" BROAD x []
synonym: "a" BROAD z []
synonym: "a" BROAD []
synonym: "a" EXACT Y []
synonym: "a" EXACT []
synonym: "a" EXACT x []
synonym: "a" EXACT [PMID:1]
synonym: "a" EXACT [pmid:1]
synonym: "a" EXACT [PMID:2]
synonym: "a" EXACT [pmid:2]
synonym: "a" EXACT [pmid:1, pmid:2]
synonym: "a" EXACT [PMID:1, PMID:2]
synonym: "a" NARROW Y []
synonym: "a" NARROW []
synonym: "a" NARROW x []
synonym: "a" RELATED []
synonym: "a" RELATED Y []
synonym: "a" RELATED x []
synonym: "A " BROAD []
synonym: "A " EXACT []
synonym: "A " NARROW []
synonym: "A " RELATED []
synonym: "Ab" BROAD []
synonym: "Ab" EXACT []
synonym: "Ab" NARROW []
synonym: "Ab" RELATED []
synonym: "ab" BROAD []
synonym: "ab" EXACT []
synonym: "ab" NARROW []
synonym: "ab" RELATED []
synonym: "Ac" BROAD []
synonym: "Ac" EXACT []
synonym: "Ac" NARROW []
synonym: "Ac" RELATED []
synonym: "ac" BROAD []
synonym: "ac" EXACT []
synonym: "ac" NARROW []
synonym: "ac" RELATED []
synonym: "As" RELATED []
synonym: "as" NARROW []
synonym: "astacin activity" EXACT []
synonym: "Astacus" RELATED []
synonym: "astacus" NARROW []
synonym: "Astacus proteinase activity" RELATED []
synonym: "astacus proteinase activity" NARROW []
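
For the plain entries (no synonym type, no xrefs), the listing above is consistent with a simple rule: compare the synonym text case-insensitively, break ties on the exact text, then compare the scope. The sketch below encodes that reading as a Python sort key. The name `observed_key` is hypothetical, and the rule is inferred from the listing only; it is not taken from the owlapi source.

```python
def observed_key(clause):
    """Sort key inferred from the listing: (folded text, exact text, scope)."""
    text, scope = clause
    return (text.lower(), text, scope)


# A shuffled subset of the untyped, xref-free synonyms from the listing.
clauses = [
    ("astacus", "NARROW"),
    ("Ab", "EXACT"),
    ("a", "BROAD"),
    ("A ", "RELATED"),
    ("As", "RELATED"),
    ("A", "EXACT"),
    ("Astacus", "RELATED"),
    ("ab", "BROAD"),
    ("as", "NARROW"),
    ("A", "BROAD"),
    ("astacin activity", "EXACT"),
]

ordered = sorted(clauses, key=observed_key)
for text, scope in ordered:
    print(f'synonym: "{text}" {scope} []')
```

On this subset the key reproduces the relative order shown in the listing. It says nothing about typed synonyms or xrefs, which are the cases questioned below.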

I can reverse engineer the rules, except that two things are baffling me.

For BROAD, a null type is listed last:

synonym: "a" BROAD Y []
synonym: "a" BROAD x []
synonym: "a" BROAD z []
synonym: "a" BROAD []

Yet for EXACT and NARROW a null type is intermediate (and for RELATED it comes first):

synonym: "a" EXACT Y []
synonym: "a" EXACT []
synonym: "a" EXACT x []

Also, why this?

synonym: "a" EXACT [PMID:1]
synonym: "a" EXACT [pmid:1]
synonym: "a" EXACT [PMID:2]
synonym: "a" EXACT [pmid:2]
synonym: "a" EXACT [pmid:1, pmid:2]
synonym: "a" EXACT [PMID:1, PMID:2]

gouttegd commented 1 year ago

Regarding the ordering of tags in a stanza, it seems to be done according to priority values that are listed here: https://github.com/owlcs/owlapi/blob/0044b995936d6b51ad536d705b6c7f50a6001d1f/oboformat/src/main/java/org/obolibrary/oboformat/parser/OBOFormatConstants.java#L71, e.g.:

/**TAG_ID.   */ TAG_ID    ("id",   10000,  5,      5),
/**TAG_NAME. */ TAG_NAME  ("name", 10000,  15,     15),

The first value is the priority inside a header frame (set to 10,000 here because these tags do not belong in a header frame), the second is the priority inside a term frame, and the third is the priority inside a typedef frame.
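
As a rough illustration of how such a table could drive a normative ordering, here is a minimal Python sketch. Only the two entries quoted above are included. The names `TAG_PRIORITY`, `FRAME_INDEX`, `DEFAULT_PRIORITY` and `order_tags` are hypothetical, and both the fallback priority for unknown tags and the alphabetical tie-break are assumptions, not owlapi behaviour.

```python
# Priorities copied from the two enum entries quoted above:
# (header frame, term frame, typedef frame).
TAG_PRIORITY = {
    "id": (10000, 5, 5),
    "name": (10000, 15, 15),
}
FRAME_INDEX = {"header": 0, "term": 1, "typedef": 2}
DEFAULT_PRIORITY = 10000  # assumption: tags missing from the table sort last


def tag_sort_key(tag, frame):
    """Return (priority, tag) for a tag in the given frame type."""
    idx = FRAME_INDEX[frame]
    priority = TAG_PRIORITY.get(tag, (DEFAULT_PRIORITY,) * 3)[idx]
    # Assumption: equal priorities are broken alphabetically by tag name.
    return (priority, tag)


def order_tags(tags, frame):
    """Sort tag names by their priority within one frame type."""
    return sorted(tags, key=lambda t: tag_sort_key(t, frame))


print(order_tags(["name", "id"], "term"))
```

A spec could publish the full table in this shape, so that writers in any language can reproduce the same tag order without depending on the Java code.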

For the ordering of clauses with the same tag, it’s done by the ClauseComparator defined here: https://github.com/owlcs/owlapi/blob/0044b995936d6b51ad536d705b6c7f50a6001d1f/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java#L800.

My understanding after a cursory look is that this comparator only compares the first two values of a clause, so that when clauses only differ by their third value (if present) or by their cross-references, the resulting order is not actually specified.
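
To make the consequence concrete, here is a small Python sketch of a comparator that looks only at the first two values. It is not the owlapi `ClauseComparator`; `compare_first_two` is a hypothetical stand-in. Python's `sorted` is stable, so clauses that tie on the first two values keep their input order, and the output therefore depends on the order in which the clauses were supplied.

```python
from functools import cmp_to_key


def compare_first_two(a, b):
    """Compare clauses on (text, scope) only; later fields are ignored."""
    ka, kb = a[:2], b[:2]
    return (ka > kb) - (ka < kb)


# (text, scope, synonym type or None), taken from the listing in this thread.
clauses = [
    ("a", "EXACT", "x"),
    ("a", "EXACT", None),
    ("a", "EXACT", "Y"),
    ("a", "BROAD", None),
]

forward = sorted(clauses, key=cmp_to_key(compare_first_two))
backward = sorted(reversed(clauses), key=cmp_to_key(compare_first_two))

print(forward)
print(backward)
```

Both results put the BROAD clause first, but the three EXACT clauses come out in opposite orders. A normative clause ordering would need tie-breaking rules for the remaining values and the xrefs.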

cmungall commented 1 year ago

Thanks @gouttegd! The declarative piece of code for the tag ordering is useful; we can drive the normative standard from it.

The ordering of clauses within a tag is a bit troubling (and that code looks familiar; it has likely not changed much since my initial version). It seems that ordering beyond the first two values is determined by some kind of Java internals and could in theory change at any time.