ucum-org / ucum

https://ucum.org

Grammar specification is self-contradictory #108

Closed timbrisc closed 10 years ago

timbrisc commented 10 years ago

Issue migrated from trac ticket # 158

priority: critical | resolution: fixed | keywords: grammar specification

2014-05-05 17:03:09: fbeyer@tomtec.de created the issue


The specification at http://unitsofmeasure.org/ucum.html describes the grammar of UCUM expressions and should AFAIK be the reference for implementations.

Unfortunately, this specification is self-contradictory in some cases:

  • §10 (3) reads: "Since a unit term in parenthesis can be used in place of a simple unit, an exponent may follow on a closing parenthesis which raises the whole term within the parentheses to the power." This is not consistent with the Backus-Naur grammar, where parentheses form "components", not "simple units". Since there seems to be no use case for expressions such as "(m/s)2", I suggest removing §10 (3) from the specification.

  • The description of the Backus-Naur grammar is out-of-date. It refers to the non-terminal "power" that does not exist, probably because of the inconsistency mentioned above.

  • The Backus-Naur grammar leads to right-to-left evaluation order. For example, the expression "a/b/c" will be parsed as: [EDIT: I (GS) edited this parse tree to represent what I understand is the salient point.]

           term
       /   |       \
    component  |        \
    |      |         \
    |      |           term
    |      |        /    |    \
    |      |  component  |    term
    |      |      |      |      |
    |      |      |      |  component
    |      |      |      |      |
    "a"    "/"    "b"    "/"    "c"

    ...and therefore interpreted as "a/(b/c)" instead of "(a/b)/c".
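
The effect of the right-recursive rule can be checked with a small sketch. It is not taken from any UCUM implementation: the function names `parse_right` and `parse_left` are made up for illustration, atoms are assumed to be single letters, and only the "." and "/" operators are handled. Each function returns the exponent of every atom.

```python
def parse_right(tokens):
    """Right-recursive reading: term ::= component op term | component."""
    head, rest = tokens[0], tokens[1:]
    dims = {head: 1}
    if rest:
        # The operator applies to the WHOLE remaining term.
        sign = 1 if rest[0] == "." else -1
        for unit, exp in parse_right(rest[1:]).items():
            dims[unit] = dims.get(unit, 0) + sign * exp
    return dims


def parse_left(tokens):
    """Left-to-right reading: each operator applies to the next component only."""
    dims = {tokens[0]: 1}
    for op, unit in zip(tokens[1::2], tokens[2::2]):
        sign = 1 if op == "." else -1
        dims[unit] = dims.get(unit, 0) + sign
    return dims


tokens = list("a/b/c")
print(parse_right(tokens))  # {'a': 1, 'b': -1, 'c': 1}   i.e. a/(b/c)
print(parse_left(tokens))   # {'a': 1, 'b': -1, 'c': -1}  i.e. (a/b)/c
```

The two readings disagree on the exponent of "c", which is exactly the difference between "a/(b/c)" and "(a/b)/c".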

It is particularly unsatisfying that I found one case where the text wins over the formal grammar, and one where it is the other way around. With the specification in this form it is impossible to decide on the "UCUM conformance" of an implementation.

timbrisc commented 10 years ago

2014-06-18 00:51:26: gschadow@pragmaticdata.com commented


Related are #4 and #54; it's time to resolve this.

timbrisc commented 10 years ago

2014-06-18 02:45:39: gschadow@pragmaticdata.com commented


Here is the salient complaint from #54

  1. The BNF syntax for terms is somewhat confusing because of the following two paragraphs in the standard:

    §8 integer numbers A positive integer number may appear in place of a simple unit symbol.

    §10 nested terms Unit terms with operators may be enclosed in parentheses (‘(’ and ‘)’) and used in place of simple units.

    I would expect that these two rules would have been incorporated in the BNF syntax, i.e. that <simple-unit> and <component> would have been defined like this:

<simple-unit> ::= <ATOM-SYMBOL>
                  | <PREFIX-SYMBOL><ATOM-SYMBOL> | <factor> | “(”<term>“)”

<component> ::= <annotatable><annotation>
                 | <annotatable> | <annotation>

This syntax includes "(3)2" as a proper unit term, although the rule from §10 of the standard strictly implies that the expression between parentheses should contain operators. The Regenstrief conversion tool accepts "(3)2" which evaluates to 9. So, should the first line of §10 not read as follows?

§10 nested terms  Unit terms may be enclosed in parentheses (‘(’ and ‘)’) and used in place of simple units. 
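
To make the "(3)2" case concrete, here is a minimal numeric sketch of the proposed grammar, in which a parenthesized term may stand in for a simple unit and take an exponent. It is an illustration only and does not reproduce the Regenstrief tool. Assumptions: unit atoms are omitted, operators are applied left to right, an exponent is read only directly after a closing parenthesis (a bare digit string is taken as one factor), and all function names are hypothetical.

```python
import re
from fractions import Fraction

_NUMBER = re.compile(r"\d+")
_EXPONENT = re.compile(r"[+-]?\d+")


def eval_simple(s, pos):
    """simple-unit ::= factor | '(' term ')'  (atoms omitted in this sketch)."""
    if pos < len(s) and s[pos] == "(":
        value, pos = eval_term(s, pos + 1)
        if pos >= len(s) or s[pos] != ")":
            raise ValueError("expected ')'")
        return value, pos + 1, True
    m = _NUMBER.match(s, pos)
    if not m:
        raise ValueError(f"expected a number at position {pos}")
    return Fraction(int(m.group())), m.end(), False


def eval_annotatable(s, pos):
    """annotatable ::= simple-unit exponent | simple-unit."""
    value, pos, parenthesized = eval_simple(s, pos)
    if parenthesized:
        m = _EXPONENT.match(s, pos)
        if m:
            # The exponent raises the whole parenthesized term.
            value, pos = value ** int(m.group()), m.end()
    return value, pos


def eval_term(s, pos=0):
    """Terms joined by '.' and '/', applied left to right (an assumption)."""
    value, pos = eval_annotatable(s, pos)
    while pos < len(s) and s[pos] in "./":
        op = s[pos]
        rhs, pos = eval_annotatable(s, pos + 1)
        value = value * rhs if op == "." else value / rhs
    return value, pos


def evaluate(s):
    value, pos = eval_term(s)
    if pos != len(s):
        raise ValueError(f"unexpected input at position {pos}")
    return value


print(evaluate("(3)2"))     # 9
print(evaluate("(2.3)2"))   # 36
print(evaluate("(1/2)-2"))  # 4
```

Under this reading "(3)2" is accepted even though the parentheses contain no operator, which is the point the rewording of §10 is meant to cover.
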
timbrisc commented 10 years ago

2014-06-18 02:50:42: gschadow@pragmaticdata.com commented


#4 has already been mostly resolved.

§7 algebraic unit terms [...] ■3 The division operator can be used as a binary and unary operator, i.e. a leading solidus will invert the unit that directly follows it.

so /a.b/c is clearly the same as 1/a.b/c or a-1.b.c-1.
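
The equivalence stated above can be checked with a short sketch. It assumes single-letter atoms, treats "1" as the only numeric factor, and reads operators left to right so that each one applies only to the component directly after it. The function name `exponents` is hypothetical.

```python
def exponents(expr):
    """Return the exponent of each atom; a leading '/' is read as '1/'."""
    if expr.startswith("/"):
        expr = "1" + expr
    dims = {}
    sign = 1
    for ch in expr:
        if ch == ".":
            sign = 1       # the next component is multiplied
        elif ch == "/":
            sign = -1      # the next component is inverted
        elif ch != "1":    # the factor 1 contributes nothing
            dims[ch] = dims.get(ch, 0) + sign
    return dims


print(exponents("/a.b/c"))   # {'a': -1, 'b': 1, 'c': -1}
print(exponents("1/a.b/c"))  # {'a': -1, 'b': 1, 'c': -1}
```

Both inputs give a-1.b.c-1, matching the reading of §7 ■3 quoted above.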

The BNF seems not to reflect that:

<component> ::=   <annotatable><annotation>
            | <annotatable>
            | <annotation>
            | <factor>
            | “(”<term>“)”

<term>    ::= “/”<component>
        | <component>“.”<term>
        | <component>“/”<term>
        | <component>

And that was then resolved by adding main-term as a start symbol:

<sign>    ::= “+” | “-”
<digit>   ::= “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
<digits>  ::= <digit><digits> | <digit>
<factor>  ::= <digits>
<exponent>    ::= <sign><digits> | <digits>
<simple-unit> ::= <ATOM-SYMBOL>
                    | <PREFIX-SYMBOL><ATOM-SYMBOL>
<annotatable> ::= <simple-unit><exponent>
                    | <simple-unit>
<component>   ::= <annotatable><annotation>
                      | <annotatable>
                      | <annotation>
                      | <factor>
                      | “(”<term>“)”
<term>    ::= <component>“.”<term>
              | <component>“/”<term>
              | <component>
<main-term>   ::= “/”<term>
                      | <term>
<annotation>  ::= “{”<ANNOTATION-STRING>“}”
timbrisc commented 10 years ago

2014-06-18 13:41:12: gschadow@pragmaticdata.com changed status from new to assigned

timbrisc commented 10 years ago

2014-06-18 13:41:12: gschadow@pragmaticdata.com changed owner from * to gschadow*

timbrisc commented 10 years ago

2014-06-18 13:41:12: gschadow@pragmaticdata.com commented


Let's not worry about the right-to-left association for a moment.

The observation that we are talking about "power" in the caption but do not actually have it in the BNF is good. Maybe we should have it?

Exponent is definitely misplaced if we wanted to have that §10 (3):

<simple-unit> ::= <ATOM-SYMBOL>
                    | <PREFIX-SYMBOL><ATOM-SYMBOL>
<annotatable> ::= <simple-unit><exponent>
                    | <simple-unit>
<component>   ::= <annotatable><annotation>
                     | <annotatable>
                     | <annotation>
                     | <factor>
                     | “(”<term>“)”
<term>    ::= <component>“.”<term>
              | <component>“/”<term>
              | <component>
<main-term>   ::= “/”<term>
                      | <term>
<annotation>  ::= “{”<ANNOTATION-STRING>“}”

So the question is fair: should we just drop this rule? It is not being used anywhere currently, and I doubt that anyone understands it.

We need to make that an advisory and perhaps release this tentatively for public comments.

About the left to right association, this can be resolved easily:

<term>    ::= <term>“.”<component>
              | <term>“/”<component>
              | <component>
timbrisc commented 10 years ago

2014-06-18 13:43:54: gschadow@pragmaticdata.com edited the issue description

timbrisc commented 10 years ago

2014-06-18 14:33:09: gschadow@pragmaticdata.com commented


Strike:

<verse> Since a unit term in parenthesis can be used in place of
a simple unit, an exponent may follow on a closing parenthesis which
raises the whole term within the parentheses to the power.
</verse>

And added comment on the removed text.

           <p>
Up until revision 1.9 there was a third clause 
&ldquo;Since a unit term in parenthesis can be used in place of
a simple unit, an exponent may follow on a closing parenthesis which
raises the whole term within the parentheses to the power.&rdquo;
However this feature was inconsistent with any BNF or other syntax
description ever provided, was never used and seems to have no 
relevant use case. For this reason this clause has been stricken.
This is a <emph>tentative</emph> change. Users who have used this 
feature in the past, should please comment on this deprecation. 
If we receive indication that this feature was used by anyone, we
would undo the deprecation. If no comments are received, the 
deprecation continues to take effect.
           </p>

Strike the entire caption detail under the BNF as it is much out of date and only introduces more confusion.

The term vs. component associativity changes as indicated above.

timbrisc commented 10 years ago

2014-06-18 21:55:11: gschadow@pragmaticdata.com changed status from assigned to closed

timbrisc commented 10 years ago

2014-06-18 21:55:11: gschadow@pragmaticdata.com changed resolution from * to fixed*

timbrisc commented 10 years ago

2014-06-18 21:55:11: gschadow@pragmaticdata.com commented


Done: [16412]