ucum-org / ucum

https://ucum.org

Proposal: Publish a list of parser-challenging valid & invalid UCUM codes #291

Open dalito opened 9 months ago

dalito commented 9 months ago

When implementing UCUM in software, the collection of common UCUM unit codes is very helpful for creating unit tests.

A list of invalid UCUM codes would also be very useful, covering cases such as:

m(/s)
m.(/s)
(m/s)2

Good sources for such a list are bug reports against UCUM implementations and old issues that led to clarifications of the specification.
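As a rough sketch of how such a list could feed a test suite, the snippet below pairs each code with its expected validity and reports where a validator disagrees. The three invalid codes are the ones listed above; `m/s` is added only as a simple valid control. The `is_valid` callable is a placeholder for whatever parser is under test, not the API of any particular UCUM library.

```python
from typing import Callable

# (code, expected_valid) pairs. The invalid codes come from this issue;
# "m/s" is an added control case that should be accepted.
CASES = [
    ("m(/s)", False),
    ("m.(/s)", False),
    ("(m/s)2", False),
    ("m/s", True),
]


def find_mismatches(is_valid: Callable[[str], bool], cases=CASES) -> list[str]:
    """Return the codes where the validator disagrees with the expectation."""
    return [code for code, expected in cases if is_valid(code) != expected]
```

A validator that accepts every string would be flagged for all three invalid codes, while a correct one yields an empty list.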

incansvl commented 8 months ago

(Note: this response is based on the original issue title, which referred to a "list of invalid UCUM codes".)

The exact aim/intent here needs to be clarified.

The number of possible valid UCUM codes is infinite, so the number of "all possible text strings in the universe" MINUS that number is just another infinity.

A well-implemented UCUM parser will confirm whether a specific string is a valid UCUM expression. What it WON'T tell you is whether a particular expression is misleading, i.e. whether what it conveys to a typical human reader differs from what a UCUM library will make of the same expression.

It would certainly be possible to produce a "rogues' gallery" of UCUM codes that either illustrate specific "foot guns" in the syntax or are dangerous examples seen in live use. I listed some of those examples myself in the past, although as I'm now retired I don't have easy access to that work any more.

dalito commented 8 months ago

I meant it as you describe in your last paragraph: a rather short list (tens) of UCUM codes that challenge the parser and help implementers reach the goal you stated:

"A well-implemented UCUM parser will confirm whether a specific string is a valid UCUM expression."

I have some more invalid UCUM codes in the test suite of my attempt at a "well-implemented" UCUM parser in Python here.

There are also some challenging valid UCUM codes, such as "dar", which is only parsed correctly if prefixes (or at least "da") have lower priority than unit atoms. Knowing about such cases would also help when working on a parser or validator. Some examples are in the same test file.
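The "dar" case can be illustrated with a minimal sketch. It assumes a tiny hand-picked subset of prefixes and atoms (not the full UCUM tables), and both function names are hypothetical. It compares a naive strategy that commits to the longest matching prefix with one that only accepts a prefix when the remainder is a known atom.

```python
# Tiny illustrative subset, NOT the full UCUM prefix and atom tables.
PREFIXES = {"d": "deci", "da": "deka"}
ATOMS = {"ar": "are", "m": "meter", "d": "day"}


def split_candidates(symbol: str) -> list[tuple[str, str]]:
    """List every (prefix, atom) reading; an empty prefix means a bare atom."""
    candidates = []
    if symbol in ATOMS:
        candidates.append(("", symbol))
    for prefix in PREFIXES:
        rest = symbol[len(prefix):]
        # A prefix only counts if what follows it is a known atom.
        if symbol.startswith(prefix) and rest in ATOMS:
            candidates.append((prefix, rest))
    return candidates


def greedy_prefix_first(symbol: str):
    """Naive strategy: commit to the longest matching prefix, no backtracking."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if symbol.startswith(prefix):
            rest = symbol[len(prefix):]
            return (prefix, rest) if rest in ATOMS else None
    return ("", symbol) if symbol in ATOMS else None
```

With this table, `greedy_prefix_first("dar")` commits to "da", is left with "r" (not an atom here) and gives up, whereas `split_candidates("dar")` finds the single reading `("d", "ar")`. Both agree on "dam", which splits as `("da", "m")`.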

Since I have completed the parser, such a list is of less value to me now, but others may find it useful when they start out or want to validate existing code. (Related: #157)

incansvl commented 8 months ago

Ah, so you WERE really talking about invalid codes to challenge the parser, while I had taken it to be more about "valid but misleading" codes, which are really a different issue.

So, my mistake, but a useful clarification, thank you @dalito.