kba / hocr-spec

The hOCR Embedded OCR Workflow and Output Format
http://kba.github.io/hocr-spec/1.2/

Incorrect BNF for properties-format #98

Closed wollmers closed 3 years ago

wollmers commented 7 years ago

Now:

properties-format = key-value-pair *(whitespace semicolon key-value-pair)

What most programs deliver:

properties-format = key-value-pair *(*whitespace semicolon whitespace key-value-pair)

What readers should parse:

properties-format = key-value-pair *(*whitespace semicolon *whitespace key-value-pair)

The first one is definitely wrong.

kba commented 7 years ago

The problem I see with not requiring whitespace before or after the separating semicolon is that a semicolon then cannot be used in values, e.g. when parsing image imageserver?scale=1;width=100; bbox 0 0 100 100.

The grammar at the moment forbids using a semicolon for anything but delimiting key-value pairs. I haven't come across a semicolon in a URL in a while, and in fact most implementations I've seen use \s*;\s* as the separator. We can go with your second proposal; can you open a PR?
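To illustrate the trade-off, here is a minimal Python sketch of the \s*;\s* splitting approach mentioned above. The function name `split_properties` and the `x_wconf` sample value are assumptions made for illustration; this is not code from any particular hOCR implementation.

```python
import re

# Separator as described above: a semicolon with optional whitespace around it.
SEPARATOR = re.compile(r"\s*;\s*")

def split_properties(title):
    """Naively split a properties string into key-value pair strings.

    Hypothetical helper: it knows nothing about quoting, so every
    semicolon is treated as a pair separator.
    """
    return [part for part in SEPARATOR.split(title.strip()) if part]

# Works as long as no value contains a semicolon.
plain = split_properties("bbox 0 0 100 100; x_wconf 95")

# The unquoted URL from the example above is cut at its inner semicolon.
broken = split_properties("image imageserver?scale=1;width=100; bbox 0 0 100 100")
```

With this approach `plain` holds two pairs, while `broken` holds three pieces, because the URL is cut at `scale=1;width=100`. That is the ambiguity described above.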

Further feedback or improvements, in particular for the grammar sections, are appreciated. The grammar is more of a draft, since I've never written an ABNF grammar before.

wollmers commented 7 years ago

@kba Oh, yes, but then the BNF for property-value is also wrong.

Now:

property-value = ascii-word *(whitespace ascii-word)

Where:

ascii-word       = +(%x21-7E - semicolon)  ; printable w/o space/semicolon
ascii-string     = +(%x01-FF - semicolon)  ; printable ascii without semicolon
delimited-string = doublequote ascii-string doublequote

Should be:

ascii-word       = +(%x21-7E - semicolon)  ; printable w/o space/semicolon
ascii-string     = +(%x21-7E - doublequote)  ; printable ascii without doublequote
delimited-string = doublequote ascii-string doublequote

property-value = (ascii-word / delimited-string) *(whitespace (ascii-word / delimited-string) )

And finally:

properties-format = key-value-pair *(*whitespace semicolon *whitespace key-value-pair)

This would allow:

image "imageserver?scale=1;width=100"; bbox 0 0 100 100
image "imageserver?scale=1;width=100" ; bbox 0 0 100 100
image "imageserver?scale=1;width=100" ;bbox 0 0 100 100

But this is invalid:

image imageserver?scale=1;width=100;
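The valid and invalid cases above can be checked with a small Python sketch of a reader for the proposed grammar. This is only an approximation under stated assumptions. The name `parse_properties` is hypothetical. Property names are treated as plain words. Bare words are assumed to exclude the double quote, which the proposed ascii-word rule does not do, so that quoted strings stay unambiguous. Whitespace is assumed to mean spaces or tabs.

```python
import re

# delimited-string: double-quoted printable ASCII without space or double quote
DELIMITED = r'"[\x21\x23-\x7E]+"'
# ascii-word: printable ASCII without space or semicolon
# (assumption of this sketch: also without double quote)
WORD = r'[\x21\x23-\x3A\x3C-\x7E]+'
VALUE = f'(?:{DELIMITED}|{WORD})'

# key-value-pair: a name followed by one or more whitespace-separated values
PAIR = re.compile(rf'({WORD})((?:[ \t]+{VALUE})+)')
# separator: *whitespace semicolon *whitespace
SEP = re.compile(r'[ \t]*;[ \t]*')
VALUE_RE = re.compile(VALUE)

def parse_properties(text):
    """Return a list of (name, [values]); raise ValueError on invalid input."""
    text = text.strip()
    pos, pairs = 0, []
    while True:
        pair = PAIR.match(text, pos)
        if pair is None:
            raise ValueError(f"expected key-value pair at offset {pos}")
        pairs.append((pair.group(1), VALUE_RE.findall(pair.group(2))))
        pos = pair.end()
        if pos == len(text):
            return pairs
        sep = SEP.match(text, pos)
        if sep is None:
            raise ValueError(f"expected semicolon at offset {pos}")
        pos = sep.end()

parsed = parse_properties('image "imageserver?scale=1;width=100" ;bbox 0 0 100 100')
```

Here the quoted URL survives as a single value including its quotes, and the unquoted form with a trailing semicolon raises `ValueError`, because a key-value pair must follow every separator.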

See also the grammar of the image property:

property-name = "image"
property-value = delimited-string

stweil commented 3 years ago

The pull request was merged. Is this issue solved? Can it be closed?

kba commented 3 years ago

Yes, this is fixed, thanks @wollmers