openEHR / adl-antlr

ANTLR4 grammars for ADL
Apache License 2.0

Lexer does not recognize URI with empty path #32

Open pieterbos opened 6 years ago

pieterbos commented 6 years ago

The following URIs in ODIN are not recognized as URIs by the lexer:

http://www.test.example
http://www.test.example/

They are however both valid URIs.

The lexer does not recognize this because of the following lexer rules:

URI : URI_SCHEME SYM_COLON URI_HIER_PART ( '?' URI_QUERY )? ;
fragment URI_HIER_PART : ( '//' URI_AUTHORITY )? URI_PATH ;
fragment URI_PATH   : ( '/' URI_XPALPHA+ )+ ;

At first glance it looks like this can be fixed by simply making the path optional (URI_PATH?). However, that clashes with the labels of the expression grammar. So I tried:

fragment URI_HIER_PART : ( '//' URI_AUTHORITY ) | URI_PATH | ( '//' URI_AUTHORITY ) URI_PATH ;

This is better, but it still clashes with the following rule statement:

label:/path/to/value + /other_path = 3

because it matches label:/path/to/value as a URI.
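To make the clash easy to reproduce outside the lexer, here is a minimal sketch that approximates the rules above with Python regular expressions. The definitions of SCHEME, AUTHORITY and XPALPHA are simplified assumptions (the real fragments are not shown in this issue), the optional query part is left out, and a regex only approximates ANTLR's longest-match behaviour.

```python
import re

# Simplified stand-ins for fragments this issue does not show (assumptions).
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"
AUTHORITY = r"[A-Za-z0-9.-]+(?::[0-9]+)?"
XPALPHA = r"[A-Za-z0-9._~-]"
PATH = rf"(?:/{XPALPHA}+)+"  # URI_PATH : ( '/' URI_XPALPHA+ )+


def uri_pattern(hier_part):
    """Build SCHEME ':' HIER_PART; the optional '?' query is left out."""
    return re.compile(rf"{SCHEME}:(?:{hier_part})")


# ( '//' URI_AUTHORITY )? URI_PATH
ORIGINAL = uri_pattern(rf"(?://{AUTHORITY})?{PATH}")
# ( '//' URI_AUTHORITY ) | URI_PATH | ( '//' URI_AUTHORITY ) URI_PATH
SECOND_ATTEMPT = uri_pattern(rf"//{AUTHORITY}(?:{PATH})?|{PATH}")


def consumed(pattern, text):
    """Return the prefix of text that the pattern consumes, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


print(consumed(ORIGINAL, "http://www.test.example"))
print(consumed(SECOND_ATTEMPT, "http://www.test.example"))
print(consumed(SECOND_ATTEMPT, "label:/path/to/value + /other_path = 3"))
```

Under these assumptions the original rule consumes nothing from http://www.test.example, the second attempt accepts it, and the second attempt also consumes label:/path/to/value from the rule statement, which is the clash described above.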

So the remaining fixes are:

  1. Require the URI_AUTHORITY: fragment URI_HIER_PART : ( '//' URI_AUTHORITY ) URI_PATH? ;

  2. Match the <>-characters that must always surround a URL in the lexer

  3. Find a way to implement different lexer modes for different parts of the archetype

Option 3 would be best, I think. However, there is no easy way in the current ADL language design to implement lexer mode switching without resorting to rather complicated target-language constructions. So for now I have stuck with the first solution in archie, which is at least better than the alternatives. A better fix would be good, though!
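Options 1 and 2 can be compared with a rough regex approximation. This is only a sketch: SCHEME, AUTHORITY and XPALPHA are simplified assumptions rather than the grammar's real fragments, and the bracketed pattern is a hypothetical shape for option 2, not an existing rule.

```python
import re

# Simplified stand-ins for fragments this issue does not show (assumptions).
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"
AUTHORITY = r"[A-Za-z0-9.-]+(?::[0-9]+)?"
XPALPHA = r"[A-Za-z0-9._~-]"
PATH = rf"(?:/{XPALPHA}+)+"  # URI_PATH : ( '/' URI_XPALPHA+ )+

# Option 1: URI_HIER_PART : ( '//' URI_AUTHORITY ) URI_PATH? ;
REQUIRE_AUTHORITY = re.compile(rf"{SCHEME}://{AUTHORITY}(?:{PATH})?")

# Option 2 (hypothetical shape): the token includes the surrounding '<' and '>',
# so authority and path can both stay optional.
BRACKETED = re.compile(rf"<{SCHEME}:(?://{AUTHORITY})?(?:{PATH})?>")

RULE_STATEMENT = "label:/path/to/value + /other_path = 3"

# Neither option consumes the start of the rule statement as a URI.
print(REQUIRE_AUTHORITY.match(RULE_STATEMENT))
print(BRACKETED.match(RULE_STATEMENT))

# Both accept the URI with an empty path.
print(bool(REQUIRE_AUTHORITY.fullmatch("http://www.test.example")))
print(bool(BRACKETED.fullmatch("<http://www.test.example>")))
```

One trade-off this makes visible: with option 1, a URI without an authority (for example file:/tmp/x) is no longer matched, whereas the bracketed form can still accept it.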

pieterbos commented 6 years ago

In addition, to match the trailing slashes in https://www.test.example/ and https://www.test.example/aa/bb/ correctly, we need:

fragment URI_PATH   : '/' | ( '/' URI_XPALPHA+ )+ ('/')?;

This is probably still not perfect.
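A quick way to probe the revised URI_PATH is to approximate it with a regular expression, combined with the authority-required variant of URI_HIER_PART. As before, SCHEME, AUTHORITY and XPALPHA are simplified assumptions, and the helper is_uri is a hypothetical name used only for this sketch.

```python
import re

# Simplified stand-ins for fragments this issue does not show (assumptions).
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"
AUTHORITY = r"[A-Za-z0-9.-]+(?::[0-9]+)?"
XPALPHA = r"[A-Za-z0-9._~-]"

# URI_PATH : '/' | ( '/' URI_XPALPHA+ )+ ('/')? ;
# The longer alternative is listed first because regex alternation is ordered,
# whereas the ANTLR lexer picks the longest match.
PATH = rf"(?:(?:/{XPALPHA}+)+/?|/)"

# URI_HIER_PART : ( '//' URI_AUTHORITY ) URI_PATH? ;
URI = re.compile(rf"{SCHEME}://{AUTHORITY}(?:{PATH})?")


def is_uri(text):
    """True if the whole text matches the approximated URI rule."""
    return URI.fullmatch(text) is not None


for candidate in (
    "https://www.test.example",
    "https://www.test.example/",
    "https://www.test.example/aa/bb/",
    "https://www.test.example//aa",  # empty path segment
):
    print(candidate, is_uri(candidate))
```

Under these assumptions the first three candidates are accepted and the last one is not. RFC 3986 permits empty path segments, so that may be one of the remaining gaps.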