leodido / go-urn

Parser for uniform resource names as seen on RFC 8141, RFC 2141, and RFC 7643
MIT License

PCRE impl. #7

Closed leodido closed 6 years ago

leodido commented 6 years ago

The hypothesis is that processing the URN (validation plus group extraction) with a PCRE regex will perform much better than the ANTLR4 version.

Maintainability should also improve: less code, and less is more.
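To make "validation + group extraction" concrete, here is a minimal sketch in Go. It rests on several assumptions: it uses Go's standard regexp package (RE2 syntax, not PCRE), the pattern is a simplified reading of the RFC 2141 grammar, and the names `urnPattern` and `parseURN` are hypothetical, not part of go-urn.

```go
package main

import (
	"fmt"
	"regexp"
)

// urnPattern is a simplified, illustrative pattern (an assumption, not the
// one used by go-urn). It captures the namespace identifier (nid) and the
// namespace specific string (nss) in named groups.
var urnPattern = regexp.MustCompile(
	`^(?i:urn):(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,31}):` +
		`(?P<nss>(?:[A-Za-z0-9()+,\-.:=@;$_!*']|%[0-9A-Fa-f]{2})+)$`)

// parseURN validates the input and extracts both groups in a single pass.
func parseURN(s string) (nid, nss string, ok bool) {
	m := urnPattern.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	// SubexpIndex requires Go 1.15 or later.
	return m[urnPattern.SubexpIndex("nid")], m[urnPattern.SubexpIndex("nss")], true
}

func main() {
	nid, nss, ok := parseURN("URN:foo:a123%2C456")
	fmt.Println(nid, nss, ok)
}
```

A single match call both rejects invalid input and returns the parts, which is where the "less code" argument comes from.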

TODOs.

Obs.

Without the parser generated by ANTLR4 (ALL(*)), it is not convenient to build a CST (concrete syntax tree) that the user can navigate with a listener or visitor pattern.

leodido commented 6 years ago

For future reference: it is not possible to capture all the <hex> escapes (e.g., in a%1f%2Cbcdse%21) within a specific part of the string by putting them in a group, because groups do not accumulate.

When a group sits inside a repetition, its last match overwrites the previous ones.
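The behaviour can be reproduced with a small sketch. It assumes Go's standard regexp package (RE2 syntax rather than PCRE), which treats repeated groups the same way in this respect; the pattern and the helper names `lastHexGroup` and `allHex` are hypothetical and only illustrate the point.

```go
package main

import (
	"fmt"
	"regexp"
)

// repeated places the <hex> escape in a capture group inside a repetition.
var repeated = regexp.MustCompile(`^(?:[a-z]|(%[0-9a-fA-F]{2}))+$`)

// single matches one <hex> escape at a time.
var single = regexp.MustCompile(`%[0-9a-fA-F]{2}`)

// lastHexGroup returns the content of group 1 after a full match:
// only the last iteration that matched the group is kept.
func lastHexGroup(s string) string {
	m := repeated.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

// allHex works around the limitation by scanning for every escape separately.
func allHex(s string) []string {
	return single.FindAllString(s, -1)
}

func main() {
	s := "a%1f%2Cbcdse%21"
	fmt.Println(lastHexGroup(s)) // %21
	fmt.Println(allHex(s))       // [%1f %2C %21]
}
```

So collecting every escape needs a second pass over the matched part (or a hand-written scanner), not a single capture group.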

leodido commented 6 years ago

At the moment the normalization task is done, but its unit tests are still missing. The same goes for lexical equivalence (which depends on the normalization task).
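For readers unfamiliar with the two tasks, here is a minimal sketch of how lexical equivalence can be built on top of normalization, following the RFC 2141 rules (case of the leading "urn" token, of the NID, and of any %-escaping is normalized before comparing). It is not the code in this repository: `normalize` and `equivalent` are hypothetical names, and the check on the input is deliberately minimal.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hexEscape matches a single %-escape in the namespace specific string.
var hexEscape = regexp.MustCompile(`%[0-9a-fA-F]{2}`)

// normalize lowercases the "urn" prefix, the NID, and the hex digits of
// every %-escape. The rest of the NSS is left untouched.
// Assumption: no full grammar validation, only a split on the first two colons.
func normalize(raw string) (string, bool) {
	parts := strings.SplitN(raw, ":", 3)
	if len(parts) != 3 || !strings.EqualFold(parts[0], "urn") {
		return "", false
	}
	nss := hexEscape.ReplaceAllStringFunc(parts[2], strings.ToLower)
	return "urn:" + strings.ToLower(parts[1]) + ":" + nss, true
}

// equivalent reports whether two URNs are equal after normalization.
func equivalent(a, b string) bool {
	na, okA := normalize(a)
	nb, okB := normalize(b)
	return okA && okB && na == nb
}

func main() {
	fmt.Println(normalize("URN:Foo:a123%2C456"))
	fmt.Println(equivalent("URN:foo:a123%2C456", "urn:FOO:a123%2c456"))
	fmt.Println(equivalent("urn:foo:A123", "urn:foo:a123"))
}
```

The last call shows why equivalence depends on normalization: characters in the NSS outside %-escapes stay case-sensitive, so those two URNs differ.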

Very close ...