w3c / N3

W3C's Notation 3 (N3) Community Group
Be more specific about required regex functionality of string:matches, string:notMatches, and string:scrape #177

Open rybesh opened 1 year ago

rybesh commented 1 year ago

Currently, the documentation for string:matches, string:notMatches, and string:scrape states that the regex argument is “a regular expression in the perl, python style.” However, this is not specific enough to ensure that one can write portable N3 rules that use these built-ins. There should be a specific list of regex features that implementations of these built-ins MUST support (even if some implementations support features beyond this guaranteed set).

(This issue is inspired by the recent changes to eye which broke uses of these built-ins that expected support of Perl Compatible Regular Expressions.)

william-vw commented 1 year ago

@rybesh I think that "Perl-style" regular expressions are the closest thing to a regex standard that is used in practice. But, as you point out, the issue is that N3 implementations simply delegate to whatever regex engine is available in the host language (Prolog, a C library, Java, ...), since regex matching is beyond the scope of their reasoning task. E.g., Java's regex support lacks many features, but jen3 simply delegates to it.
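To illustrate the delegation problem, here is a minimal sketch, assuming a hypothetical implementation that hands string:matches straight to Python's `re` module. The function names `host_accepts` and `string_matches` are made up for this example and are not taken from any N3 implementation. The Unicode category escape `\p{L}` is accepted by PCRE and by `java.util.regex`, but Python's built-in `re` rejects it, so a rule that uses it would not be portable to such an implementation.

```python
import re


def host_accepts(pattern: str) -> bool:
    """Return True if the host engine (here Python's `re`) compiles the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def string_matches(subject: str, pattern: str) -> bool:
    """Hypothetical host-delegating string:matches.

    Assumption: the built-in succeeds when the pattern is found anywhere
    in the subject, so the sketch uses re.search rather than re.fullmatch.
    """
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    # A basic character class plus \d compiles in Python's `re`.
    print(host_accepts(r"[a-z]+\d"))
    # The \p{L} category escape is rejected by Python's `re`.
    print(host_accepts(r"\p{L}+"))
```

A guaranteed feature list would let rule authors know in advance which of these two patterns is safe to use.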

What "core" regex features would you enforce (as a kind of lowest common denominator)?

rybesh commented 1 year ago

I would suggest that it be specified to match the SPARQL REGEX function, i.e. to delegate to the XPath fn:matches function specification.

Presumably the languages in which N3 tools are being implemented also have SPARQL tools, so there ought to be opportunities for code reuse. And even if not, at least it gives a solid (though maybe not lowest-common-denominator) definition of what a regex function is expected to handle.
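As a rough sketch of what fn:matches-style behaviour looks like from the caller's side, the following wraps Python's `re` with a substring match and a small flag mapping. The name `fn_matches_like` and the three-letter flag table are assumptions made for this example. Python's regex syntax is not the XPath regex syntax, so this only shows the shape of the interface, not a conforming implementation.

```python
import re

# Assumed mapping from three fn:matches flag letters to Python `re` flags.
# Other flag letters are deliberately not handled in this sketch.
_FLAGS = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE}


def fn_matches_like(text: str, pattern: str, flags: str = "") -> bool:
    """Return True if `pattern` matches any substring of `text`.

    Like fn:matches, the pattern is unanchored unless it uses ^ or $.
    """
    compiled_flags = 0
    for letter in flags:
        if letter not in _FLAGS:
            raise ValueError(f"unsupported flag: {letter!r}")
        compiled_flags |= _FLAGS[letter]
    return re.search(pattern, text, compiled_flags) is not None


if __name__ == "__main__":
    print(fn_matches_like("abracadabra", "bra"))
    print(fn_matches_like("abracadabra", "^bra"))
    print(fn_matches_like("Hello", "hello", "i"))
```

The substring-unless-anchored behaviour is the part that a specification would pin down; the host engine underneath is where implementations would still differ.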