w3c / N3

W3C's Notation 3 (N3) Community Group

Can we remove the custom escape sequences definition #180

Closed jeswr closed 1 year ago

jeswr commented 1 year ago

As far as I can tell, the only part of this spec that is more restrictive than Turtle is https://w3c.github.io/N3/spec/#escseq.

N3 was originally created to be a strict superset of Turtle, and I think it is critical that this behavior is maintained. Therefore, could we use the same escape sequence logic as Turtle?

william-vw commented 1 year ago

@jeswr I thought they were the same - can you outline the differences between the escape sequences?

jeswr commented 1 year ago

The difference, as far as I can tell, is that N3 will error on "unnecessary" escapes whereas Turtle will not. A good example is the following test case:

✖ query-survey-10
  new grammar is more restrictive re allowed escapes
  Error: Expected to throw an error when parsing.
  Input: # PxButton | e   | bash .euler http://www.w3.org/2001/sw/DataAccess/tests/data/survey/survey-sample.ttl http://eulersharp.sourceforge.net/2003/03swap/rdfs-rules.n3 --nope --think --query query-survey-10.n3

@prefix str: <http://www.w3.org/2000/10/swap/string#>.
@prefix log: <http://www.w3.org/2000/10/swap/log#>.
@prefix q: <http://www.w3.org/2004/ql#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

[]
q:select {<> q:answer (?R)};
q:where {?R ?x ?y; log:uri ?S. ?S str:matches "http://example\.org/.*"}.

jeswr commented 1 year ago

The literal in this test case is valid in Turtle but invalid in N3.

william-vw commented 1 year ago

@jeswr you are right that this is not allowed in N3 (it has annoyed me on a few occasions), but it seems that neither the Turtle nor the N3 grammar allows for it. Otherwise, what changes would you suggest to the grammar to allow for such escapes?

TallTed commented 1 year ago

@jeswr (and @william-vw) — According to http://www.w3.org/2000/10/swap/string#matches:

The subject is a string; the object is is a regular expression in the perl, python style. It is true iff the string matches the regexp.

In other words, "http://example\.org/.*" here is not a literal, it is a regex, and the \. is necessary to make that dot an FQDN separator instead of a wildcard, while the dot in .* is a wildcard.

If something else is to be understood as "The literal in this test case", I suggest you be explicit in identifying and quoting it, rather than indirectly referencing it.

gkellogg commented 1 year ago

The difference, as far as I can tell, is that N3 will error on "unnecessary" escapes whereas Turtle will not. A good example is the following test case

I think Turtle is silent on how to handle ECHAR-like escapes outside of the defined range for ECHAR. I think, for example, "\x" would just pass through uninterpreted. Is there a Turtle test for this? (Maybe there should be, if not.)
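
For reference, here is a minimal sketch of what the grammar itself accepts inside a short double-quoted string, assuming the ECHAR, UCHAR, and STRING_LITERAL_QUOTE productions of the Turtle 1.1 grammar. The function name `is_valid_string_body` is hypothetical and not taken from any parser; the sketch only checks the grammar and says nothing about how a given implementation behaves.

```python
import re

# Body of STRING_LITERAL_QUOTE in the Turtle 1.1 grammar:
#   ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)*
# ECHAR ::= '\' [tbnrf"'\]
# UCHAR ::= '\u' HEX HEX HEX HEX | '\U' followed by eight HEX digits
_STRING_BODY = re.compile(
    r'''(?:[^"\\\n\r]|\\[tbnrf"'\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*'''
)


def is_valid_string_body(body: str) -> bool:
    """Return True if the text between the quotes matches the grammar."""
    return _STRING_BODY.fullmatch(body) is not None


# Raw strings keep the backslashes exactly as they would appear in the file.
dot_escape_ok = is_valid_string_body(r"http://example\.org/.*")
double_backslash_ok = is_valid_string_body(r"http://example\\.org/.*")
x_escape_ok = is_valid_string_body(r"\x")
```

Under these assumptions, a backslash followed by `.` or `x` matches none of the alternatives, so such a body is rejected by a strict reading of the grammar, while the doubled backslash is accepted.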

gkellogg commented 1 year ago

@jeswr (and @william-vw) — According to http://www.w3.org/2000/10/swap/string#matches:

The subject is a string; the object is is a regular expression in the perl, python style. It is true iff the string matches the regexp.

In other words, "http://example\.org/.*" here is not a literal, it is a regex, and the \. is necessary to make that dot an FQDN separator instead of a wildcard, while the dot in .* is a wildcard.

Well, since it's not an IRI or a Blank Node, it must be a literal. In fact, according to some definition (e.g., string:matches), it is a log:String, interpreted as a regular expression. I believe the regular expression should be processed according to XSD regular expressions, although for all practical purposes this is just handled by the native languages. It would indicate features that such regular expressions should be restricted to.

The extra escaping required within the regular expression literal is part of its interpretation as a regular expression.

TallTed commented 1 year ago

@gkellogg — You are, of course, correct. Mental shortcuts aren't always conformant with reality.

My thinking was that this regex (literal) is not meant to be interpreted by anything that would consider the \. to be "extra" or "unnecessary" escaping: first, because it's within quotation marks and should not be treated as an escape, and second, because the thing that is meant to interpret it, per the test as written and the documentation of the predicate for which it is the object, is supposed to treat it as a "regular expression in the perl, python style" (an interpretation that I think differs somewhat from XSD regular expressions, though this difference shouldn't matter at the moment).

For me, this suggests that testing for acceptance/support of "unnecessary" escaping would require a different test.

jeswr commented 1 year ago

@jeswr you are right that this is not allowed in N3 (it has annoyed me on a few occasions), but it seems that neither the Turtle nor the N3 grammar allows for it. Otherwise, what changes would you suggest to the grammar to allow for such escapes?

My original comment was based on testing against a Turtle parser which I had incorrectly assumed was fully spec compliant. Having tested against some other implementations and looked more closely at the spec, I now see that the N3 and Turtle specs are aligned on this matter. My apologies for not spending more time testing this before raising the issue.

It does raise the question of why the N3 spec doesn't just defer to https://www.w3.org/TR/turtle/#sec-escapes to define escape sequences and instead copies and pastes the definition.

For me, this suggests that testing for acceptance/support of "unnecessary" escaping would require a different test.

@TallTed The test I have linked to is a negative test. That is, it is supposed to fail, and it does contain unnecessary escapes.

Note that if the parser is lenient on unnecessary escapes, this will be parsed as the literal "http://example.org/.*" before being passed to a reasoning engine (this is because of how string escape sequences are defined). If we really meant to make that dot an FQDN separator instead of a wildcard when interpreting the string as a regex, then the test case would actually need to have a double backslash in the string: "http://example\\.org/.*". This would then be interpreted as the regex /http://example\.org/.*/.
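
To make the single versus double backslash point concrete, here is a rough sketch, assuming a lenient parser that simply drops the backslash from any escape it does not recognise. The `lenient_unescape` helper is hypothetical, is not how any particular N3 or Turtle parser is implemented, and leaves out UCHAR handling for brevity.

```python
import re

# Escapes defined by ECHAR in the Turtle grammar, keyed by the character
# that follows the backslash.
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
         '"': '"', "'": "'", "\\": "\\"}


def lenient_unescape(body: str) -> str:
    # Assumption: an unrecognised escape keeps the escaped character and
    # silently loses its backslash. UCHAR escapes are not handled here.
    return re.sub(r"\\(.)", lambda m: ECHAR.get(m.group(1), m.group(1)), body)


# Raw strings stand in for the text as written in the N3 file.
single = lenient_unescape(r"http://example\.org/.*")
double = lenient_unescape(r"http://example\\.org/.*")

# With one backslash the dot reaches the regex engine as a wildcard;
# with two it arrives escaped and only matches a literal dot.
wildcard_hit = re.fullmatch(single, "http://exampleXorg/") is not None
literal_hit = re.fullmatch(double, "http://exampleXorg/") is not None
```

In this sketch `single` ends up with no backslash at all, so it also matches "http://exampleXorg/", whereas `double` keeps one backslash and matches only hosts with a real dot.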

william-vw commented 1 year ago

@jeswr so can we close this issue?

jeswr commented 1 year ago

Yes, but noting the possible action item: "It does raise the question of why the N3 spec doesn't just defer to https://www.w3.org/TR/turtle/#sec-escapes to define escape sequences and instead copies and pastes the definition."

william-vw commented 1 year ago

@jeswr Good point - it would make it easier for folks to spot the differences (or not, in this case) with Turtle. Feel free to post a PR.