w3c / rdf-turtle

https://w3c.github.io/rdf-turtle/

Update Turtle grammar for quoted triples and annotations #27

Closed gkellogg closed 1 year ago

gkellogg commented 1 year ago

Adds the CG Report changes for Turtle. Annotation syntax is tentative depending on potential changes to semantics, but matches the SPARQL annotation form.

Fixes #26.



gkellogg commented 1 year ago

Just the EBNF updates, right now. With PR Preview issues, probably easiest to see the grammar in the "Files changed" view.

gkellogg commented 1 year ago

Not sure why PR Preview isn't showing a formatted version. You can see a GitHack version here.

gkellogg commented 1 year ago

The GitHack link was pegged to a specific commit; I updated it to point at the branch, but caching may still keep the grammar from showing the changes. All literals are now marked with class grammar-literal. If it doesn't render properly, you may need to run it locally off of localhost, although it should eventually be right.

Still to resolve is @terminals. This is found in the wild, and I document it in my version of the EBNF grammar, which does contain some extensions. The XML 1.0 EBNF grammar, among other things, does not describe the use of rule numbers. My version documents @terminals and @pass (although @pass is not used here) to separate the productions from the terminals, but as we use "CAPITAL CASE" for terminals, it would be unambiguous without.

afs commented 1 year ago

I've not seen @terminal, though the more local issue is that we have @prefix and @base in the grammar, so there is a possible confusion there.

We split grammar to state which are terminals in SPARQL because:

  1. Different matching rules - white space is ignored as part of tokenizing, and also tokens greedy match.
  2. In practice, tools differentiate - whether in their single input file or, in the case of the yacc/lex family, where the two parts are separate programs.
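
The two matching rules in point 1 can be illustrated with a minimal sketch. It assumes a tiny, hypothetical terminal set with simplified patterns (these are not the Turtle grammar's actual terminal definitions) and is only meant to show whitespace being skipped between tokens and the longest candidate winning at each position:

```python
import re

# Hypothetical, simplified terminals for illustration only; the real
# Turtle/SPARQL terminal definitions are considerably more involved.
TERMINALS = [
    ("PREFIX_KW", r"@prefix"),
    ("BASE_KW", r"@base"),
    ("IRIREF", r"<[^<>\s]*>"),
    ("PNAME_LN", r"[A-Za-z][A-Za-z0-9]*:[A-Za-z0-9]+"),
    ("PNAME_NS", r"[A-Za-z][A-Za-z0-9]*:"),
    ("DOT", r"\."),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TERMINALS]
WHITESPACE = re.compile(r"\s+")


def tokenize(text):
    """Return (terminal_name, lexeme) pairs using longest-match tokenizing."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Rule 1a: white space is dropped by the tokenizer, never seen by the parser.
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        # Rule 1b: try every terminal at this position and keep the longest match.
        best = None
        for name, pattern in COMPILED:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

With this sketch, `ex:a` comes out as a single PNAME_LN token rather than PNAME_NS followed by something else, and `ex:a ex:b` tokenizes the same way however much white space separates the two names.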

gkellogg commented 1 year ago

I had to manually hack the HTML output, but it should be consistent with the 1.1 presentation, now. Just have to remember to do that in the future, and go back to other outputs.

Tried to do it in CSS, but browser behavior is inconsistent, given the treatment of table column layout.

afs commented 1 year ago

Productions for Terminals looks much better.

The githack complete doc helps a lot.

Is it some kind of artifact of how the githack copy is made that hex constants get dotted underline?

Screenshot from 2023-06-06 21-11-06

PN_CHARS_BASE use of orange | is a bit confusing and same in PN_LOCAL_ESC, STRING_LITERAL_LONG* putting | close to parser-literals.

? and * are parser literal color - inconsistent but less visual juxtaposition.

STRING_LITERAL_QUOTE, STRING_LITERAL_SINGLE_QUOTE used to have a comment.

gkellogg commented 1 year ago

> Is it some kind of artifact of how the githack copy is made that hex constants get dotted underline?

It doesn't show up in Safari. In Chrome, it also shows up when run locally, but the inspector makes it look like it's from the user agent (user agent stylesheet):

abbr[title] {
    text-decoration: underline dotted;
}

We could override that by setting the style of abbr[title] {text-decoration: none;}, but I'm not sure why that should be necessary.

> PN_CHARS_BASE use of orange | is a bit confusing and same in PN_LOCAL_ESC, STRING_LITERAL_LONG* putting | close to parser-literals.
>
> ? and * are parser literal color - inconsistent but less visual juxtaposition.

All the operators have their own CSS classes and use the code element; it's picking up the default from code. We can change grammar-opt, grammar-alt, grammar-paren, grammar-diff, grammar-plus, grammar-star, and grammar-brac independently. Should they all be styled the same? How would you like to see them styled?

> STRING_LITERAL_QUOTE, STRING_LITERAL_SINGLE_QUOTE used to have a comment.

Those comments (and others) aren't in the source BNF, and would be lost when generating the HTML output. They'd need to be added in manually. Note that the hex sequences are abbr elements, which should show the underlying character on hover, so the comments are a bit redundant. I'll add them back in if you like, but it will be another manual processing step on the output.
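
To make the abbr point concrete, here is a rough sketch of how a hex sequence such as #x22 could be wrapped so that hovering reveals the character. This is not the tool that generates the spec's HTML; the function names, the choice of Unicode character names for the title text, and the U+XXXX fallback for unnamed control characters are all assumptions made for illustration:

```python
import html
import re
import unicodedata

# Matches W3C EBNF hex character references such as #x22 or #x5C.
HEX_CHAR = re.compile(r"#x([0-9A-Fa-f]+)")


def describe(code_point):
    """Hypothetical title text: the Unicode name, or U+XXXX if the character has none."""
    name = unicodedata.name(chr(code_point), "")
    return name if name else f"U+{code_point:04X}"


def wrap_hex(ebnf_fragment):
    """Wrap each #xNN in an abbr whose title describes the character.

    Assumes the fragment contains no other HTML-special characters;
    only the title text is escaped here.
    """
    def repl(match):
        title = describe(int(match.group(1), 16))
        return f'<abbr title="{html.escape(title)}">{match.group(0)}</abbr>'

    return HEX_CHAR.sub(repl, ebnf_fragment)


print(wrap_hex("[^#x22#x5C#xA#xD]"))
```

Control characters such as #xA and #xD have no Unicode name, which is one place where a hand-written comment could still say more than the hover text.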