w3c / cogai

for work by the Cognitive AI community group

Chunks syntax: characters allowed for types, names, and ids #7

Open tidoust opened 4 years ago

tidoust commented 4 years ago

The chunk.js implementation suggests that names are composed of letters and digits, as well as a restricted set of punctuation characters.

However, the description of @rdfmap suggests that chunk property values could be IRIs:

@rdfmap {
  dog http://example.com/ns/dog
  cat http://example.com/ns/cat
}

In practice, I wonder which characters are allowed in types, names, and ids. It seems to me that allowing IRIs (as JSON-LD does) could also help with mapping to the semantic world, and that it would allow reasoning about things. For instance, I could have:

website https://example.org/ {
  name "An example page"
}

One problem is that commas are allowed in IRIs, which makes IRIs awkward to use in a comma-separated list of property values. One solution is to use spaces as the separator between values; another is to mandate escaping of commas in IRIs.

draggett commented 4 years ago

The JavaScript implementation currently uses the following regular expressions:

number: /^[-+]?[0-9]+.?[0-9]([eE][-+]?[0-9]+)?$/
name: /^(\|(@)?[\w|\d|\.|_|-|\/|:]+)$/
iso8061: /^\d{4}(-\d\d(-\d\d(T\d\d:\d\d(:\d\d)?(.\d+)?(([+-]\d\d:\d\d)|Z)?)?)?)?$/

Chunk identifiers are names, so your example with a URL for a chunk ID is fine.

Commas are really convenient for list item separators, so to allow IRIs, any commas within them should be escaped.
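
To make the escaping idea concrete, here is a minimal sketch of splitting a comma-separated value list while honouring escaped commas. It assumes a backslash is the escape character, which has not been decided in this thread, and the `splitValues` helper is hypothetical rather than part of chunk.js.

```javascript
// Hypothetical helper, not part of chunk.js.
// Assumption: a comma inside a value is written as "\," (backslash escape).
function splitValues(text) {
  const values = [];
  let current = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '\\' && text[i + 1] === ',') {
      // Escaped comma: keep the comma, drop the backslash.
      current += ',';
      i++;
    } else if (ch === ',') {
      // Unescaped comma: end of the current value.
      values.push(current.trim());
      current = '';
    } else {
      current += ch;
    }
  }
  values.push(current.trim());
  return values;
}

// The first IRI contains an escaped comma; the second is plain.
const parsed = splitValues('http://example.com/a\\,b, http://example.com/c');
console.log(parsed);
```

With this convention, ordinary lists keep their commas and only IRIs that actually contain a comma need extra care from the author.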

tidoust commented 4 years ago

I guess we can start with a restricted set of characters and open things up later on.

FWIW, \w is equivalent to [A-Za-z0-9_] and thus already includes \d and _.
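
A quick sketch to illustrate the point. The `simplifiedName` pattern below is a hypothetical tidied-up character class, not the pattern used by chunk.js. It assumes the `|` characters in the posted class were intended as separators; inside `[...]` a `|` is matched literally, so this version accepts slightly fewer strings than the posted one.

```javascript
// Hypothetical simplified pattern; NOT the one used by chunk.js.
// Accepts an optional leading @, then word characters, ".", "-", "/" and ":".
const simplifiedName = /^@?[\w.\-\/:]+$/;

// \w on its own already accepts digits and the underscore.
const wordCharAcceptsDigit = /^\w$/.test('7');
const wordCharAcceptsUnderscore = /^\w$/.test('_');

// Inside a character class, | is a literal character, not alternation.
const pipeIsLiteral = /^[a|b]$/.test('|');

// Names and IRIs taken from this thread, plus one value containing a comma.
const samples = [
  'dog',
  '@rdfmap',
  'https://example.org/',
  'http://example.com/ns/dog',
  'a,b',
];
const results = samples.map((s) => [s, simplifiedName.test(s)]);
console.log(results);
```

Under these assumptions the IRIs from the examples above are accepted as names, while a value containing an unescaped comma is rejected.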