linkeddata / rdflib.js

Linked Data API for JavaScript
http://linkeddata.github.io/rdflib.js/doc/

Does not properly process 8-digit Unicode escape sequences (\UXXXXXXXX) on string literals #494

Open jaxoncreed opened 3 years ago

jaxoncreed commented 3 years ago

This extends an issue on N3.js, which does process 8-digit Unicode escape sequences (\UXXXXXXXX) properly: https://github.com/rdfjs/N3.js/issues/248. There is also a test suite covering this: https://github.com/w3c/rdf-tests/pull/65.

Problem 1: Strange backslashes when writing to a Pod.

Let's say I want to PATCH the following triple to my Pod: <https://pod.example/chat1.ttl#message1> <http://rdfs.org/sioc/ns#content> "Here's my emoji: 😊".

Serializing this with N3.js turns it into <https://pod.example/chat1.ttl#message1> <http://rdfs.org/sioc/ns#content> "Here's my emoji: \\U0001f60a". Or at least that's what you would want to write to a Pod.

Running the following code

import { Writer } from "n3";
import type { DatasetCore } from "@rdfjs/types";

function patchToPod(uri: string, dataset: DatasetCore) {
  // Serialize every quad in the dataset as N-Triples.
  const writer = new Writer({ format: "N-Triples" });
  for (const quad of dataset) {
    writer.addQuad(quad);
  }
  writer.end(async (error, parsedString: string) => {
    if (error) {
      console.error(error);
      return;
    }
    // Parsed String is `<https://pod.example/chat1.ttl#message1> <http://rdfs.org/sioc/ns#content> "Here's my emoji: \U0001f60a".`
    await fetch(uri, {
      method: "PATCH",
      body: `INSERT { ${parsedString} }`,
      headers: { 'content-type': 'application/sparql-update' }
    });
  });
}

causes the following to be written to the Pod on NSS:

@prefix : <#>.
@prefix ch: <https://pod.example/chat1.ttl#>.
@prefix n: <http://rdfs.org/sioc/ns#>.

ch:message1 n:content "Here's my emoji: \uf60a".

Notice that instead of \U0001F60A it's \uf60a. I'm not 100% sure why, since I'm not well versed in escape codes, but @RubenVerborgh believes this is a problem with rdflib.js on NSS.
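One possible reading of that output, not confirmed anywhere in this thread, is that the 8-digit escape gets cut down to its last four hex digits somewhere along the way. The sketch below only illustrates the arithmetic: JavaScript's String.fromCharCode keeps the low 16 bits of its argument, so 0x1F60A collapses to 0xF60A (a private-use code point, not the emoji), whereas the correct UTF-16 encoding is the surrogate pair d83d de0a. Whether rdflib.js actually takes this code path is an assumption.

```typescript
const codePoint = 0x1f60a; // U+1F60A, the emoji from the example triple

// String.fromCharCode keeps only the low 16 bits of each argument,
// so the leading "1" of 0x1F60A is silently dropped.
const truncated = String.fromCharCode(codePoint);

// Encode the same code point as a UTF-16 surrogate pair by hand.
// (String.fromCodePoint does this for you on ES2015+ targets.)
const offset = codePoint - 0x10000;
const high = 0xd800 + (offset >> 10);
const low = 0xdc00 + (offset & 0x3ff);
const emoji = String.fromCharCode(high, low);

console.log(truncated.charCodeAt(0).toString(16)); // f60a
console.log(high.toString(16), low.toString(16)); // d83d de0a
console.log(emoji === "😊"); // true
```

The truncated value matches the \uf60a that ended up in the Pod, which is why truncation looks like a plausible cause.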

Adding an additional backslash to the parsed string in the code above fixes it:

fetch(uri, {
  method: "PATCH",
  // Workaround: double the backslash of every \U escape, not only the first one
  body: `INSERT { ${parsedString.replace(/\\U/g, "\\\\U")} }`,
  headers: { 'content-type': 'application/sparql-update' }
})

However, @RubenVerborgh confirmed that adding these backslashes shouldn't be required.

Problem 2: Solid clients don't understand 8-digit Unicode escape sequences

When a Unicode escape sequence is correctly formatted in the Pod, clients using rdflib.js are unable to understand it. For example, here's what a message in the SolidOS chat looks like with each form of Unicode escape:

:fd0071f5-01f9-416d-aa01-24eb6666d618
    n:content "Emoji3 \\U0001f60a";

Causes: image

:fd0071f5-01f9-416d-aa01-24eb6666d618
    n:content "Emoji3 \U0001f60a";

Causes: image

But,

:fd0071f5-01f9-416d-aa01-24eb6666d618
    n:content "Emoji3 \ud83d\ude0a";

Causes: image

So, even if I were to build functionality into my app to parse \U escapes, it wouldn't be interoperable with the apps that do not.
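Since the \ud83d\ude0a form is the one described above as working, one stopgap on the writing side could be to rewrite each \UXXXXXXXX escape into a pair of \uXXXX surrogate escapes before sending the PATCH. The sketch below is only an illustration: toSurrogateEscapes is a hypothetical helper, not part of rdflib.js or N3.js, and it relies on the client behavior observed in this issue. Stricter Turtle parsers may reject surrogate halves written as \u escapes, so this is a workaround and not a fix.

```typescript
// Hypothetical helper, not part of rdflib.js or N3.js.
// Rewrites \UXXXXXXXX escapes as \uXXXX escapes (a surrogate pair above U+FFFF).
function toSurrogateEscapes(serialized: string): string {
  // The first alternative consumes escaped backslashes, so a literal
  // backslash followed by "U0001f60a" is left untouched.
  return serialized.replace(
    /\\\\|\\U([0-9A-Fa-f]{8})/g,
    (match: string, hex: string | undefined): string => {
      if (hex === undefined) return match; // escaped backslash, keep as-is
      const codePoint = parseInt(hex, 16);
      if (codePoint > 0x10ffff) return match; // not a valid code point
      if (codePoint <= 0xffff) return "\\u" + hex.slice(4).toLowerCase();
      const offset = codePoint - 0x10000;
      const high = 0xd800 + (offset >> 10);
      const low = 0xdc00 + (offset & 0x3ff);
      return "\\u" + high.toString(16) + "\\u" + low.toString(16);
    }
  );
}

// Usage: the literal from the SolidOS chat example.
console.log(toSurrogateEscapes('"Emoji3 \\U0001f60a"')); // "Emoji3 \ud83d\ude0a"
```

In the patchToPod example this would be applied to parsedString in place of the backslash-doubling replace.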