rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/

Escaped quotes in literals of n-triples file are parsed, but not escaped anymore in js #266

Open flyon opened 2 years ago

flyon commented 2 years ago

I'm trying to parse the DBpedia ontology. I've downloaded their N-Triples file. Line 2650 looks like this:

<http://dbpedia.org/ontology/price> <http://www.w3.org/2000/01/rdf-schema#comment> "The price of something, eg a journal. For \"total money earned by an Athlete\" use gross"@en .

So the literal pattern is [quote][text with escaped quotes][quote][language-tag]

n3 parses this correctly, but for the object it gives me a LiteralNode with the following id:

"The price of something, eg a journal. For "total money earned by an Athlete" use gross"@en

So now the quotes are no longer escaped.

node.id.charCodeAt(0) === node.id.charCodeAt(43)

returns true, meaning the two quote characters are identical.

This causes an issue with the regex in my code, which takes in properly formed literals (with start and end quotes, optional language tags, etc.).

So I'm going to have to fix the node.id first before using it (since I think the regex is correct).

Is this a bug?

RubenVerborgh commented 2 years ago

This is working as intended.

node.id is an internal storage variable that should not be accessed by any external components, and it indeed uses (by design) the format you mention above.

You are looking for either:


> regex in my code that takes in properly formed literals

I can't tell from this description what your code is doing, but I expect you just need node.value. There's no need for you to parse it again.
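To illustrate the point about not parsing again, here is a minimal sketch of working from the already-parsed fields and only re-escaping when a serialized form is really needed. The `literal` object is a plain stand-in that uses the RDF/JS Literal field names (`value`, `language`, `datatype`); whether your node exposes exactly these fields is an assumption. `escapeNTriplesString` and `toNTriplesLiteral` are hypothetical helpers, not part of N3.js.

```javascript
// Stand-in for a parsed literal; field names follow the RDF/JS Literal
// interface (assumption: the node you get back exposes the same fields).
const literal = {
  termType: 'Literal',
  value: 'The price of something, eg a journal. For "total money earned by an Athlete" use gross',
  language: 'en',
  datatype: {
    termType: 'NamedNode',
    value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString',
  },
};

const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';

// Hypothetical helper: escape the characters that N-Triples requires to be
// escaped inside a quoted string. Backslashes go first so that the
// backslashes added for quotes are not escaped a second time.
function escapeNTriplesString(text) {
  return text
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// Hypothetical helper: rebuild the N-Triples form from the parsed fields.
function toNTriplesLiteral(term) {
  const quoted = `"${escapeNTriplesString(term.value)}"`;
  if (term.language) return `${quoted}@${term.language}`;
  if (term.datatype && term.datatype.value !== XSD_STRING) {
    return `${quoted}^^<${term.datatype.value}>`;
  }
  return quoted;
}

console.log(literal.value);              // the text, no escaping to undo
console.log(literal.language);           // the language tag
console.log(toNTriplesLiteral(literal)); // escaped form, if you need one
```

With this approach the text and the language tag are read directly from the fields, so no regex over a serialized string is needed.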

If you indeed want to work with "properly formed literals" (assuming here: unparsed N-Triples), it seems that you want to bypass the N3 parser and just read the raw N-Triples document yourself.
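If reading the raw document yourself is what you want, a rough sketch of splitting a single N-Triples literal token might look like the following. The pattern `LITERAL` and the function `splitRawLiteral` are hypothetical names introduced for illustration. The sketch only separates the parts and does not decode escapes such as `\"` or `\uXXXX`, and it is not a full N-Triples grammar.

```javascript
// Hypothetical pattern for one N-Triples literal token:
//   "lexical form with \-escapes" followed by an optional @lang or ^^<iri>.
// Inside the quotes, a character is either not a quote/backslash,
// or a backslash followed by any single character.
const LITERAL = /^"((?:[^"\\]|\\.)*)"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<([^>]*)>)?$/;

// Hypothetical helper: split a raw literal token into its parts,
// or return null if the token is not a well-formed literal.
function splitRawLiteral(token) {
  const match = LITERAL.exec(token);
  if (!match) return null;
  const [, lexical, language, datatype] = match;
  return { lexical, language: language || '', datatype: datatype || '' };
}

// The literal exactly as it appears in the N-Triples file (escapes intact).
const raw = String.raw`"The price of something, eg a journal. For \"total money earned by an Athlete\" use gross"@en`;
const parts = splitRawLiteral(raw);

// The same text with the escapes removed, i.e. the internal format
// described above.
const unescaped = '"The price of something, eg a journal. For "total money earned by an Athlete" use gross"@en';
const rejected = splitRawLiteral(unescaped);

console.log(parts);
console.log(rejected);
```

The second call shows why a pattern written for escaped input does not fit the internal format: once the inner quotes are no longer escaped, the first inner quote is read as the end of the string.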