ruby-rdf / rdf-turtle

Turtle reader/writer for Ruby
http://rubygems.org/gems/rdf-turtle
The Unlicense

Literals delimited with `"""` are serialized in an invalid format when they end with one or more `"` #16

Closed no-reply closed 5 years ago

no-reply commented 5 years ago

The writer outputs `"""`-delimited strings in some cases (e.g. when there is a newline). The output is invalid when a literal serialized this way ends in `"`.

Specifically: we produce a literal ending in `""""`. I can't see that the Turtle grammar actually prohibits this, but other parsers (at least Jena and Raptor) have a problem with it. It might be better for us to avoid producing literals in this form either way.

require 'rdf'
require 'rdf/turtle'

g = RDF::Graph.new
g << RDF::Statement.new(RDF::URI('http://example.com/moomin'), RDF.value, "has a newline\n and \"ends in a quote\"")

puts g.dump :ttl
# <http://example.com/moomin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> """has a newline
#  and "ends in a quote"""" .
# => nil

There may also be an issue with `'''`-delimited strings, but I don't know of any cases where we produce those by default.


EDIT: on a closer look, the grammar does explicitly disallow ending `"""`-delimited strings in `"`. The last item before the closing delimiter must match `([^"\] | ECHAR | UCHAR)`, optionally preceded by one or two `"`.
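
To make the EDIT above concrete, here is a rough sketch that transcribes the Turtle grammar's `STRING_LITERAL_LONG_QUOTE`, `ECHAR` and `UCHAR` productions into Ruby regexps and checks a serialized token against them. The constant `LONG_QUOTE` and the helper `valid_long_literal?` are names invented for this illustration; nothing here is part of rdf-turtle.

```ruby
# Terminals from the Turtle grammar, transcribed by hand (illustrative only).
ECHAR = /\\[tbnrf"'\\]/
UCHAR = /\\u\h{4}|\\U\h{8}/

# STRING_LITERAL_LONG_QUOTE ::=
#   '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'
LONG_QUOTE = /\A"""(?:(?:"|"")?(?:[^"\\]|#{ECHAR}|#{UCHAR}))*"""\z/

# Hypothetical helper: true if the whole token is one valid long literal.
def valid_long_literal?(token)
  LONG_QUOTE.match?(token)
end

# The token from the report: body ends in an unescaped quote.
reported = %("""has a newline\n and "ends in a quote"""")
# The same token with the final quote escaped as \".
escaped  = %("""has a newline\n and "ends in a quote\\"""")

puts valid_long_literal?(reported) # => false
puts valid_long_literal?(escaped)  # => true
```

Internal single quotes are fine as they stand; only the quote that touches the closing delimiter has to change.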

no-reply commented 5 years ago

Proposing something along the lines of `string = string.gsub('\\', '\\\\\\\\').gsub('"', '\\"')` at https://github.com/ruby-rdf/rdf-turtle/blob/develop/lib/rdf/turtle/writer.rb#L450
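
For reference, a minimal sketch of what that proposal does to the literal from the original report. The wrapper `escape_long_literal` is a hypothetical name for illustration, and the code makes no assumption about what the writer actually does at the linked line.

```ruby
# Hypothetical wrapper around the proposed substitution (not rdf-turtle API).
# Backslashes are doubled first, then every double quote gets a backslash.
def escape_long_literal(string)
  string.gsub('\\', '\\\\\\\\').gsub('"', '\\"')
end

value = "has a newline\n and \"ends in a quote\""
puts %("""#{escape_long_literal(value)}""")
# """has a newline
#  and \"ends in a quote\"""" .. without the trailing dots; the literal now
# ends in \" followed by the closing """ delimiter.
```

Escaping every quote is more than the grammar strictly requires, but it keeps the rule simple: no unescaped `"` can ever sit next to the closing delimiter.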

no-reply commented 5 years ago

A related issue arises with strings containing runs of internal `"` whose length is a multiple of 4 or 5 (and not also a multiple of 3).

For example:

graph = RDF::Graph.new
graph << RDF::Statement.new(RDF::URI('http://example.com/moomin'), RDF.value, "has a newline\n and many internal \"\"\"\"\" quotes")

puts graph.dump :ttl

# <http://example.com/moomin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> """has a newline
#  and many internal \""""" quotes""" .
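
Read against the grammar, both failures come down to the same constraint: inside a `"""` body, at most two unescaped quotes may appear in a row, and they must be followed by a non-quote character or an escape. A rough sketch of a more selective escaper built on that reading follows. The helper name `escape_quote_runs` is invented for this example, and this is an assumption about one possible fix, not a description of the writer.

```ruby
# Hypothetical helper (not rdf-turtle API). In a single pass over the
# original string it escapes:
#   * every backslash,
#   * every quote in a run of three or more quotes,
#   * every quote in a run that reaches the end of the string.
# Runs of one or two quotes elsewhere are left untouched.
def escape_quote_runs(string)
  string.gsub(/\\|"{3,}|"+\z/) do |match|
    match.gsub(/./) { |char| "\\#{char}" }
  end
end

value = "has a newline\n and many internal \"\"\"\"\" quotes"
puts %("""#{escape_quote_runs(value)}""")
# """has a newline
#  and many internal \"\"\"\"\" quotes"""
```

Doing the work in one pass avoids having to tell apart a quote that was already escaped from a quote that merely follows an escaped backslash.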