CSV and TSV results - Githubissues

gkellogg commented 3 years ago

As a user, I want to be able to retrieve SPARQL results as CSV or TSV so that I can use different toolchains to analyze the results.

There are proposals (#43) for extending SPARQL results in JSON and XML for RDF*. However, SPARQL also defines results in CSV and TSV formats.

Both formats allow a quoted field to contain delimiters (and, with escaping, quote characters), so either format can be used to express embedded triple results as well.

Consider the "bob-bind" query:

PREFIX : <http://bigdata.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?a ?b ?c WHERE {
   ?bob foaf:name "Bob" .
   BIND( <<?bob foaf:age ?age>> AS ?a ) .
   ?a ?b ?c .
}

when run against:

@prefix : <http://bigdata.com/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:  <http://example.org/> .

:bob foaf:name "Bob" .
<<:bob foaf:age 23>> <http://example.org/certainty> 0.9 .

In JSON results, this would produce the following:

{
  "head": {"vars": ["a","b","c"]},
  "results": {
    "bindings": [
      {
        "a": {
          "type": "triple",
          "value": {
            "subject": {
              "type": "uri",
              "value": "http://bigdata.com/bob"
            },
            "predicate": {
              "type": "uri",
              "value": "http://xmlns.com/foaf/0.1/age"
            },
            "object": {
              "type": "typed-literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer",
              "value": "23"
            }
          }
        },
        "b": {
          "type": "uri",
          "value": "http://example.org/certainty"
        },
        "c": {
          "type": "typed-literal",
          "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
          "value": "0.9"
        }
      }
    ]
  }
}

In CSV, this might produce the following:

a,b,c
"http://bigdata.com/bob,http://xmlns.com/foaf/0.1/age,23",http://example.org/certainty,0.9

This requires that a client detect that the cell content is, itself, in CSV form, and interpret it as subject,predicate,object.
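A minimal sketch of what that client-side detection could look like, using Python's standard `csv` module. The helper names (`split_embedded_triple`, `read_rows`) and the "exactly three fields" heuristic are assumptions made for illustration, not part of any specification.

```python
import csv
import io

# The CSV result from the example above.
RESULT = (
    "a,b,c\n"
    '"http://bigdata.com/bob,http://xmlns.com/foaf/0.1/age,23",'
    "http://example.org/certainty,0.9\n"
)


def split_embedded_triple(cell):
    """Return (subject, predicate, object) if the cell itself parses as
    exactly three CSV fields, otherwise None. This is only a heuristic."""
    fields = next(csv.reader(io.StringIO(cell)), [])
    return tuple(fields) if len(fields) == 3 else None


def read_rows(text):
    """Parse a CSV result set, expanding cells that look like embedded triples."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({var: split_embedded_triple(val) or val for var, val in row.items()})
    return rows


if __name__ == "__main__":
    for row in read_rows(RESULT):
        print(row)
```

Note the weakness of this approach: a plain literal that happens to contain two commas would also be read as a triple, since nothing in the cell marks it as one.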

The TSV form could attach a datatype to the string-encoded embedded TSV for the subject, predicate, and object, to make it more explicit.

?a\t?b\t?c
"<http://bigdata.com/bob>\t<http://xmlns.com/foaf/0.1/age>\t23" <http://example.org/certainty>  0.9

pchampin commented 3 years ago

Looking at https://www.w3.org/TR/sparql11-results-csv-tsv/, I think there is a more natural way to extend those formats:

The CSV result format uses the SPARQL STR() function. We should define what STR() is supposed to return for embedded triples in SPARQL* anyway. Once this is done, the CSV result format is also naturally extended. Note that we can choose to STRify embedded triples using the comma as a separator, as you suggest.
The TSV result format uses the SPARQL syntax to represent terms; for SPARQL*, it would then seem natural to represent embedded triples using the << ... >> notation in TSV. Furthermore, this makes it easier for parsers to detect that a given value is an embedded triple (only 2 characters to look ahead).
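To illustrate the two-character lookahead for TSV, here is a rough sketch. The example row is hypothetical (the thread does not fix the exact serialization), and the `TERM` pattern is a deliberately small tokenizer, not the SPARQL grammar; it does not handle nested embedded triples.

```python
import re

# Hypothetical TSV data row using the << ... >> notation (assumed layout).
ROW = (
    "<< <http://bigdata.com/bob> <http://xmlns.com/foaf/0.1/age> 23 >>"
    "\t<http://example.org/certainty>\t0.9"
)

# Simplified term tokenizer: IRIs, double-quoted literals with an optional
# language tag or datatype, blank nodes, and bare tokens such as numbers.
TERM = re.compile(
    r'<[^<>\s]*>'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^<>\s]*>)?'
    r'|_:\S+'
    r'|[^\s<>"]+'
)


def parse_cell(cell):
    """Return a (s, p, o) tuple for an embedded triple, else the cell unchanged."""
    cell = cell.strip()
    # Only the first two characters are needed to recognize an embedded triple.
    if cell.startswith("<<") and cell.endswith(">>"):
        terms = TERM.findall(cell[2:-2])
        if len(terms) != 3:
            raise ValueError(f"expected 3 terms in embedded triple, got {len(terms)}")
        return tuple(terms)
    return cell


if __name__ == "__main__":
    print([parse_cell(c) for c in ROW.split("\t")])
```

Because an IRI in TSV starts with a single `<`, checking for `<<` is enough to tell the two apart before any further parsing.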

gkellogg commented 3 years ago

> Looking at https://www.w3.org/TR/sparql11-results-csv-tsv/, I think there is a more natural way to extend those formats:
>
> The CSV result format uses the SPARQL STR() function. We should define what STR() is supposed to return for embedded triples in SPARQL* anyway. Once this is done, the CSV result format is also naturally extended. Note that we can choose to STRify embedded triples using the comma as a separator, as you suggest.

Note, however, that the STR function is not likely to take into consideration format restrictions, such as the , separator in CSV. It will require some transformation to the format-specific result, which is then subject to the escaping requirements of the CSV format. Indeed, looking at 17.4.2.5 str, it seems to only be defined for literals and IRIs, formally. But, certainly the components of the field could be determined via STR().
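A small sketch of that two-step transformation, assuming the component strings have already been obtained (for example via STR()). The function names are made up for illustration. The components are first joined as an inner CSV record, and that string is then written as one field of the result row, where the normal CSV quoting rules apply.

```python
import csv
import io


def encode_triple_cell(s, p, o):
    """Join the three component strings as an inner CSV record (no line ending)."""
    buf = io.StringIO()
    csv.writer(buf).writerow([s, p, o])
    return buf.getvalue().rstrip("\r\n")


def write_results(header, rows):
    """Write a CSV result set; the writer quotes any field that needs it."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    cell = encode_triple_cell(
        "http://bigdata.com/bob", "http://xmlns.com/foaf/0.1/age", "23"
    )
    print(write_results(["a", "b", "c"], [[cell, "http://example.org/certainty", "0.9"]]))
```

If a component itself contains a comma or a quote, it is quoted in the inner record and those quotes are doubled again in the outer field, so a reader has to undo both layers in order.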

> The TSV result format uses the SPARQL syntax to represent terms; for SPARQL*, it would then seem natural to represent embedded triples using the << ... >> notation in TSV. Furthermore, this makes it easier for parsers to detect that a given value is an embedded triple (only 2 characters to look ahead).

I would be happy with using the << ... >> notation for embedded triples in TSV.

pchampin commented 3 years ago

> it seems to only be defined for literals and IRIs

Darn, you are right. So defining it for embedded triples does not seem to be a requirement (especially as these may contain bnodes). I read too quickly and didn't see there was a special treatment for bnodes, so one for embedded triples may be appropriate too.

As for escaping, it is taken care of by the third paragraph of §3.2 in https://www.w3.org/TR/sparql11-results-csv-tsv/.

w3c / rdf-star

CSV and TSV results #48