Closed cboettig closed 1 year ago
@ehwenk updated the PR above to handle UTF-8 chars via unicode encoding, which is a bit ugly maybe but lossless, e.g.
library(rdflib) # read_nquads(), rdf_query()
library(dplyr)  # %>% and mutate()
true_triples <- read_nquads("data/ADP.nq")
# turn "<U+XXXX>" escapes back into the characters they stand for
unescape_unicode <- function(x) {
  stringi::stri_unescape_unicode(gsub("<U\\+(....)>", "\\\\u\\1", x))
}
# example query
sparql <-
'SELECT DISTINCT ?orcid ?label
 WHERE { ?s <http://purl.org/datacite/v4.4/IsReviewedBy> ?orcid .
         ?orcid <http://www.w3.org/2000/01/rdf-schema#label> ?label
       }
'
rdf_query(true_triples, sparql) %>%
  mutate(label = unescape_unicode(label)) # replace unicode with proper accented characters
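To make the round trip concrete, here is a minimal sketch of what unescape_unicode() does to a couple of strings. The sample labels are made up for illustration and are not taken from data/ADP.nq. Note that the pattern only matches four-character escapes of the form `<U+XXXX>`, so longer code points would pass through unchanged.

```r
# Same helper as in the comment above, repeated so the snippet runs on its own.
unescape_unicode <- function(x) {
  # "<U+00E9>" -> "\u00E9" (literal backslash), which stringi then decodes
  stringi::stri_unescape_unicode(gsub("<U\\+(....)>", "\\\\u\\1", x))
}

# Hypothetical labels: one with an escaped accent, one plain ASCII.
labels <- c("Ren<U+00E9>e", "plain ASCII")
decoded <- unescape_unicode(labels)
print(decoded)
```

Strings without any `<U+XXXX>` escape should come back unchanged, so it is safe to apply the helper to a whole column.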
Thanks @cboettig - great inputs!
@ehwenk I'll leave you to merge PRs when ready!
Nice work @ehwenk, this looks great. I tweaked the R code lightly for some minor RDF issues.
<xsd:string>
)<NA>
URIs
rdflib uses some old redland C bindings that aren't :(
use.
Instead of writing to CSV, we then write the three columns in the n-quads serialization. (OK, technically four columns: "quads" adds a "graph" column, which is simply ".", meaning all these triples are part of the same "graph".)
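As a rough illustration of writing a three-column table as n-quads, here is a minimal base-R sketch. The data frame, its column names (subject, predicate, object), the example.org and ORCID values, and the rule "anything starting with http(s):// is an IRI" are all assumptions for the example, not the PR's actual code. It also assumes literals contain no quotes or backslashes, so no escaping is done. Each line ends with the terminating "." and carries no graph label, so every statement lands in the same (default) graph.

```r
# Hypothetical three-column table; names and values are illustrative only.
triples <- data.frame(
  subject   = c("https://example.org/dataset/1",
                "https://orcid.org/0000-0000-0000-0000"),
  predicate = c("http://purl.org/datacite/v4.4/IsReviewedBy",
                "http://www.w3.org/2000/01/rdf-schema#label"),
  object    = c("https://orcid.org/0000-0000-0000-0000",
                "Jane Doe"),
  stringsAsFactors = FALSE
)

# Assumption: objects that look like http(s) URLs are IRIs, the rest are literals.
is_iri <- function(x) grepl("^https?://", x)

# IRIs go in angle brackets; literals are quoted and typed with the
# absolute xsd:string IRI, since n-quads has no prefixes.
format_object <- function(x) {
  ifelse(
    is_iri(x),
    paste0("<", x, ">"),
    paste0("\"", x, "\"^^<http://www.w3.org/2001/XMLSchema#string>")
  )
}

# One statement per row, terminated by " ."
to_nquads <- function(df) {
  paste0("<", df$subject, "> <", df$predicate, "> ",
         format_object(df$object), " .")
}

nq_lines <- to_nquads(triples)
out_file <- tempfile(fileext = ".nq")
writeLines(nq_lines, out_file)
```

The resulting file is plain text with one statement per line, which is what makes the format easy to produce from a table.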
This looks nearly good. One remaining thing is that n-quads, being a trivially simple format, doesn't support prefixes, so it looks like the <xsd:string> URIs will have to be expanded to use absolute URLs instead. Not sure if there are any other prefixes.
I added a few 'smoke test' SPARQL queries at the end. SPARQL is kinda like SQL, but supports this cool trick where you can create that let you walk the graph. You probably won't use it but it can be kinda cool, see examples at end of R file.
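One possible way to do the prefix expansion mentioned above is a plain string substitution over the serialized lines. This is only a sketch: expand_prefixes() and the prefix table are hypothetical names, only the xsd prefix comes from this thread, and the sample statement uses made-up example.org IRIs.

```r
# Assumed prefix table; xsd is the only prefix mentioned in the thread.
prefixes <- c(xsd = "http://www.w3.org/2001/XMLSchema#")

# Replace "<prefix:" with "<absolute-namespace" in every line,
# e.g. "<xsd:string>" -> "<http://www.w3.org/2001/XMLSchema#string>".
expand_prefixes <- function(lines, prefixes) {
  for (p in names(prefixes)) {
    lines <- gsub(paste0("<", p, ":"), paste0("<", prefixes[[p]]),
                  lines, fixed = TRUE)
  }
  lines
}

# Hypothetical statement with a prefixed datatype.
x <- "<https://example.org/s> <https://example.org/p> \"text\"^^<xsd:string> ."
expanded <- expand_prefixes(x, prefixes)
print(expanded)
```

If other prefixes do turn up, they could be added to the table without changing the function.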