Collection of RDF parsers and serializers implementing the https://github.com/sweetrdf/rdfInterface interface.
Originally developed for the quickRdf library.
format | read/write | class | implementation | streaming[1] |
---|---|---|---|---|
rdf-xml | rw | RdfXmlParser, RdfXmlSerializer | own | yes |
n-triples | rw | NQuadsParser, NQuadsSerializer | own | yes |
n-triples* | rw | NQuadsParser, NQuadsSerializer | own | yes |
n-quads | rw | NQuadsParser, NQuadsSerializer | own | yes |
n-quads* | rw | NQuadsParser, NQuadsSerializer | own | yes |
turtle | rw | TriGParser, TrigSerializer | pietercolpaert/hardf | yes |
trig | rw | TriGParser, TrigSerializer | pietercolpaert/hardf | yes |
JsonLD | rw | JsonLdParser, JsonLdSerializer | ml/json-ld | no |
JsonLD[2] | w | JsonLdStreamSerializer | own[3] | yes |
[1] A streaming parser/serializer doesn't materialize the whole dataset in memory, which ensures a constant (and low) memory footprint
(this applies only to the parser/serializer itself; see the section on memory usage below).
[2] Use the jsonld-stream value for the $format parameter of \quickRdfIo\Util::serialize() to use this serializer.
[3] Outputs data only in an extremely flattened JSON-LD form but works in streaming mode.
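To make the "streaming" notion from footnote [1] concrete, here is a minimal sketch in plain PHP. It is not the code used by any of the parsers listed above: streamStatements() is a hypothetical helper and the input is a tiny N-Triples-like string. It only shows the general pattern of yielding one item at a time instead of collecting everything into an array.

```php
<?php
// Hypothetical helper, not part of quickRdfIo: yields one statement line at a
// time, so only the current line is held in memory.
function streamStatements($handle): Generator {
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line !== '' && $line[0] !== '#') {   // skip blank lines and comments
            yield $line;
        }
    }
}

// Build a small in-memory N-Triples-like input for the demonstration.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "# sample data\n");
fwrite($handle, "<http://example.org/a> <http://example.org/p> \"1\" .\n");
fwrite($handle, "\n");
fwrite($handle, "<http://example.org/b> <http://example.org/p> \"2\" .\n");
rewind($handle);

$count = 0;
foreach (streamStatements($handle) as $statement) {
    $count++;                 // process and forget; nothing is accumulated
}
fclose($handle);              // close only after the iteration has finished

echo "$count statements\n";
```

Because the generator reads lazily, the underlying stream has to stay open until the loop is done.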
composer require sweetrdf/quick-rdf-io
https://sweetrdf.github.io/quickRdfIo/namespaces/quickrdfio.html
It's very incomplete but better than nothing. RdfInterface and ml/json-ld documentation is included.
Remark: the examples call two other libraries, sweetrdf/quick-rdf and sweetrdf/term-templates.
You may install them with composer require sweetrdf/quick-rdf and composer require sweetrdf/term-templates.
Just use \quickRdfIo\Util::parse($input, $dataFactory, $format, $baseUri), where:
$input can be (almost) "anything containing RDF" (an RDF string, a path to a file, a URL, an opened resource (result of fopen()), a PSR-7 response or a PSR-7 StreamInterface object).
$dataFactory is an object implementing the \rdfInterface\DataFactory interface, e.g. new \quickRdf\DataFactory().
$format is an optional explicit RDF format indication for handling rare situations when the format can't be autodetected. See the src/quickRdfIo/Util.php::getParser() source code for a list of all accepted $format values.
$baseUri is an optional base URI value (for some kinds of $input values it can be autodetected).
include 'vendor/autoload.php';
// create a DataFactory - it's needed by all parsers
// (DataFactory implementation comes from other package, here sweetrdf/quick-rdf)
$dataFactory = new \quickRdf\DataFactory();
// parse a file
$iterator = \quickRdfIo\Util::parse('tests/files/quadsPositive.nq', $dataFactory);
foreach ($iterator as $i) echo "$i\n";
// parse a remote file (with format autodetection as github wrongly reports text/html)
$url = 'https://github.com/sweetrdf/quickRdfIo/raw/master/tests/files/spec2.10.rdf';
$iterator = \quickRdfIo\Util::parse($url, $dataFactory);
foreach ($iterator as $i) echo "$i\n";
// parse a PSR-7 response (format recognized from the response content-type header)
$url = 'https://www.w3.org/2000/10/rdf-tests/RDF-Model-Syntax_1.0/ms_7.2_1.rdf';
$client = new \GuzzleHttp\Client();
$request = new \GuzzleHttp\Psr7\Request('GET', $url);
$response = $client->send($request);
$iterator = \quickRdfIo\Util::parse($response, $dataFactory);
foreach ($iterator as $i) echo "$i\n";
// parse a string containing RDF with format autodetection
$rdf = file_get_contents('https://www.w3.org/2000/10/rdf-tests/RDF-Model-Syntax_1.0/ms_7.2_1.rdf');
$iterator = \quickRdfIo\Util::parse($rdf, $dataFactory);
foreach ($iterator as $i) echo "$i\n";
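// parse a PHP stream
$stream = fopen('tests/files/quadsPositive.nq', 'r');
$iterator = \quickRdfIo\Util::parse($stream, $dataFactory);
// iterate before closing - a streaming parser reads from the stream lazily
foreach ($iterator as $i) echo "$i\n";
fclose($stream);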
// in most cases you will populate a Dataset with parsed triples/quads
// (note that a Dataset implementation comes from other package, e.g. sweetrdf/quick-rdf)
$dataset = new \quickRdf\Dataset();
$url = 'https://github.com/sweetrdf/quickRdfIo/raw/master/tests/files/spec2.10.rdf';
$dataset->add(\quickRdfIo\Util::parse($url, $dataFactory));
echo $dataset;
Just use \quickRdfIo\Util::serialize($data, $format, $output, $nmsp), where:
$data is an object implementing the \rdfInterface\QuadIterator interface, e.g. a Dataset or an iterator returned by the parser.
$format specifies an RDF serialization format, e.g. turtle or ntriples. See the src/quickRdfIo/Util.php::getSerializer() source code for a list of all accepted $format values.
$output is an optional parameter describing where the output should be written. If it's missing or null, the output is returned as a string. If it's a string, it's treated as a path to open with fopen($output, 'wb'). If it's a stream resource or a PSR-7 StreamInterface instance, the output is just written into it.
$nmsp is an optional parameter used to pass desired RDF namespace aliases to the serializer. Note that some formats like n-triples and n-quads don't support namespace aliases, while in others (e.g. turtle) it's very common to use them.
include 'vendor/autoload.php';
$iterator = ...some \rdfInterface\QuadIterator, e.g. one from parsing examples...
// serialize to file in text/turtle format
\quickRdfIo\Util::serialize($iterator, 'turtle', 'myFile.ttl');
// serialize to string
echo \quickRdfIo\Util::serialize($iterator, 'turtle');
// use given namespace aliases when serializing to turtle
$nmsp = new \quickRdf\RdfNamespace();
$nmsp->add('http://purl.org/dc/terms/', 'dc');
$nmsp->add('http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'rdf');
echo \quickRdfIo\Util::serialize($iterator, 'turtle', null, $nmsp);
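The three meanings of the $output parameter described above can be pictured with a small plain-PHP sketch. This is an illustration only: writeOutput() is a hypothetical function, not the code used by \quickRdfIo\Util::serialize(), and the PSR-7 case is approximated by checking for a write() method so that the example has no dependencies.

```php
<?php
// Hypothetical illustration only: mimics the documented meaning of $output,
// it is NOT the code used by \quickRdfIo\Util::serialize().
function writeOutput(string $rdf, $output = null): ?string {
    if ($output === null) {
        return $rdf;                       // no target: hand the string back
    }
    if (is_string($output)) {
        $handle = fopen($output, 'wb');    // a string is treated as a file path
        fwrite($handle, $rdf);
        fclose($handle);
        return null;
    }
    if (is_resource($output)) {
        fwrite($output, $rdf);             // an opened stream: just write into it
        return null;
    }
    if (is_object($output) && method_exists($output, 'write')) {
        $output->write($rdf);              // PSR-7 StreamInterface-like object
        return null;
    }
    throw new InvalidArgumentException('Unsupported $output type');
}

$rdf = "<http://example.org/s> <http://example.org/p> \"o\" .\n";

// 1. no $output: the serialized data comes back as a string
$asString = writeOutput($rdf);

// 2. $output is a path: the data ends up in that file
$path = tempnam(sys_get_temp_dir(), 'rdf');
writeOutput($rdf, $path);
$fromFile = file_get_contents($path);
unlink($path);

// 3. $output is an opened stream: the caller keeps ownership of it
$stream = fopen('php://memory', 'w+');
writeOutput($rdf, $stream);
rewind($stream);
$fromStream = stream_get_contents($stream);
fclose($stream);
```

Note that in the stream case the caller opened the resource and is also responsible for closing it.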
include 'vendor/autoload.php';
// create a DataFactory - it's needed by all parsers
// (note that DataFactory implementation comes from other package, e.g. sweetrdf/quick-rdf)
$dataFactory = new \quickRdf\DataFactory();
// or any other example from the "Basic parsing" section above
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);
// or any other example from the "Basic serialization" section above
\quickRdfIo\Util::serialize($iterator, 'rdf', 'output.rdf');
It's worth noting that basic triples/quads filtering can be done in a memory-efficient way without using a Dataset implementation.
Let's say we want to copy all triples with the https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier predicate from the tests/files/puzzle4d_100k.nt n-triples file into an ids.ttl turtle file.
A typical approach would be to load the data into a Dataset, filter it there and finally serialize the Dataset:
include 'vendor/autoload.php';
$dataFactory = new \quickRdf\DataFactory();
$t = microtime(true);
// parse input into a Dataset
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);
$dataset = new \quickRdf\Dataset();
$dataset->add($iterator);
// filter out non-matching triples
$template = new \termTemplates\QuadTemplate(null, $dataFactory->namedNode('https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier'), null);
$dataset->deleteExcept($template);
// serialize
\quickRdfIo\Util::serialize($dataset, 'turtle', 'ids.ttl');
print_r([
'time [s]' => microtime(true) - $t,
'memory [MB]' => (int) (memory_get_peak_usage(true) / 1024 / 1024),
]);
// 4.4s, 125 MB of RAM
But it can also be done by using a "filtering generator" instead of the Dataset. This approach avoids materializing the whole dataset in memory, which should both reduce the memory footprint and speed things up a little:
include 'vendor/autoload.php';
$dataFactory = new \quickRdf\DataFactory();
$t = microtime(true);
// prepare input generator
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);
// create a generator performing the filtering
$template = new \termTemplates\QuadTemplate(null, $dataFactory->namedNode('https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier'), null);
$filter = function($iter, $tmpl) {
foreach ($iter as $quad) {
if ($tmpl->equals($quad)) {
yield $quad;
}
}
};
// wrap it into something implementing \rdfInterface\QuadIterator for types compatibility
$wrapper = new \rdfHelpers\GenericQuadIterator($filter($iterator, $template));
// serialize our filtering generator
\quickRdfIo\Util::serialize($wrapper, 'turtle', 'ids.ttl');
print_r([
'time [s]' => microtime(true) - $t,
'memory [MB]' => (int) (memory_get_peak_usage(true) / 1024 / 1024),
]);
// 2.7s, 51 MB of RAM
Results are better but the memory footprint is still surprisingly high.
This is because of the DataFactory implementation we've used and the performance optimizations it applies
(which admittedly only slow things down in our scenario).
We can optimize further by using the simplest possible DataFactory implementation
(for that we need another package, sweetrdf/simple-rdf):
include 'vendor/autoload.php';
$dataFactory = new \simpleRdf\DataFactory();
$t = microtime(true);
// prepare input generator
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);
// create a generator performing the filtering
$template = new \termTemplates\QuadTemplate(null, $dataFactory->namedNode('https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier'), null);
$filter = function($iter, $tmpl) {
foreach ($iter as $quad) {
if ($tmpl->equals($quad)) {
yield $quad;
}
}
};
// wrap it into something implementing \rdfInterface\QuadIterator for types compatibility
$wrapper = new \rdfHelpers\GenericQuadIterator($filter($iterator, $template));
// serialize our filtering generator
\quickRdfIo\Util::serialize($wrapper, 'turtle', 'ids.ttl');
print_r([
'time [s]' => microtime(true) - $t,
'memory [MB]' => (int) (memory_get_peak_usage(true) / 1024 / 1024),
]);
// 1.9s, 2 MB of RAM
As we can see, the optimized implementation is 2.3 times faster and has a 60 times lower memory footprint than the Dataset-based one.
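The filtering-generator pattern used above does not depend on the RDF classes involved. The following plain-PHP sketch makes its laziness visible. It uses no quickRdfIo calls: fakeParser() and lazyFilter() are hypothetical names, and plain arrays stand in for quads. The counter shows that input items are produced only as the consumer asks for them.

```php
<?php
// Stand-in for a streaming parser: yields [subject, predicate, object] arrays
// and counts how many it has produced so far.
function fakeParser(int $n, int &$produced): Generator {
    for ($i = 0; $i < $n; $i++) {
        $produced++;
        $predicate = $i % 2 === 0 ? 'hasIdentifier' : 'hasTitle';
        yield ["s$i", $predicate, "o$i"];
    }
}

// Generic lazy filter: passes on only the items accepted by $predicate.
function lazyFilter(iterable $items, callable $predicate): Generator {
    foreach ($items as $item) {
        if ($predicate($item)) {
            yield $item;
        }
    }
}

$produced = 0;
$matches  = [];
$filtered = lazyFilter(
    fakeParser(100000, $produced),
    fn(array $t): bool => $t[1] === 'hasIdentifier'
);
foreach ($filtered as $triple) {
    $matches[] = $triple[0];
    if (count($matches) === 3) {
        break; // stop early: the remaining input is never generated
    }
}
echo implode(',', $matches), " after producing $produced items\n";
// prints "s0,s2,s4 after producing 5 items"
```

Only five of the 100000 potential items are ever created, which is the same property that lets the generator-based examples above avoid holding the whole input in memory.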
Notes:
foreach loop body).
It's of course possible to instantiate a particular parser/serializer explicitly.
This is the only option to fine-tune parser/serializer configuration, e.g.:
$parser = new \quickRdfIo\NQuadsParser($dataFactory, true, \quickRdfIo\NQuadsParser::MODE_TRIPLES);
$serializer = new \quickRdfIo\JsonLdSerializer(
'http://baseUri',
\quickRdfIo\JsonLdSerializer::TRANSFORM_COMPACT,
JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT,
'context.jsonld'
);
Be aware that parsing/serialization with a manually created parser/serializer instance requires a little more code.
Compare:
include 'vendor/autoload.php';
$data = ...data read from somewhere...
// using \quickRdfIo\Util::serialize()
\quickRdfIo\Util::serialize($data, 'jsonld', 'output.jsonld');
// using manually instantiated serializer
$serializer = new \quickRdfIo\JsonLdSerializer();
$output = fopen('output.jsonld', 'w');
$serializer->serializeStream($output, $data);
fclose($output);