Object Graph Mapper for managing RDF data stored in MongoDB. See also tripod-node.
DESCRIBE/SELECT style operations are provided in two flavours:
Views (DESCRIBE) - these are multi-subject graphs retrievable in one self-contained document
Tables (SELECT) - tabular datasets
ExtendedGraph, which your application models can wrap or extend

require_once("tripod.inc.php");
// Queue worker must register these event listeners
Resque_Event::listen('beforePerform', [\Tripod\Mongo\Jobs\JobBase::class, 'beforePerform']);
Resque_Event::listen('onFailure', [\Tripod\Mongo\Jobs\JobBase::class, 'onFailure']);
\Tripod\Config::setConfig($conf); // set the config, usually read in as JSON from a file
$tripod = new \Tripod\Mongo\Driver(
"CBD_users", // pod (read: MongoDB collection) we're working with
"myapp" // store (read: MongoDB database) we're working with
);
// describe
$graph = $tripod->describe("http://example.com/user/1");
echo $graph->get_first_literal("http://example.com/user/1","http://xmlns.com/foaf/0.1/name");
// select
$data = $tripod->select(
array("rdf:type.u"=>"http://xmlns.com/foaf/0.1/Person"),
array("foaf:name"=>true)
);
if ($data['head']['count']>0) {
foreach($data['results'] as $result) {
echo $result['foaf:name'];
}
}
// an expensive pre-defined graph traversal query
$graph = $tripod->getViewForResource("http://example.com/users","v_users");
$allUsers = $graph->get_subjects_of_type("http://xmlns.com/foaf/0.1/Person");
// save
$newGraph = new \Tripod\ExtendedGraph();
$newGraph->add_literal_value("http://example.com/user/2","http://xmlns.com/foaf/0.1/name","John Smith");
$tripod->saveChanges(
new \Tripod\ExtendedGraph(), // the before state, here there was no before (new data)
$newGraph // the desired after state
);
// save, but background all the expensive view/table/search generation
$tripod = new \Tripod\Mongo\Driver("CBD_users", "usersdb", array(
'async' => array(OP_VIEWS=>true,OP_TABLES=>true,OP_SEARCH=>true) // async opt says what to do later via a queue rather than as part of the save
)
);
$tripod->saveChanges(
new \Tripod\ExtendedGraph(), // the before state, here there was no before (new data)
$newGraph // the desired after state
);
PHP >= 5.5
MongoDB 3.2.x and up.
MongoDB PHP driver; see http://mongodb.github.io/mongo-php-driver/#installation for installation.
Before you can do anything with tripod you need to initialise the config via the Config::setConfig()
method. This takes an associative array which can generally be decoded from a JSON string. Here's an example:
{
"namespaces" : {
"rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"foaf":"http://xmlns.com/foaf/0.1/",
"exampleapp":"http://example.com/properties/"
},
"defaultContext":"http://talisaspire.com/",
"data_sources" : {
"cluster1": {
"type": "mongo",
"connection": "mongodb:\/\/localhost",
"replicaSet": ""
},
"cluster2": {
"type": "mongo",
"connection": "mongodb:\/\/othermongo.example.com",
"replicaSet": ""
}
},
"stores" : {
"myapp" : {
"data_source" : "cluster1",
"pods" : {
"CBD_users" : {
"cardinality" : {
"foaf:name" : 1
},
"indexes" : {
"names": {
"foaf:name.l":1
}
}
}
},
"view_specifications" : [
{
"_id": "v_users",
"from":"CBD_users",
"type": "exampleapp:AllUsers",
"include": ["rdf:type"],
"joins": {
"exampleapp:hasUser": {
"include": ["foaf:name","rdf:type"],
"joins": {
"foaf:knows" : {
"include": ["foaf:name","rdf:type"]
}
}
}
}
}
],
"table_specifications" : [
{
"_id": "t_users",
"type":"foaf:Person",
"from":"CBD_users",
"to_data_source" : "cluster2",
"ensureIndexes":[
{
"value.name": 1
}
],
"fields": [
{
"fieldName": "type",
"predicates": ["rdf:type"]
},
{
"fieldName": "name",
"predicates": ["foaf:name"]
},
{
"fieldName": "knows",
"predicates": ["foaf:knows"]
}
],
"joins" : {
"foaf:knows" : {
"fields": [
{
"fieldName":"knows_name",
"predicates":["foaf:name"]
}
]
}
}
}
],
"search_config":{
"search_provider":"MongoSearchProvider",
"search_specifications":[
{
"_id":"i_users",
"type":["foaf:Person"],
"from":"CBD_users",
"to_data_source" : "cluster2",
"filter":[
{
"condition":{
"foaf:name.l":{
"$exists":true
}
}
}
],
"indices":[
{
"fieldName": "name",
"predicates": ["foaf:name", "foaf:firstName","foaf:surname"]
}
],
"fields":[
{
"fieldName":"result.name",
"predicates":["foaf:name"],
"limit" : 1
}
]
}
]
}
}
},
"transaction_log" : {
"database" : "testing",
"collection" : "transaction_log",
"data_source" : "cluster2"
}
}
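As a rough sketch of how a JSON file like the one above might be turned into the associative array that Config::setConfig() takes, the snippet below decodes a JSON string and uses the "namespaces" section to expand a prefixed name. The helpers loadTripodConfig() and expandQName() are hypothetical names introduced for illustration only; they are not part of Tripod, and the library's own namespace handling may differ.

```php
<?php
// Hypothetical helper (not part of Tripod): decode a JSON config string
// into an associative array.
function loadTripodConfig($json)
{
    $conf = json_decode($json, true); // true => associative arrays, not objects
    if (!is_array($conf)) {
        throw new InvalidArgumentException(
            'Config did not decode to an array: ' . json_last_error_msg()
        );
    }
    return $conf;
}

// Hypothetical helper: expand a prefixed name such as "foaf:name" to a full
// URI using the "namespaces" section of the config.
function expandQName(array $conf, $qname)
{
    $parts = explode(':', $qname, 2);
    if (count($parts) !== 2 || !isset($conf['namespaces'][$parts[0]])) {
        return $qname; // no known prefix: return the input unchanged
    }
    return $conf['namespaces'][$parts[0]] . $parts[1];
}

$json = '{"namespaces": {"foaf": "http://xmlns.com/foaf/0.1/"}}';
$conf = loadTripodConfig($json);
echo expandQName($conf, 'foaf:name'), "\n";

// With Tripod loaded, the decoded array would then be handed over:
// \Tripod\Config::setConfig($conf);
```

Passing true as the second argument to json_decode() matters here, because without it the config would come back as stdClass objects rather than arrays.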
Data is stored in Mongo collections, one CBD (Concise Bounded Description) per document. Typically you would put all the data of a given object type in a distinct collection prefixed with CBD_, e.g. CBD_users, although this is more convention than requirement.
These CBD collections are read-write from your application's point of view, and are subject to transactions recorded in the tlog (see Transactions below).
A CBD might look like this:
{
"_id" : {
"r" : "http://example.com/user/2",
"c" : "http://example.com/defaultContext"
},
"siocAccess:Role" : {
"l" : "an undergraduate"
},
"siocAccess:has_status" : {
"l" : "public"
},
"spec:email" : {
"l" : "me@example.com"
},
"rdf:type" : [
{
"u" : "foaf:Person"
},
{
"u" : "sioc:User"
}
],
"foaf:name" : {
"l" : "John Smith"
}
}
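To make the layout of the document above concrete, here is a minimal sketch that reads values out of a CBD held as a plain PHP array. It assumes only what the example shows: a predicate holds either one object or a list of objects, and each object carries a "u" key (a URI value) or an "l" key (a literal value). The helper cbdValues() is a hypothetical name and not part of Tripod, which exposes this data through its graph objects instead.

```php
<?php
// Hypothetical helper (not part of Tripod): collect the values stored under
// one predicate of a CBD document. $kind is 'l' for literals or 'u' for URIs.
function cbdValues(array $doc, $predicate, $kind)
{
    if (!isset($doc[$predicate])) {
        return array();
    }
    $objects = $doc[$predicate];
    // A single value is stored as one object; several values as a list of objects.
    if (isset($objects['l']) || isset($objects['u'])) {
        $objects = array($objects);
    }
    $values = array();
    foreach ($objects as $object) {
        if (isset($object[$kind])) {
            $values[] = $object[$kind];
        }
    }
    return $values;
}

$cbd = array(
    '_id' => array(
        'r' => 'http://example.com/user/2',
        'c' => 'http://example.com/defaultContext',
    ),
    'rdf:type' => array(array('u' => 'foaf:Person'), array('u' => 'sioc:User')),
    'foaf:name' => array('l' => 'John Smith'),
);

print_r(cbdValues($cbd, 'foaf:name', 'l')); // the single literal name
print_r(cbdValues($cbd, 'rdf:type', 'u'));  // both type URIs
```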
A brief guide:
_id is the composite of the subject (r property for resource) and the named graph (c property for context) for this CBD.
Each predicate (e.g. foaf:name) is a property of the document.
Each predicate value is an object, or an array of objects when the predicate has several values. Each object has either a u property (for uri, these are RDF resource object values) or an l property (for literal, these are RDF literal object values).

MongoDB is only atomic at the document level. Tripod datasets store one CBD per document. Therefore an update to a graph of data can impact 1..n documents.
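The 1..n figure can be illustrated with a small sketch. It assumes a simplified, hypothetical graph representation (subject => predicate => list of values) and is not Tripod's actual change-detection code. Given a before and an after state, it lists the subjects whose triples differ; with one CBD per document, each listed subject is one document that would need writing.

```php
<?php
// Hypothetical helper: sort predicates and their values so that two
// descriptions holding the same triples compare as identical.
function normaliseDescription(array $predicates)
{
    foreach ($predicates as $predicate => $values) {
        sort($values);
        $predicates[$predicate] = $values;
    }
    ksort($predicates);
    return $predicates;
}

// Hypothetical helper: return the subjects whose description differs
// between the before and after states.
function changedSubjects(array $before, array $after)
{
    $subjects = array_unique(array_merge(array_keys($before), array_keys($after)));
    $changed = array();
    foreach ($subjects as $subject) {
        $old = isset($before[$subject]) ? $before[$subject] : array();
        $new = isset($after[$subject]) ? $after[$subject] : array();
        if (normaliseDescription($old) !== normaliseDescription($new)) {
            $changed[] = $subject;
        }
    }
    return $changed;
}

$before = array(
    'http://example.com/user/1' => array('foaf:name' => array('Ann')),
    'http://example.com/user/2' => array('foaf:name' => array('Bob')),
);
$after = array(
    'http://example.com/user/1' => array('foaf:name' => array('Ann')),
    'http://example.com/user/2' => array('foaf:name' => array('Robert')),
    'http://example.com/user/3' => array('foaf:name' => array('Cy')),
);

// One logical update, two documents touched: user/2 changed and user/3 is new.
print_r(changedSubjects($before, $after));
```

Because the two writes cannot be committed atomically by MongoDB on its own, a failure between them would leave the graph half-updated.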
Tripod maintains a transaction log (tlog) of updates to allow rollback in the case of multi-document writes. It is possible (and recommended) to run this on a separate cluster to your main data. For disaster recovery, you can use the tlog to replay transactions on top of a known-good backup.
In production we run a small second cluster in EC2 which stores up to 7 days of tlog; we prune this periodically and flush it to S3.
The majority of the datasets underpinning Talis Aspire, an enterprise SaaS course management system serving 1M students in over 50 universities worldwide, are powered using graph data stored in MongoDB via the Tripod library.
We built Tripod when we needed to migrate away from our own in-house proprietary triple store (incidentally built around early versions of Apache Jena).
We've been using it for 2 years in production. Our data volume is > 500M triples over 70 databases on modest 3-node clusters (2 data nodes): Dell R710 mid-range servers with 12 cores, 96GB RAM and a RAID-10 array of non-SSD disks, plus an m1.small arbiter in EC2.
ExtendedGraph. The internal structure of this object is a relic from the days of Talis' own proprietary triple store and how it used to return data. We bootstrap onto that using the MongoGraph object to marshal data in and out. This relies heavily on regex, and data we have gathered in the field shows that this is a single point of optimisation that would cut CPU cycles and memory usage. On the bright side it's nice to have such targeted, low-hanging fruit to pick.
ExtendedGraph. A lot of the config could be derived from the annotations on the Entity classes themselves. We are now working on an alpha of this, see this issue. It will likely be an additional library that depends on Tripod rather than being merged into the core.

We presented on an earlier version at MongoUK 2012. Since that time we have resolved the following todos:
We make use of the excellent ARC, and elements of Tripod are based on the Moriarty library, the fruit of some earlier work by Talis to provide a PHP library for Talis' own proprietary cloud triple store (no longer in operation).
The brainchild of kiyanwang and robotrobot @ Talis