zazuko / blueprint

Zazuko Blueprint is an enterprise knowledge graph frontend and browser, designed to make RDF Knowledge Graphs accessible and customizable for domain users.
Apache License 2.0

Guidance on creating blueprint config #21

Open OliverWoolland opened 3 weeks ago

OliverWoolland commented 3 weeks ago

Hello! We really like blueprint and have been sharing an instance we are running with the demo data set (the plankton one) and colleagues and managers have loved it.

We have our real dataset stored in a triple store and are able to view it through Trifid etc., but we would like some advice on how to configure Blueprint to show this for us.

Our attempts so far have felt like shots in the dark!

Is there any documentation that could be shared? Alternatively, if the graphical configuration tool is nearly ready would it be best to wait for that? Could we help out with user testing?

BenjaminHofstetter commented 3 weeks ago

hello @OliverWoolland There is no graphical configurator coming soon. But we have this: https://github.com/zazuko/blueprint-ui-config-initializer Can you run Node.js on your computer? If yes, I will add some documentation about how to use it. Is that OK for you?

OliverWoolland commented 3 weeks ago

Thank you @BenjaminHofstetter! I've been able to download and run the tool end to end :)

I'm going to set about having a more serious play now!

OliverWoolland commented 2 weeks ago

I thought I would document my experience so far! If you have any thoughts I'd be very interested to hear them :)

Attempt 1: Reproducing demo config

I thought I would try running the config tool on the demo set. The config it generated looked sane, so I moved on.

Attempt 2: Running on our full dataset(s)

Next, I tried the moonshot and ran the config tool on a couple of datasets that we already have. The tool ran successfully and generated configs. However, while the configs seemed to pick up the hierarchy of classes correctly, the data itself was not explorable in Blueprint.

So, for example, I could view people in the filter list on the search page but no items were listed in the main pane.

Attempt 3: Minimal dataset

To try to understand things better, I thought I would try a very stripped-back dataset. The base of our datasets are individual RO-Crates; a minimal example is given here, which I converted to Turtle (min.ttl).

I've attached a bundle of the input, intermediate files and generated config!

generated_config.zip

And they resulted in a blueprint config that rendered like this:

[screenshots: foo_060, foo_061]

Summary

The tooling seems to work and generate sensible configs. However, for me the configs appear somewhat incomplete and I am unsure exactly how to proceed.

My plan is to dig into the demo data and k8 example sets to see what types of triples are missing, but I would appreciate any pushes in the right direction!

Many thanks for the help so far :)

mchlrch commented 2 weeks ago

Thank you for testdriving blueprint, the config-initializer and for sending us your feedback.

Blueprint expects rdfs:label statements on the resources. I see in the min.ttl that you are using schema:name.

Adding explicit rdfs:label statements to your data, or using the triplestore's RDFS reasoning (since schema:name is a subproperty of rdfs:label), should give better results.
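If triplestore reasoning is not an option, the explicit labels can also be materialized with a SPARQL UPDATE along these lines (a sketch, assuming the data sits in the default graph and uses the http schema.org namespace):

```sparql
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

# Copy every schema:name value into an explicit rdfs:label
INSERT { ?s rdfs:label ?name }
WHERE  { ?s schema:name ?name }
```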

OliverWoolland commented 2 weeks ago

Thank you @mchlrch! I'm working on that now

I thought I would try adding a reasoner to my Fuseki instance (which I have not tried before), but I'll fall back to something more manual if I get stuck.

I've also noticed #22 so will keep an eye on that!

I'll update here with how I get on in the next day or two :)

OliverWoolland commented 2 weeks ago

I've got a few findings to report I think :) and I am happy to say that I think I am mostly where I need to be now for making a start on showing blueprint off with our real data.

I tried a few different ways to introduce the reasoning of schema:name rdfs:subPropertyOf rdfs:label

Attempt 1: Python Reasoner

First I tried doing offline reasoning using the Python library OWL-RL, which was a little tricky to get going and to convince to do RDFS reasoning. While this did kinda work, it felt clunky as a process and I decided not to continue.

Attempt 2: Jena Fuseki Vocabulary

The next thing I tried was the most successful! I created a simple vocabulary.ttl which I applied in my Jena Fuseki config.

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:  <http://schema.org/>
schema:name rdfs:subPropertyOf rdfs:label .
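As a sanity check that the subproperty inference is active, a query along these lines should now return resources with inferred labels (a sketch; prefix as above):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Resources that now carry an rdfs:label, explicit or inferred
SELECT ?s ?label
WHERE  { ?s rdfs:label ?label }
LIMIT 10
```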

That results in a display that looks like this (for the minimal example):

[screenshots: foo_069, foo_068]

And it looked pretty good for larger datasets too.

Attempt 3: Jena Reasoner

I thought I'd try to get away from manual configuration and try out the RDFS reasoner that can be enabled in Jena Fuseki.

I used this config

PREFIX :        <#>
PREFIX fuseki:  <http://jena.apache.org/fuseki#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ja:      <http://jena.hpl.hp.com/2005/11/Assembler#>

[] rdf:type fuseki:Server ;
   fuseki:services (
     :service
   ) .

## Fuseki service /crater with SPARQL query
:service rdf:type fuseki:Service ;
    fuseki:name "crater" ;
    fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name "query" ] ;
    fuseki:endpoint [ fuseki:operation fuseki:query ] ;
    fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name "sparql" ] ;
    fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name "update" ] ;
    fuseki:endpoint [ fuseki:operation fuseki:update ] ;
    fuseki:endpoint [ fuseki:operation fuseki:gsp_r ; fuseki:name "get" ];
    fuseki:endpoint [ fuseki:operation fuseki:gsp_rw ];
    fuseki:endpoint [ fuseki:operation fuseki:gsp_rw ; fuseki:name "data" ];
    fuseki:dataset :dataset ;
    .

# Dataset with only the default graph.
:dataset rdf:type       ja:RDFDataset ;
    ja:defaultGraph     :model_inf ;
    .

# The inference model
:model_inf a ja:InfModel ;
     ja:baseModel :baseModel ;
     ja:reasoner [
         ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner>
     ] .

# The base data.
:baseModel a ja:MemoryModel ;
    ja:content [ ja:externalContent <file:data.ttl> ] ;
    .

which did work and performed a lot of reasoning. With the reasoner and the Blueprint configuration tools running, my 13 triples exploded into nearly 1000!

Despite the extra triples I didn't see any improvement over the vocabulary file so I called it a day there.

Attempt 4: Schema.org Vocabulary

The next attempt I made was using schema.org's vocabulary, published here, loaded in the config described in Attempt 2. I found that, at least for my data, I had to use the http (not https) version of the vocabulary.
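Assuming the same ja:content mechanism as in the Attempt 3 config, loading the downloaded vocabulary alongside the data might look like this (the file name schemaorg.ttl is hypothetical):

```turtle
:baseModel a ja:MemoryModel ;
    ja:content [ ja:externalContent <file:data.ttl> ] ;
    ja:content [ ja:externalContent <file:schemaorg.ttl> ] ;
    .
```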

This felt like a nice solution, as far as I can tell it fleshes out the links between the schema.org elements and RDFS well, but it resulted in some funny-looking representations in Blueprint!

[screenshot: foo_067]

This screenshot shows how several parameters get repeated when using the full vocabulary. If you have any suggestions on curing this I would be very keen to hear them!

I was wondering if it was possible to solve this on the config tool side of things...? But I am a little unsure! It feels to me like an extra DISTINCT somewhere might help?
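To illustrate the idea (purely hypothetical, I don't know which queries Blueprint actually issues): if the per-class results were combined into a single query, a DISTINCT would collapse the repeats:

```sparql
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

# A resource typed as Dataset, CreativeWork and Thing would otherwise
# appear once per matching class
SELECT DISTINCT ?resource ?label
WHERE {
  VALUES ?cls { schema:Dataset schema:CreativeWork schema:Thing }
  ?resource a ?cls ;
            rdfs:label ?label .
}
```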

Summary

Overall I think I am now in a good starting position for using Blueprint in a more serious way. I have a few outstanding things I'd like to work on but plan on working through them over time!

I have a couple of quick questions though:

* Can anything be done about the duplication shown in Attempt 4?
* Will I need to keep rerunning the config tool if my dataset evolves significantly? Extra classes or links being introduced for example

Thank you again for all the help and advice! It's been fun to play with this and the promise that I believe blueprint shows makes it feel very much worth investing some time in.

mchlrch commented 2 weeks ago

Hi Oliver, thanks a lot for reporting back!

* Can anything be done about the duplication shown in Attempt 4?

These duplications result from the rdfs:subClassOf statements in the schema.org vocabulary:

This particular resource is a Dataset and, after reasoning, also a CreativeWork and a Thing. Neither the config tool nor Blueprint itself does any kind of de-duplication at the moment, so Blueprint ends up pulling the information for each of the three configured classes and showing the "concatenation" of the results. Hence you get duplication.
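The class chain behind this, as stated in the schema.org vocabulary, is:

```turtle
schema:Dataset      rdfs:subClassOf schema:CreativeWork .
schema:CreativeWork rdfs:subClassOf schema:Thing .

# So after RDFS reasoning,
#   ex:something a schema:Dataset
# also entails
#   ex:something a schema:CreativeWork, schema:Thing .
```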

To clear this up, you could do the following while running the config tool:

* Will I need to keep rerunning the config tool if my dataset evolves significantly? Extra classes or links being introduced for example

Yes, if you want the extra classes or links to be visible in Blueprint, then you would have to add them to the ui-config. Either by rerunning the config tool or by modifying the ui-config manually.