google / schemarama

Schemarama is a project exploring standards-based validation for structured data, especially Schema.org.
Apache License 2.0

Restore demos, document their config API and how to set up static-hosted and docker installations #34

Open danbri opened 2 years ago

danbri commented 2 years ago

We should have documentation showing how to set this up for (a) static serving (b) docker serving, and a live installation of at least one of these.

Background

The original demos used server side processing for 3 things:

The whole thing can be run as a docker container but it would be good to have a simplified pure static version that could be run by anyone very easily. To do this:

I made a first attempt at the documentation below.

Draft documentation

Config API

SchemaramaJS configures itself with various files loaded from relative URIs:

It will also typically serve icons associated with the hierarchy of services, e.g. initial demo uses:

Config details

The original demo shows a mix of shapes - some basic structures from Schema.org's definitions, and some associated with example online services. SchemaramaJS will try to load these upon initialization.

/shacl/shapes

This file can be quite large. For example, inspecting the response headers with

curl -s -D - -o /dev/null http://127.0.0.1:3002/shacl/shapes

Content-Disposition: inline; filename=full.shacl
Content-Type: application/octet-stream
Content-Length: 223194

We get a large dump of SHACL in RDF/Turtle syntax.

/shex/shapes

Similarly, here we are served (in demo configuration):

HTTP/1.0 200 OK
Content-Disposition: inline; filename=full.shexj
Content-Type: application/octet-stream
Content-Length: 633692
Last-Modified: Wed, 09 Mar

This is a large dump of the ShEx shapes in ShExJ (JSON) syntax.

/shacl/subclasses

curl -s -D - http://127.0.0.1:3002/shacl/subclasses

This data file reproduces rdfs:subClassOf assertions from the relevant schemas, in Turtle format. It is not tightly linked to SHACL: it sits under /shacl only because the SHACL validator is the only consumer, and it is not passed to the ShEx validator during setup. In principle it could be used for other purposes, and we could change the file/URL path accordingly.

In the demo configuration, it contains every subtype-supertype relationship defined in schema.org (note that a type can therefore have multiple supertypes). Here are the lines relating to the ComedyClub type:

curl -s -D - http://127.0.0.1:3002/shacl/subclasses | grep ComedyClub

schema:ComedyClub rdfs:subClassOf schema:Place .
schema:ComedyClub rdfs:subClassOf schema:EntertainmentBusiness .
schema:ComedyClub rdfs:subClassOf schema:Organization .
schema:ComedyClub rdfs:subClassOf schema:LocalBusiness .
schema:ComedyClub rdfs:subClassOf schema:Thing .
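
As a rough illustration of how this data could be consumed outside the SHACL validator, here is a minimal sketch that builds a type-to-supertypes lookup from lines like the ones above. The function name parseSubclasses is made up for this example, and the line-based matching is an assumption that only holds for the one-triple-per-line form shown here; a real consumer should use a proper Turtle parser.

```javascript
// Naive line-based reader for the one-triple-per-line form shown above.
// It is NOT a general Turtle parser (no prefix handling, lists, or multi-line statements).
function parseSubclasses(turtleText) {
    const supertypes = new Map();
    const pattern = /^\s*(\S+)\s+rdfs:subClassOf\s+(\S+)\s*\.\s*$/;
    for (const line of turtleText.split('\n')) {
        const match = pattern.exec(line);
        if (!match) continue; // skip prefix declarations, blank lines, other triples
        const [, subtype, supertype] = match;
        if (!supertypes.has(subtype)) supertypes.set(subtype, []);
        supertypes.get(subtype).push(supertype);
    }
    return supertypes;
}

// Usage with two of the ComedyClub lines from the demo data.
const subclassSample = [
    'schema:ComedyClub rdfs:subClassOf schema:Place .',
    'schema:ComedyClub rdfs:subClassOf schema:EntertainmentBusiness .',
].join('\n');
console.log(parseSubclasses(subclassSample).get('schema:ComedyClub'));
```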

/hierarchy

SchemaramaJS loads a JSON configuration file defining a hierarchy of services/applications that can be associated with the various validations being checked. In turn this file can include image URLs.

Demo config is this:

{
  "nested": [
    {
      "service": "ServiceA"
    },
    {
      "nested": [
        {
          "service": "ServiceBProduct1"
        },
        {
          "service": "ServiceBProduct2"
        },
        {
          "service": "ServiceBProduct3"
        }
      ],
      "service": "ServiceB"
    },
    {
      "service": "ServiceC"
    },
    {
      "service": "ServiceD"
    }
  ],
  "service": "Schema"
}
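
To make the shape of this file concrete, here is a small sketch that walks the structure depth-first and reports each service with its nesting depth. It assumes only what the demo config shows (a "service" name and an optional "nested" array per node); listServices is a hypothetical helper, not the project's constructHierarchySelector.

```javascript
// Depth-first walk over the hierarchy structure shown above:
// each node has a "service" name and an optional "nested" array of child nodes.
function listServices(node, depth = 0, out = []) {
    out.push({ service: node.service, depth });
    for (const child of node.nested || []) {
        listServices(child, depth + 1, out);
    }
    return out;
}

// Usage with a trimmed-down version of the demo config.
const demoHierarchy = {
    service: 'Schema',
    nested: [
        { service: 'ServiceA' },
        { service: 'ServiceB', nested: [{ service: 'ServiceBProduct1' }] },
    ],
};
for (const { service, depth } of listServices(demoHierarchy)) {
    console.log('  '.repeat(depth) + service);
}
```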

/services/map

SchemaramaJS also uses a JSON service mapping file, which associates validation shapes (named in common across SHACL and ShEx) with the services described in /hierarchy:

{
  "ValidSchemaAboutPage": "Schema",
  "ValidSchemaAcceptAction": "Schema",
  "ValidSchemaAccommodation": "Schema",
  "ValidSchemaAccountingService": "Schema",
  "ValidSchemaAchieveAction": "Schema",
  "ValidSchemaAction": "Schema",
  "ValidSchemaActionAccessSpecification": "Schema",
  "ValidSchemaActionStatusType": "Schema",
  "ValidSchemaActivateAction": "Schema",
  "ValidSchemaAddAction": "Schema",
  "ValidSchemaAdministrativeArea": "Schema",
  "ValidSchemaAdultEntertainment": "Schema",
  "ValidSchemaAggregateOffer": "Schema",
  "ValidSchemaAgreeAction": "Schema",
  "ValidSchemaAirline": "Schema",
  "ValidSchemaAirport": "Schema", [...etc etc...]
  "ValidSchemaWriteAction": "Schema",
  "ValidSchemaXPathType": "Schema",
  "ValidSchemaZoo": "Schema",
  "ValidServiceBRecipe": "ServiceB",
  "ValidServiceBProduct1Recipe": "ServiceBProduct1",
  "ValidServiceBProduct2Recipe": "ServiceBProduct2",
  "ValidServiceBProduct3Recipe": "ServiceBProduct3",
  "ValidServiceARecipe": "ServiceA",
  "ValidServiceDRecipe": "ServiceD",
  "ValidServiceCRecipe": "ServiceC" 
}
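
Since the mapping refers to services by name, one useful sanity check when writing your own config is that every mapped service actually appears in the hierarchy file. The sketch below is one way to do that; both function names are invented for this example and nothing here is part of SchemaramaJS.

```javascript
// Collect every "service" name in a hierarchy node and its "nested" children.
function collectServiceNames(node, names = new Set()) {
    names.add(node.service);
    (node.nested || []).forEach((child) => collectServiceNames(child, names));
    return names;
}

// Return the shape names whose mapped service does not appear in the hierarchy.
function findShapesWithUnknownService(shapeToService, hierarchy) {
    const known = collectServiceNames(hierarchy);
    return Object.keys(shapeToService).filter(
        (shape) => !known.has(shapeToService[shape])
    );
}

// Usage: "ServiceZ" is deliberately absent from the hierarchy.
const checkHierarchy = { service: 'Schema', nested: [{ service: 'ServiceA' }] };
const checkMap = {
    ValidSchemaZoo: 'Schema',
    ValidServiceARecipe: 'ServiceA',
    ValidServiceZRecipe: 'ServiceZ',
};
console.log(findShapesWithUnknownService(checkMap, checkHierarchy));
```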

/tests

Finally, SchemaramaJS loads a collection of example tests. Each test is an appropriately escaped text value, and the collection is structured as a very plain JSON file:

{ 
  "tests": [ 
     "escaped markup here e.g. json-ld...", 
     "second example here e.g. microdata..." 
  ]
}

No additional metadata is included; SchemaramaJS will try to figure out how to parse it.
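
Escaping markup by hand is error-prone, so one option is to generate this file from raw example strings and let JSON.stringify do the escaping. This is only a sketch: buildTestsFile and the two sample snippets are made up for illustration.

```javascript
// JSON.stringify handles the escaping (quotes, newlines, backslashes) for each example.
function buildTestsFile(examples) {
    return JSON.stringify({ tests: examples }, null, 2);
}

// Usage: one JSON-LD example and one microdata example, as raw markup strings.
const jsonLdExample = '<script type="application/ld+json">\n'
    + '{"@context": "https://schema.org", "@type": "Recipe", "name": "Tea"}\n'
    + '</script>';
const microdataExample = '<div itemscope itemtype="https://schema.org/Recipe">'
    + '<span itemprop="name">Tea</span></div>';
console.log(buildTestsFile([jsonLdExample, microdataExample]));
```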

Config-using Validator code

These files are all loaded by static/js/scc/core.js:

$(document).ready(async () => {
    $.getJSON("https://api.ipify.org/?format=json", function(e) {
        ip = e.ip;
    });
    await $.get(`shacl/shapes`, (res) => shaclShapes = res);
    await $.get(`shacl/subclasses`, (res) => subclasses = res);
    await $.get(`shex/shapes`, (res) => shexShapes = JSON.parse(res));
    await $.get(`hierarchy`, (res) => {
        hierarchy = res;
        constructHierarchySelector(hierarchy, 0);
    });
    await $.get(`services/map`, (res) => shapeToService = res);
    $.get(`tests`, (res) => initTests(res.tests));
    shexValidator = new schemarama.ShexValidator(shexShapes, {annotations: annotations});
    shaclValidator = new schemarama.ShaclValidator(shaclShapes, {
        annotations: annotations,
        subclasses: subclasses,
    });
});
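
For anyone wanting to reuse the same config files outside the demo page, the loading step can be sketched without jQuery. This is an illustration, not the project's code: loadConfig and its injected getText parameter (a function from relative path to a Promise of the response text) are assumptions made here so that the example is self-contained.

```javascript
// Hypothetical loader: `getText` is an injected function (path -> Promise<string>),
// e.g. a thin wrapper around fetch(); it is not part of SchemaramaJS.
async function loadConfig(getText) {
    const paths = ['shacl/shapes', 'shacl/subclasses', 'shex/shapes',
        'hierarchy', 'services/map', 'tests'];
    // Request all six resources in parallel and wait for every body.
    const bodies = await Promise.all(paths.map((path) => getText(path)));
    const [shaclShapes, subclasses, shex, hierarchy, map, tests] = bodies;
    return {
        shaclShapes,                        // Turtle text, passed through as-is
        subclasses,                         // Turtle text, passed through as-is
        shexShapes: JSON.parse(shex),       // ShExJ is JSON
        hierarchy: JSON.parse(hierarchy),
        shapeToService: JSON.parse(map),
        tests: JSON.parse(tests).tests,
    };
}

// Usage with an in-memory stand-in for the server.
const stubFiles = {
    'shacl/shapes': '# turtle here',
    'shacl/subclasses': 'schema:Zoo rdfs:subClassOf schema:Place .',
    'shex/shapes': '{"type": "Schema"}',
    'hierarchy': '{"service": "Schema"}',
    'services/map': '{"ValidSchemaZoo": "Schema"}',
    'tests': '{"tests": ["a", "b", "c"]}',
};
loadConfig((path) => Promise.resolve(stubFiles[path]))
    .then((config) => console.log(Object.keys(config)));
```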

danbri commented 2 years ago

Started a rough script that copies things into the right place in an ephemeral "_serving" folder.

danbri commented 2 years ago

Possible diagnosis and fix for this not running: the very simple static HTTP servers we're using aren't sending the right media type headers for things that are in JSON (or in any other format, for that matter).

I tried

    $.get(`tests`, (res) => { 
        let jres = $.parseJSON(res);    
        initTests(jres.tests)
    });

... in core.js line 39 and it seems to work.
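
A slightly more tolerant variant might be worth considering. As far as I understand jQuery's behaviour, $.get infers the data type from the response's media type, so the callback receives an already-parsed object when the server says JSON and a plain string otherwise; unconditionally parsing would then fail on a correctly configured server. The asJson helper below is a hypothetical name for a sketch that accepts both cases.

```javascript
// Accept either an already-parsed value or raw JSON text.
function asJson(res) {
    return typeof res === 'string' ? JSON.parse(res) : res;
}

// Intended use in core.js (shown as a comment because it needs jQuery and the page):
//   $.get(`tests`, (res) => initTests(asJson(res).tests));

console.log(asJson('{"tests": ["x"]}').tests); // body arrived as text
console.log(asJson({ tests: ['x'] }).tests);   // body arrived already parsed
```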

Another gotcha: the demo currently assumes that /tests returns at least 3 tests.
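
On the server side, the media type problem described above could instead be addressed with a small path-to-media-type table in whatever static server is used. The sketch below is an assumption-laden starting point rather than a tested setup: the table and the mediaTypeFor name are invented here, and the choices are derived from how the quoted core.js uses each response (Turtle as text, hierarchy / services/map / tests as parsed JSON, and shex/shapes as text that core.js parses itself, so it keeps the application/octet-stream type the demo server sent).

```javascript
// Media types for the config paths that core.js requests (hypothetical table).
const CONFIG_MEDIA_TYPES = new Map([
    ['shacl/shapes', 'text/turtle'],
    ['shacl/subclasses', 'text/turtle'],
    // core.js calls JSON.parse on this response itself, so it must arrive as text;
    // keep the type the demo server sent.
    ['shex/shapes', 'application/octet-stream'],
    ['hierarchy', 'application/json'],
    ['services/map', 'application/json'],
    ['tests', 'application/json'],
]);

// Look up the media type for a request path, ignoring leading slashes and any query string.
function mediaTypeFor(urlPath) {
    const key = urlPath.replace(/^\/+/, '').split('?')[0];
    return CONFIG_MEDIA_TYPES.get(key) || 'application/octet-stream';
}

console.log(mediaTypeFor('/tests'));
console.log(mediaTypeFor('/shacl/subclasses'));
```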