netwerk-digitaal-erfgoed / dataset-knowledge-graph

Pipeline that generates the NDE Dataset Knowledge Graph
European Union Public License 1.2

UI proof of concept #65

Open ddeboer opened 5 months ago

ddeboer commented 5 months ago

Use case: Colonial Heritage.

pmaria commented 5 months ago

I framed the results using the following frame:

{
  "@context": {
    "nde": "https://www.netwerkdigitaalerfgoed.nl/def#",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "https://schema.org/",
    "void": "http://rdfs.org/ns/void#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "classPartition": "void:classPartition",
    "class": { "@id": "void:class", "@type": "@id"},
    "entities": { "@id": "void:entities", "@type": "xsd:integer" },
    "propertyPartition": "void:propertyPartition",
    "distinctValues":  { "@id": "void:distinctObjects", "@type": "xsd:integer" },
    "property": { "@id": "void:property", "@type": "@id"},
    "dataDump": "void:dataDump",
    "distinctSubjects":  { "@id": "void:distinctSubjects", "@type": "xsd:integer" },
    "properties":  { "@id": "void:properties", "@type": "xsd:integer" },
    "sparqlEndpoint": { "@id": "void:sparqlEndpoint", "@type": "@id"},
    "triples":  { "@id": "void:triples", "@type": "xsd:integer" },

    "dateModified": { "@id": "schema:dateModified", "@type": "xsd:dateTime" },
    "contentSize": { "@id": "schema:contentSize", "@type": "xsd:integer" },
    "vocabularies": { "@id": "void:vocabulary", "@type": "@id", "@container": "@set"},

    "wasGeneratedBy": "prov:wasGeneratedBy",
    "Activity": "prov:Activity",
    "startedAtTime": { "@id": "prov:startedAtTime", "@type": "xsd:dateTime" },
    "endedAtTime": { "@id": "prov:endedAtTime", "@type": "xsd:dateTime" },

    "distinctValuesLiteral": { "@id": "nde:distinctObjectsLiteral", "@type": "xsd:integer" },
    "distinctValuesURI": { "@id": "nde:distinctObjectsURI", "@type": "xsd:integer" },

    "type": "@type",
    "id": "@id"
  },
  "@type": "void:Dataset"
}

Here are the results: JSON-LD-Framed.zip
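For readers less familiar with JSON-LD contexts, here is a minimal sketch of how the short keys in the frame map back to full IRIs. It copies a few term definitions from the frame above. The helper `expand_term` is hypothetical and only handles simple `prefix:local` definitions; it is not the full JSON-LD expansion algorithm that a real processor implements.

```python
# A few term definitions copied from the frame's @context above.
CONTEXT = {
    "nde": "https://www.netwerkdigitaalerfgoed.nl/def#",
    "void": "http://rdfs.org/ns/void#",
    "classPartition": "void:classPartition",
    "distinctValues": {"@id": "void:distinctObjects", "@type": "xsd:integer"},
    "distinctValuesURI": {"@id": "nde:distinctObjectsURI", "@type": "xsd:integer"},
}


def expand_term(term, context):
    """Resolve a short key to its full IRI (simplified, hypothetical helper)."""
    definition = context[term]
    # A definition is either a plain string or an object carrying "@id".
    compact = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, local = compact.partition(":")
    # Only expand when the part before ":" is a prefix declared in the context.
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + local
    return compact


print(expand_term("distinctValues", CONTEXT))
# → http://rdfs.org/ns/void#distinctObjects
```

Note that `distinctValues` is an alias for `void:distinctObjects`, so the key in the framed output differs from the VoID property name.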

pmaria commented 5 months ago

@ddeboer could you review?

ddeboer commented 5 months ago

Thanks @pmaria, looks good to me!

Looking at the JSON-LD, what strikes me is that the same class partition is spread over several separate blank nodes, which is overly verbose:

  "classPartition": [
    {
      "class": "https://w3id.org/pnv#PersonName",
      "entities": "189063"
    },
    {
      "class": "https://w3id.org/pnv#PersonName",
      "propertyPartition": {
        "distinctValues": "72130",
        "entities": "189057",
        "property": "https://w3id.org/pnv#baseSurname"
      }
    },
    {
      "class": "https://w3id.org/pnv#PersonName",
      "propertyPartition": {
        "distinctValues": "54198",
        "entities": "184846",
        "property": "https://w3id.org/pnv#firstName"
      }
    },
    {
      "class": "https://w3id.org/pnv#PersonName",
      "propertyPartition": {
        "distinctValues": "188920",
        "entities": "189063",
        "property": "http://www.w3.org/2000/01/rdf-schema#label"
      }
    },
    {
      "class": "https://w3id.org/pnv#PersonName",
      "propertyPartition": {
        "distinctValues": "341",
        "entities": "34366",
        "property": "https://w3id.org/pnv#infix"
      }
    },

Should we merge those into a single class partition with multiple property partitions?

Of course, this should happen inside the pipeline rather than in the conversion to JSON-LD. I’m not sure yet what the best way is to reference the same blank node from isolated analysers. Should we coin URIs for the analyser output instead?
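To make the target shape concrete, here is a rough sketch that merges the entries from the excerpt above into a single class partition with multiple property partitions. It is a post-processing illustration on the framed JSON only, not the pipeline change itself; the function name `merge_class_partitions` is hypothetical.

```python
def merge_class_partitions(partitions):
    """Group framed class partitions that share the same "class" value."""
    merged = {}  # dicts keep insertion order, so the class order is preserved
    for partition in partitions:
        target = merged.setdefault(partition["class"], {"class": partition["class"]})
        if "entities" in partition:
            target["entities"] = partition["entities"]
        if "propertyPartition" in partition:
            props = partition["propertyPartition"]
            # Framed JSON-LD may hold a single object or a list here.
            if isinstance(props, dict):
                props = [props]
            target.setdefault("propertyPartition", []).extend(props)
    return list(merged.values())


# First three entries of the excerpt above.
sample = [
    {"class": "https://w3id.org/pnv#PersonName", "entities": "189063"},
    {
        "class": "https://w3id.org/pnv#PersonName",
        "propertyPartition": {
            "distinctValues": "72130",
            "entities": "189057",
            "property": "https://w3id.org/pnv#baseSurname",
        },
    },
    {
        "class": "https://w3id.org/pnv#PersonName",
        "propertyPartition": {
            "distinctValues": "54198",
            "entities": "184846",
            "property": "https://w3id.org/pnv#firstName",
        },
    },
]

merged = merge_class_partitions(sample)
print(len(merged))  # → 1
```

The single resulting entry keeps the class-level `entities` count and carries both property partitions in one list.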

pmaria commented 5 months ago

@ddeboer Yes, agreed. We could indeed use repeatable (deterministic) URI generation to group the property partitions under the same class partition.
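One possible way to do this, sketched under assumptions: hash the dataset IRI together with the class IRI, so that isolated analysers arrive at the same class partition URI without sharing state. The base namespace, the example dataset IRI and the function name `class_partition_uri` are all hypothetical; the actual scheme is up to the pipeline.

```python
import hashlib

# Hypothetical namespace for coined URIs; not an existing NDE namespace.
BASE = "https://example.org/class-partition/"


def class_partition_uri(dataset_iri, class_iri, base=BASE):
    """Derive a stable URI from the dataset and class IRIs."""
    # A newline separator keeps the two inputs from running together.
    key = f"{dataset_iri}\n{class_iri}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return base + digest[:32]


# Two analysers computing the URI independently get the same result.
dataset = "https://example.org/dataset/1"  # hypothetical dataset IRI
person_name = "https://w3id.org/pnv#PersonName"
uri_from_class_analyser = class_partition_uri(dataset, person_name)
uri_from_property_analyser = class_partition_uri(dataset, person_name)
print(uri_from_class_analyser == uri_from_property_analyser)  # → True
```

Because the URI depends only on its inputs, the class analyser and the property analyser can both attach their triples to the same node, which is what lets the partitions be grouped.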

ddeboer commented 4 months ago

> @ddeboer yes agreed. We could indeed use some repeatable URI generation to group the property partitions under the same class partition.

Done in #66.