Open ddeboer opened 5 months ago
I framed the results using the following frame:
{
  "@context": {
    "nde": "https://www.netwerkdigitaalerfgoed.nl/def#",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "https://schema.org/",
    "void": "http://rdfs.org/ns/void#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "classPartition": "void:classPartition",
    "class": { "@id": "void:class", "@type": "@id" },
    "entities": { "@id": "void:entities", "@type": "xsd:integer" },
    "propertyPartition": "void:propertyPartition",
    "distinctValues": { "@id": "void:distinctObjects", "@type": "xsd:integer" },
    "property": { "@id": "void:property", "@type": "@id" },
    "dataDump": "void:dataDump",
    "distinctSubjects": { "@id": "void:distinctSubjects", "@type": "xsd:integer" },
    "properties": { "@id": "void:properties", "@type": "xsd:integer" },
    "sparqlEndpoint": { "@id": "void:sparqlEndpoint", "@type": "@id" },
    "triples": { "@id": "void:triples", "@type": "xsd:integer" },
    "dateModified": { "@id": "schema:dateModified", "@type": "xsd:dateTime" },
    "contentSize": { "@id": "schema:contentSize", "@type": "xsd:integer" },
    "vocabularies": { "@id": "void:vocabulary", "@type": "@id", "@container": "@set" },
    "wasGeneratedBy": "prov:wasGeneratedBy",
    "Activity": "prov:Activity",
    "startedAtTime": { "@id": "prov:startedAtTime", "@type": "xsd:dateTime" },
    "endedAtTime": { "@id": "prov:endedAtTime", "@type": "xsd:dateTime" },
    "distinctValuesLiteral": { "@id": "nde:distinctObjectsLiteral", "@type": "xsd:integer" },
    "distinctValuesURI": { "@id": "nde:distinctObjectsURI", "@type": "xsd:integer" },
    "type": "@type",
    "id": "@id"
  },
  "@type": "void:Dataset"
}
Here are the results: JSON-LD-Framed.zip
@ddeboer could you review?
Thanks @pmaria, looks good to me!
What strikes me when looking at the JSON-LD is that the same class partition is spread over several separate blank nodes, which looks overly verbose:
"classPartition": [
  {
    "class": "https://w3id.org/pnv#PersonName",
    "entities": "189063"
  },
  {
    "class": "https://w3id.org/pnv#PersonName",
    "propertyPartition": {
      "distinctValues": "72130",
      "entities": "189057",
      "property": "https://w3id.org/pnv#baseSurname"
    }
  },
  {
    "class": "https://w3id.org/pnv#PersonName",
    "propertyPartition": {
      "distinctValues": "54198",
      "entities": "184846",
      "property": "https://w3id.org/pnv#firstName"
    }
  },
  {
    "class": "https://w3id.org/pnv#PersonName",
    "propertyPartition": {
      "distinctValues": "188920",
      "entities": "189063",
      "property": "http://www.w3.org/2000/01/rdf-schema#label"
    }
  },
  {
    "class": "https://w3id.org/pnv#PersonName",
    "propertyPartition": {
      "distinctValues": "341",
      "entities": "34366",
      "property": "https://w3id.org/pnv#infix"
    }
  },
Should we merge those into a single class partition with multiple property partitions?
Of course, this is something that should happen inside the pipeline rather than the conversion to JSON-LD. I’m not sure yet what would be the best way to reference blank nodes between isolated analysers. Should we coin URIs for the analyser output instead?
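To make the target shape concrete, here is a minimal sketch of what a merged class partition could look like. It groups the framed entries by their "class" value as a post-processing step, purely for illustration; the thread's preference is to do this inside the pipeline instead. The function name `merge_class_partitions` is hypothetical, and the input is a subset of the excerpt quoted above.

```python
def merge_class_partitions(partitions):
    """Group framed classPartition entries that share the same "class" IRI.

    Hypothetical helper for illustration only; not part of the pipeline.
    """
    merged = {}  # class IRI -> merged partition (dicts keep insertion order)
    for partition in partitions:
        class_iri = partition["class"]
        target = merged.setdefault(class_iri, {"class": class_iri})
        for key, value in partition.items():
            if key == "class":
                continue
            if key == "propertyPartition":
                # Framing may yield a single object or a list; normalise to a list.
                values = value if isinstance(value, list) else [value]
                target.setdefault("propertyPartition", []).extend(values)
            else:
                # Class-level statistics such as "entities" are copied as-is.
                target[key] = value
    return list(merged.values())


PNV = "https://w3id.org/pnv#"
example = [
    {"class": PNV + "PersonName", "entities": "189063"},
    {"class": PNV + "PersonName", "propertyPartition": {
        "distinctValues": "72130", "entities": "189057",
        "property": PNV + "baseSurname"}},
    {"class": PNV + "PersonName", "propertyPartition": {
        "distinctValues": "54198", "entities": "184846",
        "property": PNV + "firstName"}},
]
merged = merge_class_partitions(example)
```

With this input, `merged` holds one entry for `pnv:PersonName` that carries the class-level "entities" count and a "propertyPartition" list with two items.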
@ddeboer yes agreed. We could indeed use some repeatable URI generation to group the property partitions under the same class partition.
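One way to read "repeatable URI generation" is to derive the partition URI from a hash of the dataset and class IRIs, so that isolated analysers arrive at the same identifier without sharing state. The sketch below is an assumption about how that might work, not the approach taken in #66: the base IRI, the function name `partition_uri` and the example dataset IRI are all hypothetical.

```python
import hashlib

# Hypothetical base IRI for minted partition identifiers.
BASE = "https://example.org/partition/"


def partition_uri(dataset_iri, class_iri, property_iri=None):
    """Derive a stable URI from the IRIs that identify a partition.

    The same inputs always give the same URI, so separate analysers can
    attach their output to one shared class partition node.
    """
    parts = [dataset_iri, class_iri]
    if property_iri is not None:
        parts.append(property_iri)
    # Newlines cannot occur in IRIs, so they are a safe separator here.
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return BASE + digest[:32]


dataset = "https://example.org/dataset/1"  # hypothetical dataset IRI
person_name = "https://w3id.org/pnv#PersonName"

# Two analysers computing the URI independently get the same result.
uri_from_class_analyser = partition_uri(dataset, person_name)
uri_from_property_analyser = partition_uri(dataset, person_name)
```

Because the URI depends only on the inputs, the class-level analyser and each property-level analyser can emit triples about the same subject, and the merge happens naturally when their output is combined.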
Done in #66.
Use case: Colonial Heritage.