Open VladimirAlexiev opened 6 years ago
StatDCAT-AP has some props for describing what is inside. I don't know much about it, but found them in a mapping to Schema proposed by the EC:
stat:attribute, dimension, numberOfDataSeries, unitOfMeasurement
I'm also interested in expressing the "number of observations" or "sample size" in schema.org. Users looking for data are interested in this piece of metadata.
What counts as an observation differs from one dataset to another, so I would suggest disentangling this and allowing the number and the definition of an observation to be specified separately.
See issue #7 for the context of the move from the main Schema.org issue tracker to this repository.
schemaorg/schemaorg#1083 and schemaorg/schemaorg#1471 have http://pending.schema.org/variableMeasured, which describes what measurements/observations are included in a dataset.
(Shameless cc to all people in those discussions: @danbri @darobin @natashafn @akuckartz @joshsh @Aaranged @ypriverol @agbeltran @dr-shorthair @ldodds @rob-metalinkage @KerryLea; and @RichardWallis)
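To make the variableMeasured question below concrete, here is a minimal sketch of the range-only reading it asks about, built as JSON-LD from Python. The dataset name, the variable, and all numbers are made up for illustration; only the schema.org terms (Dataset, variableMeasured, PropertyValue, unitText, minValue, maxValue) are real.

```python
import json

# Hypothetical dataset description: name, variable and numbers are invented.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example air temperature readings",
    "variableMeasured": {
        "@type": "PropertyValue",
        "name": "air temperature",
        "unitText": "degree Celsius",
        # No single "value" here: the dataset holds many observations,
        # so only the range of observed values is stated.
        "minValue": -12.5,
        "maxValue": 38.0,
    },
}

print(json.dumps(dataset, indent=2))
```

Whether omitting value in this way is the intended usage is exactly what the question asks, so treat this as one possible reading rather than confirmed practice.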
A question about variableMeasured: since a dataset would have many observations, can you confirm that the PropertyValue pointed to wouldn't have any value and could only have minValue...maxValue describing the range of included observations?

Now for the main question: what properties do we have to describe the number of observations or other size characteristics of a dataset itself? Here are some cases:
- void: count props: triples, entities, classes, properties, distinctSubjects, distinctObjects, documents. Very importantly, you can use these on subsets such as classPartition and propertyPartition, which gives you very powerful means to describe exactly what kinds of things and how many in the dataset
- dcat:byteSize, which is pretty useless to describe any aspect of the value of the dataset

I realize schema:Dataset is about any datasets, not just RDF. But void:entities pertains to any dataset, and the idea to be able to describe in a structured way the characteristics of things inside a dataset (VOID's partition subsets) is very powerful.

How could we include that in Schema?
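
For readers unfamiliar with VoID, the partition idea mentioned above can be sketched as follows. This is only an illustration, not a proposal for how Schema should model it: the dataset URI, the FOAF classes and property, and every count are assumptions, and `entities_by_class` is a hypothetical helper. The VoID terms themselves (void:triples, void:entities, void:classPartition, void:propertyPartition, void:class, void:property) come from the VoID vocabulary.

```python
import json

# Illustrative VoID description expressed as JSON-LD.
# The dataset URI, the classes/properties and all counts are made up.
void_description = {
    "@context": {
        "void": "http://rdfs.org/ns/void#",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "http://example.org/dataset/people",
    "@type": "void:Dataset",
    "void:triples": 120000,
    "void:entities": 15000,
    "void:classes": 12,
    "void:properties": 40,
    # Each class partition says how many entities of one class are included.
    "void:classPartition": [
        {"void:class": {"@id": "foaf:Person"}, "void:entities": 9000},
        {"void:class": {"@id": "foaf:Organization"}, "void:entities": 6000},
    ],
    # Each property partition says how many triples use one property.
    "void:propertyPartition": [
        {"void:property": {"@id": "foaf:name"}, "void:triples": 15000},
    ],
}


def entities_by_class(description):
    """Map each class in the class partitions to its entity count."""
    return {
        part["void:class"]["@id"]: part["void:entities"]
        for part in description.get("void:classPartition", [])
    }


print(json.dumps(entities_by_class(void_description), indent=2))
```

The point of the sketch is that the same count property (here void:entities) is reused at the dataset level and inside each partition, which is what lets a description say both how many things there are and what kinds of things they are.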