size characteristics of a Dataset

VladimirAlexiev commented 6 years ago

schemaorg/schemaorg#1083 and schemaorg/schemaorg#1471 cover http://pending.schema.org/variableMeasured, which describes which measurements/observations a dataset includes.

(Shameless cc to all the people in those discussions: @danbri @darobin @natashafn @akuckartz @joshsh @Aaranged @ypriverol @agbeltran @dr-shorthair @ldodds @rob-metalinkage @KerryLea; and @RichardWallis)

A question about variableMeasured: since a dataset contains many observations, can you confirm that the PropertyValue it points to would have no value, and could only use minValue...maxValue to describe the range of the included observations?
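To make the question concrete, here is a minimal Python sketch of the reading proposed above (not a confirmed schema.org convention): the PropertyValue referenced by variableMeasured carries no `value`, only `minValue`/`maxValue`. The dataset name, the variable name `numberOfEmployees`, the numbers and the helper `is_range_only` are all hypothetical.

```python
import json

# Sketch of the proposed reading: the PropertyValue referenced by
# variableMeasured has no "value", only the range spanned by the observations.
# Dataset name, variable name and numbers are hypothetical.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example company register extract",
    "variableMeasured": [
        {
            "@type": "PropertyValue",
            "name": "numberOfEmployees",
            "minValue": 1,
            "maxValue": 250,
            "unitText": "employees",
        }
    ],
}


def is_range_only(pv: dict) -> bool:
    """Hypothetical check: True if pv gives a min/max range but no single value."""
    return "value" not in pv and ("minValue" in pv or "maxValue" in pv)


if __name__ == "__main__":
    for pv in dataset["variableMeasured"]:
        print(pv["name"], "range only:", is_range_only(pv))
    print(json.dumps(dataset, indent=2))
```

Under this reading, a consumer could tell "range over many observations" apart from "single measured value" simply by the absence of `value`.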

Now for the main question: what properties do we have to describe the number of observations or other size characteristics of a dataset itself? Here are some cases:

VOID statistics include these count properties (all in the void: namespace): triples, entities, classes, properties, distinctSubjects, distinctObjects, documents. Very importantly, you can use them on subsets such as classPartition and propertyPartition, which gives you a very powerful means to describe exactly what kinds of things a dataset contains, and how many of each.
DCAT size: there is only dcat:byteSize, which is pretty useless for describing any aspect of the dataset's value.
In the euBusinessGraph project we need to describe Company datasets from different providers: which properties they include (e.g. ebg:isStartup, org:orgActivity), and some partitions (e.g. "the dataset covers the jurisdiction of Italy" or "the dataset has 1000 companies in Italy with ebg:isStartup true").
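As an illustration of the VoID-style counts and partitions listed above, here is a minimal Python sketch, assuming the dataset fits in memory as a list of (subject, predicate, object) tuples. It computes some dataset-level statistics, one class partition per class and one property partition per property, plus a filtered count in the spirit of the euBusinessGraph example. All `ex:` names and the data are hypothetical, literals are simplified to plain strings, `void:entities` is approximated as the number of typed instances, and `void:documents` is left out; a real pipeline would more likely use an RDF library or SPARQL.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical toy data: ex: names are made up; ebg:isStartup is taken from the
# use case above. Literals are simplified to plain strings.
TRIPLES = [
    ("ex:c1", RDF_TYPE, "ex:Company"),
    ("ex:c1", "ebg:isStartup", "true"),
    ("ex:c1", "ex:jurisdiction", "ex:Italy"),
    ("ex:c2", RDF_TYPE, "ex:Company"),
    ("ex:c2", "ebg:isStartup", "false"),
    ("ex:c2", "ex:jurisdiction", "ex:Italy"),
    ("ex:c3", RDF_TYPE, "ex:Company"),
    ("ex:c3", "ebg:isStartup", "true"),
    ("ex:c3", "ex:jurisdiction", "ex:France"),
    ("ex:p1", RDF_TYPE, "ex:Person"),
]


def void_stats(triples):
    """Whole-dataset counts named after the corresponding VoID properties."""
    return {
        "void:triples": len(triples),
        "void:distinctSubjects": len({s for s, _, _ in triples}),
        "void:properties": len({p for _, p, _ in triples}),
        "void:distinctObjects": len({o for _, _, o in triples}),
        "void:classes": len({o for _, p, o in triples if p == RDF_TYPE}),
    }


def class_partitions(triples):
    """One classPartition per class; void:entities approximated as instance count."""
    members = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            members[o].add(s)
    return {c: {"void:class": c, "void:entities": len(m)} for c, m in members.items()}


def property_partitions(triples):
    """One propertyPartition per property; void:triples = triples using it."""
    counts = defaultdict(int)
    for _, p, _ in triples:
        counts[p] += 1
    return {p: {"void:property": p, "void:triples": n} for p, n in counts.items()}


def count_matching(triples, cls, conditions):
    """Instances of cls having every (property, object) pair in conditions.

    Such a filtered subset is neither a class nor a property partition;
    how to publish a count like this is the open question of this issue.
    """
    facts = defaultdict(set)
    for s, p, o in triples:
        facts[s].add((p, o))
    return sum(
        1
        for f in facts.values()
        if (RDF_TYPE, cls) in f and all(c in f for c in conditions)
    )


if __name__ == "__main__":
    print(void_stats(TRIPLES))
    print(class_partitions(TRIPLES))
    print(property_partitions(TRIPLES))
    print(count_matching(TRIPLES, "ex:Company",
                         [("ebg:isStartup", "true"), ("ex:jurisdiction", "ex:Italy")]))
```

The class and property partitions map directly onto VoID; the last count (startups in Italy) shows the kind of finer-grained subset the use case needs, which VoID can only express as a generic subset.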

I realize schema:Dataset covers any dataset, not just RDF. But void:entities applies to any dataset, and being able to describe the characteristics of the things inside a dataset in a structured way (VOID's partition subsets) is very powerful.

How could we include that in Schema?

VladimirAlexiev commented 6 years ago

StatDCAT-AP has some properties for describing what is inside a dataset. I don't know much about it, but I found them in a mapping to Schema proposed by the EC:

stat:attribute, dimension, numberOfDataSeries, unitOfMeasurement

chrisgorgo commented 5 years ago

I'm also interested in expressing the "number of observations" or "sample size" in schema.org. Users looking for data are interested in this piece of metadata.

What counts as a sample or observation differs from one dataset to another, so I would suggest disentangling the two and allowing the number of observations and the definition of an observation to be specified separately.

RichardWallis commented 4 years ago

See issue #7 for the context of the move from the main Schema.org issue tracker to this repository.

schemaorg / suggestions-questions-brainstorming

size characteristics of a Dataset #160