Closed: lucguillemot closed this issue 5 years ago
Would it be possible to also retrieve all the possible values that a dimension can have?
Sure, I could implement it on Dimension/Attribute/Measure and we'd use it like this:
myDimension.values()
What do you think?
Or is it only useful on Dimensions?
Yes, that sounds perfect. You're right, it would be useful also for attributes and measures.
I think for measures the more interesting query is min/max, as we can potentially have a different value for every single observation.
Also attributes are more often strings than URIs in the real world, while dimensions are often (but not always) URIs.
Speaking of which, we could also do min/max on dimensions that are literals, not URIs, like dates.
From a SPARQL point of view, we can figure out whether the object is a literal or a URI with FILTER(isLiteral(?propertyValue)) or isIRI(). There is also isNumeric(). See the spec for details or ask me.
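The same literal-versus-URI distinction could also be made client-side on the terms that come back from the endpoint. A minimal sketch, assuming results arrive as RDF/JS-style terms (objects with termType, value and, for literals, a datatype); the helpers isOrderedLiteral and suggestSummary are hypothetical and not part of the library:

```javascript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Datatypes for which a min/max query is meaningful (numbers and dates).
// This list is an assumption for the sketch, not an exhaustive one.
const ORDERED_TYPES = new Set(
  ["integer", "decimal", "float", "double", "date", "dateTime"].map((t) => XSD + t)
);

// True for literals whose datatype has a natural ordering.
function isOrderedLiteral(term) {
  return term.termType === "Literal" && ORDERED_TYPES.has(term.datatype.value);
}

// Suggest which summary to offer for a component, based on its observed terms:
// "minmax" when every term is an ordered literal, otherwise the list of "values".
function suggestSummary(terms) {
  if (terms.length > 0 && terms.every(isOrderedLiteral)) return "minmax";
  return "values";
}
```

With this, a time dimension made of xsd:date literals would get min/max, while a dimension made of URIs would get its list of values.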
Not sure we'll really need this but: the dimension values can actually change based on the query you execute on a dataset (cube?). For example, if you filter observations by an area, the time dimension could have a different range – in the context of this query.
Now, this behavior would be quite terrible:
await myDimension.values() // => [1,2,3,4,5]
await cube.query().filter(something).execute() // ...
await myDimension.values() // => [3,5]
because we'd run into all kinds of bugs if the internal state of myDimension would somehow change when queries are executed.
Something like this would probably be better:
// All dimension values of a cube
await cube.values(myDimension) // => [1,2,3,4,5]
// Construct query without executing
const query = cube.query().filter(something)
// All dimension values of this query
await query.values(myDimension) // => [3,5]
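To make the idea concrete, here is a small in-memory sketch of such a stateless API. The Cube and Query classes are hypothetical stand-ins written for this comment, not the library's implementation, and they are synchronous for brevity, whereas a real version would query a SPARQL endpoint and return promises:

```javascript
// Hypothetical in-memory query: filters are accumulated, never applied in place.
class Query {
  constructor(rows, predicates = []) {
    this.rows = rows;
    this.predicates = predicates;
  }

  // Returns a NEW query; the receiver is left untouched.
  filter(predicate) {
    return new Query(this.rows, [...this.predicates, predicate]);
  }

  // Distinct values of one dimension among the rows matching every filter.
  values(dimension) {
    const matching = this.rows.filter((row) => this.predicates.every((p) => p(row)));
    return [...new Set(matching.map((row) => row[dimension]))];
  }
}

// Hypothetical in-memory cube holding observations as plain objects.
class Cube {
  constructor(rows) {
    this.rows = rows;
  }

  query() {
    return new Query(this.rows);
  }

  // All values of a dimension, i.e. the values of an unfiltered query.
  values(dimension) {
    return this.query().values(dimension);
  }
}

const cube = new Cube([
  { area: "north", year: 1 },
  { area: "north", year: 2 },
  { area: "south", year: 3 },
  { area: "south", year: 5 },
]);
const southOnly = cube.query().filter((row) => row.area === "south");

console.log(cube.values("year")); // [ 1, 2, 3, 5 ]
console.log(southOnly.values("year")); // [ 3, 5 ]
console.log(cube.values("year")); // still [ 1, 2, 3, 5 ]
```

Because filter() returns a new object, asking the cube for its values gives the same answer before and after a filtered query is built.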
Then again, isn't this already solved by
cube.query()
  .select({foo: fooDimension.distinct()})
  .execute()
Edit: why is .distinct even a method of the dimension? Shouldn't that rather be:
cube.query()
  .select({foo: fooDimension})
  .distinct()
  .execute()
(I think SPARQL DISTINCT applies to all bound variables, not just single ones?)
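For reference, SPARQL's SELECT DISTINCT eliminates duplicate solutions, i.e. whole rows of bindings, rather than deduplicating one variable at a time. A small in-memory sketch of that behaviour (distinctRows is a made-up helper for illustration):

```javascript
// Row-level DISTINCT: two rows are duplicates only if every selected column matches.
function distinctRows(rows, columns) {
  const seen = new Set();
  const result = [];
  for (const row of rows) {
    // Build a key from the selected columns only.
    const key = JSON.stringify(columns.map((c) => row[c]));
    if (!seen.has(key)) {
      seen.add(key);
      result.push(Object.fromEntries(columns.map((c) => [c, row[c]])));
    }
  }
  return result;
}

const rows = [
  { foo: "a", bar: 1 },
  { foo: "a", bar: 2 },
  { foo: "a", bar: 1 },
];

console.log(distinctRows(rows, ["foo"]).length); // 1
console.log(distinctRows(rows, ["foo", "bar"]).length); // 2
```

So selecting a second variable alongside foo can bring back repeated foo values, which is why a query-level .distinct() reads more naturally than a per-dimension one.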
Executing a query or generating SPARQL for a query doesn't modify any state; if it does, it's a bug. The same goes for dimensions/attributes/measures: myDimension.distinct() doesn't mutate myDimension.
I'll implement something and if the API for getting the values is confusing we'll find a solution that we all like.
@vhf sorry, I wasn't trying to say that querying actually did mutate anything. I was just giving an example of a potentially bad API :)
Getting all values:
const values = await dataCube.query()
  .select({ size: sizeClasses })
  .filter(({ size }) => size.notEquals("50 - 100 ha"))
  .componentValues();
// same results (but not the same SPARQL query!) as:
const sameValues = (await dataCube.componentValues(sizeClasses))
  .filter((value) => value.label.value !== "50 - 100 ha");
Getting min/max:
// of a Component, no filter:
const timeMinMax = await dataCube.componentMinMax(time);
// more fine grained on Query, with filters:
const { min: sizeMin, max: sizeMax } = await dataCube.query()
  .select({ size: sizeDimension })
  .filter(({ size }) => size.gt(50))
  .filter(({ size }) => size.lte(250))
  .componentMinMax();
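Conceptually, the filtered min/max above should give the same result as the following, shown here over a plain array instead of a SPARQL endpoint. This only illustrates the expected result: minMax is a hypothetical helper and the sample sizes are invented, so it says nothing about how the library builds its query:

```javascript
// Min and max of the values that pass every predicate.
function minMax(values, predicates = []) {
  const kept = values.filter((v) => predicates.every((p) => p(v)));
  if (kept.length === 0) return { min: undefined, max: undefined };
  return { min: Math.min(...kept), max: Math.max(...kept) };
}

const sizes = [10, 50, 75, 120, 250, 400];

// Equivalent of .filter(size.gt(50)).filter(size.lte(250)) on plain numbers.
const { min, max } = minMax(sizes, [(s) => s > 50, (s) => s <= 250]);
console.log(min, max); // 75 250
```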
Published as @zazuko/query-rdf-data-cube@0.0.11
Thank you for this library, it's extremely useful!
Currently, I can easily retrieve the dimensions of a datacube with datacube.dimensions(). Would it be possible to also retrieve all the possible values that a dimension can have? For instance, all the spatial units that a spatial dimension can have. A SPARQL query would look something like this, I think:
The goal is to then query the observations with a filter value like this: