zazuko / query-rdf-data-cube

Explore or query RDF Data Cubes with a JavaScript API, without writing SPARQL.
https://zazuko.github.io/query-rdf-data-cube/

Ability to query all possible values of a dimension #20

Closed lucguillemot closed 5 years ago

lucguillemot commented 5 years ago

Thank you for this library, it's extremely useful!

Currently, I can easily retrieve the dimensions of a datacube with datacube.dimensions(). Would it be possible to also retrieve all the possible values that a dimension can have? For instance all the spatial units that a spatial dimension can have.

A SPARQL query would look something like this I think:

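PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?value WHERE {
  ?obs a qb:Observation ;
    <dimension>/rdfs:label ?value .
}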
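
As a rough sketch of how such a query could be parameterized from JavaScript, the helper below builds the query text for a given dimension IRI. The function name distinctLabelsQuery and the example IRI are hypothetical and not part of the library, and the sketch assumes that every dimension value carries an rdfs:label.

```javascript
// Hypothetical helper, not part of query-rdf-data-cube.
const PREFIXES = [
  "PREFIX qb: <http://purl.org/linked-data/cube#>",
  "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
].join("\n");

function distinctLabelsQuery(dimensionIri) {
  // Reject characters that the SPARQL grammar does not allow inside <...>,
  // so the IRI cannot break out of the angle brackets.
  if (/[\u0000-\u0020<>"{}|^`\\]/.test(dimensionIri)) {
    throw new Error(`Invalid IRI: ${dimensionIri}`);
  }
  return `${PREFIXES}

SELECT DISTINCT ?value WHERE {
  ?obs a qb:Observation ;
    <${dimensionIri}>/rdfs:label ?value .
}`;
}

// The example IRI is made up for illustration.
console.log(distinctLabelsQuery("http://example.org/dimension/municipality"));
```

The resulting string can be sent to any SPARQL endpoint; how the library would do that internally is not covered here.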

The goal is to then query the observations with a filter value like this:

datacube
  .query()
  .filter(dimension.equals(dimensionValue))

ktk commented 5 years ago

For reference, this is covered by this part of my initial document.

vhf commented 5 years ago

> Would it be possible to also retrieve all the possible values that a dimension can have?

Sure, I could implement it on Dimension/Attribute/Measure and we'd use it like this:

myDimension.values()

What do you think?

Or is it only useful on Dimensions?

lucguillemot commented 5 years ago

Yes, that sounds perfect. You're right, it would be useful also for attributes and measures.

ktk commented 5 years ago

I think for measures the more interesting query is min/max, as we can potentially have a different value for every single observation.

Also, attributes are more often strings than URIs in the real world, while dimensions are often (but not always) URIs.

ktk commented 5 years ago

Speaking of which, we could also do min/max on dimensions that are literals, not URIs, like dates.
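
A minimal sketch of what such a min/max lookup could look like in SPARQL, wrapped in a hypothetical JavaScript helper (minMaxQuery is not a library function). It assumes the component's values are attached directly to the observations and restricts the aggregate to literals, since ordering IRIs is rarely meaningful.

```javascript
// Hypothetical helper, not part of query-rdf-data-cube.
// `componentIri` is assumed to be a trusted, syntactically valid IRI.
function minMaxQuery(componentIri) {
  return `PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT (MIN(?value) AS ?min) (MAX(?value) AS ?max) WHERE {
  ?obs a qb:Observation ;
    <${componentIri}> ?value .
  FILTER(isLiteral(?value))
}`;
}

// The example IRI is made up for illustration.
console.log(minMaxQuery("http://example.org/dimension/date"));
```

Because the aggregates are used without GROUP BY, the query returns a single row covering all matching observations.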

ktk commented 5 years ago

From a SPARQL point of view, we can figure out whether the object is a literal or a URI with FILTER(isLiteral(?propertyValue)) or FILTER(isIRI(?propertyValue)). There is also isNumeric(). See the spec for details or ask me.
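
The same distinction can also be made on the client once results come back. The sketch below classifies a single term from a SPARQL JSON results binding; classifyBinding is a hypothetical helper, and unlike SPARQL's isNumeric() it only looks at the datatype and does not validate the lexical form.

```javascript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Numeric datatypes as listed in the SPARQL 1.1 operand data types.
const NUMERIC_DATATYPES = new Set(
  [
    "integer", "decimal", "float", "double",
    "nonPositiveInteger", "negativeInteger", "long", "int", "short", "byte",
    "nonNegativeInteger", "unsignedLong", "unsignedInt", "unsignedShort",
    "unsignedByte", "positiveInteger",
  ].map((name) => XSD + name)
);

// Hypothetical helper: `term` is one value from a SPARQL JSON results binding,
// e.g. { type: "uri", value: "http://example.org/x" }.
function classifyBinding(term) {
  if (term.type === "uri") return "iri";
  if (term.type === "bnode") return "blank";
  // Some older endpoints emit "typed-literal" instead of "literal".
  if (term.type === "literal" || term.type === "typed-literal") {
    return NUMERIC_DATATYPES.has(term.datatype) ? "numeric-literal" : "literal";
  }
  throw new Error(`Unknown term type: ${term.type}`);
}

console.log(classifyBinding({ type: "uri", value: "http://example.org/canton/ZH" })); // "iri"
console.log(classifyBinding({ type: "literal", value: "42", datatype: XSD + "integer" })); // "numeric-literal"
console.log(classifyBinding({ type: "literal", value: "Zürich", "xml:lang": "de" })); // "literal"
```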

jstcki commented 5 years ago

Not sure we'll really need this but: the dimension values can actually change based on the query you execute on a dataset (cube?). For example, if you filter observations by an area, the time dimension could have a different range – in the context of this query.

Now, this behavior would be quite terrible:

await myDimension.values() // => [1,2,3,4,5]

await cube.query().filter(something).execute() // ...

await myDimension.values() // => [3,5]

because we'd run into all kinds of bugs if the internal state of myDimension would somehow change when queries are executed.

Something like this would probably be better:

// All dimension values of a cube
await cube.values(myDimension) // => [1,2,3,4,5]

// Construct query without executing
const query = cube.query().filter(something)

// All dimension values of this query
await query.values(myDimension) // => [3,5]
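
To illustrate the idea, here is a small in-memory sketch in which values are always computed from a cube or a query, never cached on the dimension. The Cube and Query classes below are hypothetical toy stand-ins, not the library's classes, and values() is synchronous here for brevity, whereas the real thing would be async.

```javascript
// Toy model: observations are plain objects, dimensions are property names.
class Query {
  constructor(observations, predicates = []) {
    this.observations = observations;
    this.predicates = predicates;
    Object.freeze(this); // a query never changes after construction
  }

  // Returns a new Query; the receiver is left untouched.
  filter(predicate) {
    return new Query(this.observations, [...this.predicates, predicate]);
  }

  // Distinct values of `dimension` among observations matching all filters.
  values(dimension) {
    const matching = this.observations.filter((obs) =>
      this.predicates.every((predicate) => predicate(obs))
    );
    return [...new Set(matching.map((obs) => obs[dimension]))];
  }
}

class Cube {
  constructor(observations) {
    this.observations = observations;
  }
  query() {
    return new Query(this.observations);
  }
  values(dimension) {
    return this.query().values(dimension);
  }
}

const cube = new Cube([
  { area: "A", year: 1 }, { area: "A", year: 2 }, { area: "B", year: 3 },
  { area: "A", year: 4 }, { area: "B", year: 5 },
]);
const areaB = cube.query().filter((obs) => obs.area === "B");

console.log(cube.values("year"));  // [1, 2, 3, 4, 5]
console.log(areaB.values("year")); // [3, 5]
console.log(cube.values("year"));  // still [1, 2, 3, 4, 5]
```

Since filter() returns a new object each time, the result of values() depends only on the object it is called on, not on what was executed before.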

jstcki commented 5 years ago

Then again, isn't this already solved by

cube.query()
  .select({foo: fooDimension.distinct()})
  .execute()

Edit: why is .distinct even a method of the dimension? Shouldn't that rather be:

cube.query()
  .select({foo: fooDimension})
  .distinct()
  .execute()

(I think SPARQL DISTINCT applies to the whole result row, i.e. all selected variables together, not to single ones?)
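
A small in-memory sketch of that row-level behavior: duplicates are removed after projection, by comparing entire rows. The selectDistinct helper and the sample rows are hypothetical and only mimic what SELECT DISTINCT does on an endpoint.

```javascript
// Mimics SELECT DISTINCT ?a ?b ...: project each row onto `variables`,
// then drop rows whose projected values were already seen.
function selectDistinct(rows, variables) {
  const seen = new Set();
  const result = [];
  for (const row of rows) {
    const key = JSON.stringify(variables.map((name) => row[name]));
    if (!seen.has(key)) {
      seen.add(key);
      result.push(Object.fromEntries(variables.map((name) => [name, row[name]])));
    }
  }
  return result;
}

const rows = [
  { foo: "a", bar: 1 },
  { foo: "a", bar: 2 },
  { foo: "a", bar: 1 }, // duplicate of the first row
  { foo: "b", bar: 1 },
];

console.log(selectDistinct(rows, ["foo", "bar"]).length); // 3
console.log(selectDistinct(rows, ["foo"]).length);        // 2
```

With both variables selected, "a" still appears twice because the rows differ in bar; only selecting foo alone yields its distinct values.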

vhf commented 5 years ago

> Not sure we'll really need this but: the dimension values can actually change based on the query you execute on a dataset (cube?). For example, if you filter observations by an area, the time dimension could have a different range – in the context of this query.
>
> Now, this behavior would be quite terrible:
>
> await myDimension.values() // => [1,2,3,4,5]
>
> await cube.query().filter(something).execute() // ...
>
> await myDimension.values() // => [3,5]
>
> because we'd run into all kinds of bugs if the internal state of myDimension would somehow change when queries are executed.

Executing a query or generating SPARQL for a query doesn't modify any state; if it does, it's a bug. The same goes for dimensions/attributes/measures: myDimension.distinct() doesn't mutate myDimension.

I'll implement something and if the API for getting the values is confusing we'll find a solution that we all like.

jstcki commented 5 years ago

@vhf sorry, I wasn't trying to say that querying actually did mutate anything. I was just giving an example of a potentially bad API :)

vhf commented 5 years ago

Getting all values:

const valuesFromQuery = await dataCube.query()
  .select({ size: sizeClasses })
  .filter(({ size }) => size.notEquals("50 - 100 ha"))
  .componentValues();

// same results (but not same SPARQL query!) as:

const valuesFromCube = (await dataCube.componentValues(sizeClasses))
  .filter((value) => value.label.value !== "50 - 100 ha");

Getting min/max:

// of a Component, no filter:
const timeMinMax = await dataCube.componentMinMax(time);

// more fine grained on Query, with filters:
const { min: sizeMin, max: sizeMax } = await dataCube.query()
  .select({ size: sizeDimension })
  .filter(({ size }) => size.gt(50))
  .filter(({ size }) => size.lte(250))
  .componentMinMax();
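
To make the intended semantics concrete, here is a plain in-memory sketch. It assumes that successive .filter() calls are combined with a logical AND and that min/max are computed over the values that remain. The helpers gt, lte and minMax are hypothetical stand-ins, not the library's implementation, which delegates this work to the SPARQL endpoint.

```javascript
// Predicate factories mirroring the comparison names used in the thread.
const gt = (bound) => (value) => value > bound;
const lte = (bound) => (value) => value <= bound;

// Keep values that satisfy every predicate, then take the extremes.
function minMax(values, ...predicates) {
  const kept = values.filter((value) =>
    predicates.every((predicate) => predicate(value))
  );
  if (kept.length === 0) return { min: undefined, max: undefined };
  return { min: Math.min(...kept), max: Math.max(...kept) };
}

// Sample sizes are made up for illustration.
const { min: sizeMin, max: sizeMax } = minMax([10, 50, 75, 250, 400], gt(50), lte(250));
console.log(sizeMin, sizeMax); // 75 250
```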

vhf commented 5 years ago

Published as @zazuko/query-rdf-data-cube@0.0.11