zazuko / query-rdf-data-cube

Explore or query RDF Data Cubes with a JavaScript API, without writing SPARQL.
https://zazuko.github.io/query-rdf-data-cube/
9 stars 2 forks source link

Filtering by gYear #32

Open jstcki opened 5 years ago

jstcki commented 5 years ago

Hi @vhf!

A follow-up to our discussion about filtering by years (and probably other date types). While this works fine for non-literal dimensions:

query.filter(dim.equals("http://some.iri.org/foo") //or
query.filter(dim.in(["http://some.iri.org/foo", "http://some.iri.org/bar"])

… it doesn't for xsd:gYear values. For example:

query.filter(dim.equals("2018"))

generates a SPARQL query with

FILTER(?someField = 2018 )

… which leads to no results (see http://yasgui.org/short/VyGPg76bB).

Expected outcome

The expected (working) SPARQL should be

FILTER(?someField = "2018"^^xsd:gYear )

(see http://yasgui.org/short/7QS7K16Nl)

Solution?

So, I think there needs to be a way to provide a typed value to .filter() – but how? I tried this:

query.filter(dim.equals({
  type: "literal",
  datatype:"http://www.w3.org/2001/XMLSchema#gYear",
  value: "2017"
}))

… but that just generates an error in the library:

Error: into() cannot cast arg {“value”:“2017",“type”:“literal”,“datatype”:“http://www.w3.org/2001/XMLSchema#gYear“}
jstcki commented 5 years ago

I played some more with the SPARQL query and quite oddly, these all return the expected results (notice the use of >=, with = none of these work I think):

(?xField >= "2017"^^xsd:date ) // Returns values for 2017 and 2018
(?xField >= "2017"^^xsd:dateTime ) // Returns values for 2017 and 2018
(?xField >= "2017"^^xsd:foobar ) // <=== WHAT? Also returns values for 2017 and 2018

but I'm really not sure why?

Also, these do not work:

(?xField >= "2017" ) // Only returns 2018 values
(?xField >= "2017"^^xsd:whatever ) // <=== WHAT? Only returns 2018 values
vhf commented 5 years ago

Solution?

So, I think there needs to be a way to provide a typed value to .filter() – but how? I tried this:

Here's how, and I understand it would have been hard to come up with it, it's quite rdfjs specific:

You have to construct a literal to compare to: year.equals(literal("2017", namedNode("http://www.w3.org/2001/XMLSchema#gYear"))

Full example:

  const entryPoint = new DataCubeEntryPoint(
    "https://trifid-lindas.test.cluster.ldbar.ch/query",
    { languages: ["de"] },
  );
  const dataCubes = await entryPoint.dataCubes();

  const cube = dataCubes[0];
  const dimensions = await cube.dimensions();
  const year = dimensions[0];
  const forstzone = dimensions[1];
  const kanton = dimensions[2];

  const results = await cube.query()
    .select({
      year,
      forstzone,
      kanton,
    })
    .filter(year.equals(literal("2017", namedNode("http://www.w3.org/2001/XMLSchema#gYear"))))
    .execute();

You should be able to import literal and namedNode because the lib depends on @rdfjs/data-model, try this:

import { literal, namedNode } from "@rdfjs/data-model";
jstcki commented 5 years ago

Thanks, I'll give it a shot.

jstcki commented 5 years ago

OK, it works. But … It's quite brittle IMO.

  1. We can't just rely on extraMetadata.scaleOfMeasure because even if we know it's a temporal dimension, we don't know which data type is used for dimension values.
  2. In order to construct the right data type, I would need to infer it from one of the dimension's min/max values, which is metadata fetched asynchronously, so I'd have to cache this info myself somewhere which is very tedious.
  3. Ideally, every Dimension would also have a (non-optional) value data type (extraMetadata.scaleOfMeasure doesn't work because it doesn't tell us the exact type we'd have to use for filtering – also it's kinda optional)
  4. Even better, if the value data type was known by the dimension, year.equals("2017") could just work and constructing typed literals would be handled by the library itself. I think this would be a good approach from an API perspective.

P.S. Maybe even this could work: year.equals(new Date(2018, 0, 1)) because if the value type for the year dimension is gYear it could then just use date.getFullYear() internally. But probably this is not really too useful.

jstcki commented 5 years ago

FYI, I'm going to implement 2. from my previous comment because it's actually useful to know dimension values/minmax in a few places, not just for filtering. So no hurry for something like 4. (which I still think is nicer 😉 ).

ktk commented 5 years ago

We definitely need to invest time into the representation of time in the cube model but that will have to wait for next year I'm afraid.

As a workaround, could we use the datatype for the literal that we have in the shape @vhf ? It is not directly attached to the current qb:MeasureProperty|qb:DimensionProperty but I might add that as I did with the scaleOfMeasure as a workaround. Like this the lib would simply attach it properly to the FILTER and then the generated SPARQL should IMO be correct.

vhf commented 5 years ago

As a workaround, could we use the datatype for the literal that we have in the shape @vhf ?

Unfortunately not. The best we can do at the moment is a workaround like scaleOfMeasure.

ktk commented 5 years ago

Yes that's what I mean so I could add it as rdfs:range to the Dimension/Measure.

jstcki commented 4 years ago

@ktk I noticed that while filtering by xsd:gYear works fine, filtering with xsd:gYearMonth does not.

E.g. FILTER(?dim0 = "2006-09"^^xsd:gYearMonth). doesn't work.

This works: (?dim0 >= "2006-09"^^xsd:gYearMonth) && (?dim0 <= "2006-09"^^xsd:gYearMonth)

I suspect this is some particularity of SPARQL itself or Stardog's implementation but I wonder if there is a good way to handle these issues.

Example query)+%7D+OPTIONAL+%7B+%3Fdim0+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim0Label_de.+FILTER(LANGMATCHES(LANG(%3Fdim0Label_de)%2C+%22de%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fdim0+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim0Label_fr.+FILTER(LANGMATCHES(LANG(%3Fdim0Label_fr)%2C+%22fr%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fdim0+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim0Label_it.+FILTER(LANGMATCHES(LANG(%3Fdim0Labelit)%2C+%22it%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fdim0+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim0Label.+FILTER((LANG(%3Fdim0Label_))+%3D+%22%22%5E%5Exsd%3Astring)+%7D+BIND(COALESCE(%3Fdim0Label_en%2C+%3Fdim0Label_de%2C+%3Fdim0Label_fr%2C+%3Fdim0Labelit%2C+%3Fdim0Label%2C+%22%22%5E%5Exsd%3Astring)+AS+%3Fdim0Label)+OPTIONAL+%7B+%3Fdim2+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim2Label_en.+FILTER(LANGMATCHES(LANG(%3Fdim2Label_en)%2C+%22en%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fdim2+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim2Label_de.+FILTER(LANGMATCHES(LANG(%3Fdim2Label_de)%2C+%22de%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fdim2+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim2Label_fr.+FILTER(LANGMATCHES(LANG(%3Fdim2Label_fr)%2C+%22fr%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fdim2+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim2Label_it.+FILTER(LANGMATCHES(LANG(%3Fdim2Labelit)%2C+%22it%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fdim2+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3Fdim2Label.+FILTER((LANG(%3Fdim2Label_))+%3D+%22%22%5E%5Exsd%3Astring)+%7D+BIND(COALESCE(%3Fdim2Label_en%2C+%3Fdim2Label_de%2C+%3Fdim2Label_fr%2C+%3Fdim2Labelit%2C+%3Fdim2Label%2C+%22%22%5E%5Exsd%3Astring)+AS+%3Fdim2Label)+OPTIONAL+%7B+%3Fx+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FxLabel_en.+FILTER(LANGMATCHES(LANG(%3FxLabel_en)%2C+%22en%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fx+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FxLabel_de.+FILTER(LANGMATCHES(LANG(%3FxLabel_de)%2C+%22de%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fx+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FxLabel_fr.+FILTER(LANGMATCHES(LANG(%3FxLabel_fr)%2C+%22fr%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fx+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FxLabel_it.+FILTER(LANGMATCHES(LANG(%3FxLabelit)%2C+%22it%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fx+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FxLabel.+FILTER((LANG(%3FxLabel_))+%3D+%22%22%5E%5Exsd%3Astring)+%7D+BIND(COALESCE(%3FxLabel_en%2C+%3FxLabel_de%2C+%3FxLabel_fr%2C+%3FxLabelit%2C+%3FxLabel%2C+%22%22%5E%5Exsd%3Astring)+AS+%3FxLabel)+OPTIONAL+%7B+%3Fsegment+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FsegmentLabel_en.+FILTER(LANGMATCHES(LANG(%3FsegmentLabel_en)%2C+%22en%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fsegment+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FsegmentLabel_de.+FILTER(LANGMATCHES(LANG(%3FsegmentLabel_de)%2C+%22de%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fsegment+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FsegmentLabel_fr.+FILTER(LANGMATCHES(LANG(%3FsegmentLabel_fr)%2C+%22fr%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fsegment+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FsegmentLabel_it.+FILTER(LANGMATCHES(LANG(%3FsegmentLabelit)%2C+%22it%22%5E%5Exsd%3Astring))+%7D+OPTIONAL+%7B+%3Fsegment+(rdfs%3Alabel%7Cskos%3AprefLabel)+%3FsegmentLabel.+FILTER((LANG(%3FsegmentLabel_))+%3D+%22%22%5E%5Exsd%3Astring)+%7D+BIND(COALESCE(%3FsegmentLabel_en%2C+%3FsegmentLabel_de%2C+%3FsegmentLabel_fr%2C+%3FsegmentLabelit%2C+%3FsegmentLabel%2C+%22%22%5E%5Exsd%3Astring)+AS+%3FsegmentLabel)+FILTER((%3Fdim0+%3D+%222006-06%22%5E%5Exsd%3AgYearMonth)+%26%26+(%3Fdim2+%3D+%3Chttps%3A%2F%2Fld.stadt-zuerich.ch%2Fstatistics%2Fcode%2FBEB1004%3E))+%7D&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql-results%2Bjson&endpoint=https%3A%2F%2Ftrifid-lindas.test.cluster.ldbar.ch%2Fquery&requestMethod=POST&tabTitle=Query+1&headers=%7B%7D&outputFormat=table)

jstcki commented 4 years ago

Oddly, this works too: (?dim0 IN ("2006-06"^^xsd:gYearMonth)), so as a workaround I'm switching all filters to use IN instead of using =.