zazuko / query-rdf-data-cube

Explore or query RDF Data Cubes with a JavaScript API, without writing SPARQL.

https://zazuko.github.io/query-rdf-data-cube/

Specify language #8

Closed jstcki closed 5 years ago

jstcki commented 5 years ago

You're probably already aware of this, but I'll mention it anyway: the user should be able to filter labels (other things too?) by language. Preferably somewhere near the top of the query (i.e. cube.datasets({lang: "de"})), since one probably doesn't want to repeat this all the way down.

For example, I used FILTER(langmatches(lang(?value), "de") || lang(?value) = "") but maybe that's not completely sufficient.
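To make it concrete which values this filter lets through, here is a minimal JavaScript sketch that mimics it on plain objects. `langMatches` and `keepLabel` are hypothetical helpers written for illustration, not part of query-rdf-data-cube, and the range matching is simplified to a case-insensitive prefix check; the sample labels are made up.

```javascript
// Simplified stand-in for SPARQL's LANGMATCHES with a basic language range:
// "de" matches "de" and "de-CH" (case-insensitive); "*" matches any non-empty tag.
function langMatches(tag, range) {
  if (range === "*") return tag !== "";
  const t = tag.toLowerCase();
  const r = range.toLowerCase();
  return t === r || t.startsWith(r + "-");
}

// Mirrors FILTER(langmatches(lang(?value), "de") || lang(?value) = "")
function keepLabel(literal, range) {
  return langMatches(literal.language, range) || literal.language === "";
}

// Made-up sample data: two tagged labels and one plain literal.
const labels = [
  { value: "Wald", language: "de" },
  { value: "Forêt", language: "fr" },
  { value: "Forest", language: "" }, // no language tag
];

const kept = labels.filter((l) => keepLabel(l, "de"));
console.log(kept.map((l) => l.value));
```

In this sample both the "de" label and the untagged one pass, so a node can end up with more than one value.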

ktk commented 5 years ago

I think that should be enough, I did pretty much this so far. I did not think of || though, that makes it more flexible than what I did!

jstcki commented 5 years ago

I got this from the example (BAFU) queries at http://data.admin.ch/datasets/

But … using || might not be the best idea since you can potentially match multiple values, no? So if I want to make sure to only get one, it would have to be done like in the example you linked @ktk (which seems quite tedious, since all values can have a language tag, no?)

It would be neat to coalesce values inline without having to name the variables explicitly, e.g. like

COALESCE(FILTER(LANGMATCHES(LANG(?value), "de")), FILTER(...))

… but I digress ;)
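Standard SPARQL has no inline form like that, but the explicit version can at least be generated instead of typed by hand. The sketch below assumes one common pattern: an OPTIONAL block per language, each bound to its own variable, followed by a COALESCE in preference order. `buildLabelFallback` is a hypothetical helper, not something the library provides, and `rdfs:label` is only an assumed default predicate.

```javascript
// Hypothetical helper: emits one OPTIONAL per language plus a BIND(COALESCE(...))
// that picks the first bound label in the given preference order.
function buildLabelFallback(subjectVar, languages, predicate = "rdfs:label") {
  const vars = languages.map((lang) => {
    // Reject anything that is not a plain language tag, so the tag can be
    // embedded in the query string safely.
    if (!/^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$/.test(lang)) {
      throw new Error(`Invalid language tag: ${lang}`);
    }
    return `?label_${lang.replace(/-/g, "_")}`;
  });

  const optionals = languages.map(
    (lang, i) =>
      `OPTIONAL { ${subjectVar} ${predicate} ${vars[i]} . ` +
      `FILTER(LANGMATCHES(LANG(${vars[i]}), "${lang}")) }`
  );

  const bind = `BIND(COALESCE(${vars.join(", ")}) AS ?label)`;
  return [...optionals, bind].join("\n");
}

console.log(buildLabelFallback("?dataset", ["de", "en"]));
```

The generated fragment still names every variable explicitly; the helper only hides the repetition.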

vhf commented 5 years ago

That's definitely a missing feature!

> Preferably somewhere near the top of the query (i.e. cube.datasets({lang: "de"})), since one probably doesn't want to repeat this all the way down.

I'm thinking about putting it in several places. DataCube is one, since it fetches labels, but I expect most people to instantiate a DataCube once and then query it several times. For this reason I'd also add this argument to DataSet#query (https://github.com/zazuko/query-rdf-data-cube/blob/36ecf65490a4a136a061c2cbf4d9d89fd264e02f/examples/introspect-and-query.js#L48-L49). The query would default to its DataCube's lang, and if you don't care about the dataset labels when querying a dataset, you could override the lang just for the query results.
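A rough sketch of that default-then-override behaviour, using stand-in classes. `CubeSketch`, `DataSetSketch`, the `lang` option, and the example URLs are all assumptions made for illustration; they are not the library's actual classes or signatures.

```javascript
// Stand-in classes for illustration only; not the real DataCube/DataSet API.
class CubeSketch {
  constructor(endpoint, { lang = "" } = {}) {
    this.endpoint = endpoint;
    this.lang = lang; // language used when fetching labels
  }

  dataset(iri) {
    return new DataSetSketch(this, iri);
  }
}

class DataSetSketch {
  constructor(cube, iri) {
    this.cube = cube;
    this.iri = iri;
  }

  // A query inherits the cube's language unless one is passed explicitly.
  query({ lang } = {}) {
    return { iri: this.iri, lang: lang ?? this.cube.lang };
  }
}

const cube = new CubeSketch("https://example.org/query", { lang: "de" });
const dataset = cube.dataset("https://example.org/dataset/1");

console.log(dataset.query().lang);               // inherited from the cube
console.log(dataset.query({ lang: "fr" }).lang); // overridden for this query only
```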

jstcki commented 5 years ago

Yeah, having it in less/more specific places sounds good!

jstcki commented 5 years ago

Another thing (I saw that you started working on this in #13 🎉 ):

Would it be sensible/possible that labels in different languages are grouped by the node they belong to?

For example: currently, if multiple languages are present, I get a separate DataSet for each language but all with the same IRI. And since just the label's value is present on the DataSet, it's impossible to know which version it is. For metadata, the full Term is returned incl. language info.

I imagine it would be more convenient if the separate DataSets were merged into one (based on the same IRI), with label being an array of Terms.

// Instead of:
{ iri: "foo.bar/baz", label: "Hello", ... }, // lang = ???
{ iri: "foo.bar/baz", label: "Bonjour", ... }

// This:
{ iri: "foo.bar/baz", label: [ { value: "Hello", language: "en" }, { value: "Bonjour", language: "fr"} ], ... }

What do you think?
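On the client side, the merge could be sketched roughly like this. `mergeByIri` is a hypothetical helper, and it assumes each incoming row already carries its label as a Term-like `{ value, language }` object, which is not what DataSets expose at the moment.

```javascript
// Hypothetical helper: merges rows that share an IRI and collects their labels
// into one array. Other properties are taken from the first row seen for an IRI.
function mergeByIri(rows) {
  const byIri = new Map();
  for (const { iri, label, ...rest } of rows) {
    if (!byIri.has(iri)) {
      byIri.set(iri, { iri, ...rest, label: [] });
    }
    byIri.get(iri).label.push(label);
  }
  return [...byIri.values()];
}

const merged = mergeByIri([
  { iri: "foo.bar/baz", label: { value: "Hello", language: "en" } },
  { iri: "foo.bar/baz", label: { value: "Bonjour", language: "fr" } },
]);

console.log(JSON.stringify(merged, null, 2));
```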

vhf commented 5 years ago

Yep, makes sense! Out of curiosity, what datasets are you testing with? If you have a public endpoint with multi-language datasets, that would help me :)

jstcki commented 5 years ago

Yes, I'm using https://trifid-lindas.test.cluster.ldbar.ch/query

vhf commented 5 years ago

I still need to fix a few things with the language support. Introspecting datacubes/datasets/dimensions should be fine; getting data with .query().execute() might return strange results for the moment. I'll come back to this.

new DataCube(endpoint, { languages: ['en', 'fr', …] })
new DataSet(endpoint, { languages: ['en', 'fr', …], iri, graphIri, … })

(v0.0.4)

jstcki commented 5 years ago

@vhf amazing, thanks! Looks great, we'll try out the new features today :)