Open ptbrowne opened 3 years ago
We need to also expand the "schema:name" of some of the predicates of the Cube Metadata.
I think for an out-of-the-box feature that's a little bit out of scope for this library. But I will write you an example that shows how to use the SPARQL client from a Source and how to convert the result into a TermMap with the cube IRI as key and the required properties in a value object.
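Until that example is available, here is a minimal sketch of the conversion step only. It assumes the SELECT result rows have already been collected into an array, and that each row exposes RDF/JS-like terms with a `.value` property under the hypothetical variable names `cube`, `creator` and `name`. A plain `Map` keyed by the IRI string stands in for the TermMap so the snippet runs without dependencies; `groupByCube` and the sample data are made up for illustration and are not part of the library.

```javascript
// Hypothetical helper (not part of the library): groups SELECT result rows
// by cube IRI. Each row is assumed to look like { cube, creator, name },
// where every value is an RDF/JS-like term exposing `.value`.
function groupByCube(rows) {
  // Stand-in for a TermMap: keyed by the IRI string instead of the term.
  const byCube = new Map();
  for (const row of rows) {
    const key = row.cube.value;
    if (!byCube.has(key)) {
      byCube.set(key, { creators: [] });
    }
    byCube.get(key).creators.push({
      iri: row.creator.value,
      // The name is optional: the label may not be in the same store.
      name: row.name ? row.name.value : undefined,
    });
  }
  return byCube;
}

// Made-up sample rows, shaped like the assumption described above.
const sampleRows = [
  {
    cube: { value: "https://example.org/cube/a" },
    creator: { value: "https://example.org/org/1" },
    name: { value: "Office A" },
  },
  {
    cube: { value: "https://example.org/cube/a" },
    creator: { value: "https://example.org/org/2" },
    name: { value: "Office B" },
  },
  {
    cube: { value: "https://example.org/cube/b" },
    creator: { value: "https://example.org/org/3" },
  },
];

const grouped = groupByCube(sampleRows);
console.log(grouped.get("https://example.org/cube/a").creators.length);
```

With a real TermMap the key would be the cube term itself rather than its string value; the grouping logic stays the same.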
> that's a little bit out of scope for this library.
I don't understand why this is out of scope for this library. Both the cube creator and visualize admin need to fetch themes/organizations with their names, and I feel like we are duplicating code between the two for those use cases.
In the README, we have:
> For any other RDF data, the LookupSource must be used, which extends Source. The difference to a plain Source is the more precise definition of the usage.
From the definition, it seems like fetching related organizations and themes could be done with this class, I feel like it already does this job for dimensions.
I agree; there should be a way to use lookup sources to "join" on cube metadata, not just observations.
For example, we can list the organizations attached to a cube but don't have their names available:
const cube = await source.cube(iri);
cube.out(dcterms.creator).values // → get a set of named nodes (IRIs)
cube.out(dcterms.creator).out(schema.name).values // → obviously doesn't return anything
Not having a way to include the metadata lookup in source.cube(iri) is inefficient, since it requires another query, and inconvenient, since we have to write a SPARQL query by hand. That makes this library quite a leaky abstraction on top of SPARQL and your Cube model.
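To make the extra step concrete, the second, hand-written query could look roughly like the sketch below. `buildCreatorNameQuery` is a hypothetical helper, not part of the library, and the example cube IRI is made up. The `OPTIONAL` block is an assumption that keeps creators whose name is not stored in the same SPARQL store.

```javascript
// Hypothetical helper (not part of the library): builds the additional
// SPARQL query needed to resolve the names of a cube's creators.
function buildCreatorNameQuery(cubeIri) {
  // Reject characters that are not allowed inside <...> in SPARQL,
  // so the IRI cannot break out of the angle brackets.
  if (/[<>"{}|^`\\\s]/.test(cubeIri)) {
    throw new Error(`Invalid IRI: ${cubeIri}`);
  }
  return [
    "PREFIX dcterms: <http://purl.org/dc/terms/>",
    "PREFIX schema: <http://schema.org/>",
    "SELECT ?creator ?name WHERE {",
    `  <${cubeIri}> dcterms:creator ?creator .`,
    // OPTIONAL keeps creators without a schema:name in this store.
    "  OPTIONAL { ?creator schema:name ?name }",
    "}",
  ].join("\n");
}

// Made-up cube IRI for illustration.
const creatorQuery = buildCreatorNameQuery("https://example.org/cube/a");
console.log(creatorQuery);
```

The resulting string would still have to be sent through a SPARQL client in a separate round trip, which is the inefficiency described above.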
The cube metadata is not an observation. The LookupSource requires a cube to make a join with the observations of the cube.
The label of a concept connected to the cube is not part of the cube model. Cubes in the LINDAS store may keep that label as a triple in the same SPARQL store, but elsewhere a cube may use an IRI that is not stored in the same SPARQL store and requires dereferencing to fetch the label. This library should not become a generic RDF library. Its focus is cubes, views, and observations. It is possible, however, to reach the underlying objects and use them to query the same SPARQL store. The logic for LINDAS cubes should be part of a LINDAS-related application and can be built upon those objects.
An IRI referenced by an observation can also refer to something not stored in the same store (see #63), so the problem you're describing about dereferencing seems to be the same on observation level.
And, since there's no way to access the cubes query itself (see #60), there's no way to "access the underlying objects" on a cube level, no?
Again, this is not about having LINDAS-specific features built-in (although one could also argue that there are already such features like support for version history) but an API to extend the cubes query, so it's actually possible to build something specific for LINDAS. If that's already possible in a way, then some guidance on how to do it would be much appreciated.
@herrstucki sorry for the late answer. We discussed this internally, and we decided to split the part of cube discovery out of this library, which should focus on the cube-views per se.
So we will not remove any current functionality, but for the future we will create a new library for the purpose of discovering and listing cubes. My question now: since you have solved this for the time being, how high a priority is such a library for you?
Thanks @l00mi. No immediate need, as we've already worked around the issue, but having some idea of when you're planning to do this would be useful. I'd also encourage you to involve actual users (like the Visualize team) in the design phase, so nothing important gets missed.
> but for the future we will create a new library for the purpose of discovering and listing cubes.
To be honest, I feel like the number of different libraries that we have to touch to query cubes/observations and their metadata is already very high. Here is an extract from our package.json listing all the RDF-related libraries:
As much as I like libraries to do one job and do it well, I feel like here it is actually detrimental to the developer experience, as it is difficult to make sense of it all. Ideally, I'd like to have some central library that ties it all together. If another library is to be made, I think it'd be good to have at least central documentation that explains the expected way of connecting all these things.
I partly agree with your statement. But be aware that this is a technology stack like any other, so some of the libraries are just basic helpers for the RDF stack. We do not plan to fully abstract or boilerplate the full RDF stack, because that would take away a lot of the added value of the graph model. We are working on a base library for RDF in general, see https://www.npmjs.com/package/@rdfjs/environment, which provides the basic stack.
As for better documentation on how to use the stack and tie it together: we are always working on it, and we are very happy to get help.
Finally, at some point we might tie together everything necessary in a lindas-cube library. But for now we are still learning what we all need in the specific parts, so this will take time. And you are actually helping us to walk this way to find out what is necessary to add and work with this data in a dynamic way.
(Disclaimer: I just started diving into the RDF world, please excuse me if I say silly things ☺️)
In visualize admin, we would like to show the cubes with their respective themes / categories. Since those are linked nodes, we would need to fetch the categories / themes and make a "join". Before diving deeper into the subject, I have several questions: