rdfjs / query-spec

RDF/JS: Query specification 1.0 – This specification provides provides a common interface for RDF query engine in JavaScript
https://rdf.js.org/query-spec/
6 stars 0 forks source link

Queryable and Filterable typings #7

Closed jacoscaz closed 2 years ago

jacoscaz commented 2 years ago

@gsvarovsky, @rubensworks and I have been bouncing ideas for a little while about a new RDF/JS specification aimed at standardizing the API of query engines willing to adopt the RDF/JS specs.

This PR aims to extend the conversation to the entire RDF/JS group, using what we have produced so far as a starting point for further discussion, changes and improvements.

On top of the work done on our respective projects, the proposed spec is also informed by the preliminary feedback we have received in https://github.com/rdfjs/query-spec/issues/5. Thanks to all who pitched in!

jacoscaz commented 2 years ago

Should we vote with the thumbs-up emoji on everything we agree with and comment on everything we don't?

jacoscaz commented 2 years ago

The conversation has been silent for a while, I assume there are no inbound comments and suggestions for the time being. I'll do a first round of edits based on the above.

jacoscaz commented 2 years ago

Another round of updates done based on all the suggestions in this thread. I'm resolving issues as I go but feel free to reopen if needed.

jacoscaz commented 2 years ago

Should we merge this into master and let the conversation continue in dedicated issues / PRs or is it too soon to fragment like that?

jacoscaz commented 2 years ago

I like the how consistent this is shaping up to be. Minor naming issues:

@rubensworks @gsvarovsky ?

jacoscaz commented 2 years ago

@rubensworks I think I agree in principle but disagree in practice.

I would however align to Query instead of Queryable where possible.

I think this would actually make things worse. We'd end up using Query to both a) refer to a generic concept of a query operation, such as in QueryOperationCost, and b) refer to an actual "instance" of a query at the engine level, such as in the current QueryableContext which would become QueryContext.

True. Since we're using the Future design pattern here, perhaps we can call them QueryResultFuture... or something like that?

That still sounds semantically wrong to me. Intuitively, seeing something like <ENTITY>.isSupported() would lead me to assume that ENTITY must be some kind of representation of a query operation rather than a representation of its (future) results. Same with execute(), where I think the general assumption is that we execute queries rather than results.

I hope I am not coming across as exceedingly pedantic; intuitive interfaces that map well to how we commonly use words are a big concern of mine.

rubensworks commented 2 years ago

I think this would actually make things worse. We'd end up using Query to both a) refer to a generic concept of a query operation, such as in QueryOperationCost, and b) refer to an actual "instance" of a query at the engine level, such as in the current QueryableContext which would become QueryContext.

But in this case, it is an actual query context, right? If we keep it QueryableContext, then it could be interpreted as a context that is queryable, which is false. The only thing that is queryable is a query engine.

That still sounds semantically wrong to me. Intuitively, seeing something like .isSupported() would lead me to assume that ENTITY must be some kind of representation of a query operation rather than a representation of its (future) results.

A future to a query result sounds semantically correct to me. Perhaps the naming of isSupported() is the problem then? In any case, the presence of an execute() method is quite standard in the future design pattern AFAIK.

Same with execute(), where I think the general assumption is that we execute queries rather than results.

Strictly following the Future design pattern, we wouldn't be executing the query, but we would be executing the future to the query result.

gsvarovsky commented 2 years ago

The noun "query" is used in English to represent both the content and the act. So I'm quite happy with Ruben's changes.

I certainly think we should avoid using QueryableX where X is not queryable. (So actually, QueryableSparql should be SparqlQueryable.)

jacoscaz commented 2 years ago

But in this case, it is an actual query context, right? If we keep it QueryableContext, then it could be interpreted as a context that is queryable, which is false. The only thing that is queryable is a query engine.

Ah yes, that is also correct. QueryableContext definitely sounds wrong and the Queryable prefix is misleading, I agree with both of you.

Perhaps the naming of isSupported() is the problem then? In any case, the presence of an execute() method is quite standard in the future design pattern AFAIK. Strictly following the Future design pattern, we wouldn't be executing the query, but we would be executing the future to the query result.

Aha, I think you raised exactly the specific issue I was dancing around without being able to name it. First of all, I should clarify that I am in favor of the spec as it is - which I really like - and I am only discussing the naming of interfaces. The current naming scheme reads to me just as @rubensworks wrote: we execute the future to the query result.

However, I think that for the vast majority of programmers it'd be much easier to have a spec that reads like we get the result of a query by executing the query, which would map to something like engine.query(): Query and query.execute(): Result. Most programmers have implicitly used the Future design pattern and may even use it every day, yes, but significantly fewer programmers have actually thought about it or even given it a name. Binding the naming scheme of the spec to a paradigm makes for a slightly higher barrier of entry for no clear benefit IMHO.

All this said, I'm not going to keep pestering you all on this. Happy to go with the consensus, as always.

rubensworks commented 2 years ago

However, I think that for the vast majority of programmers it'd be much easier to have a spec that reads like we get the result of a query by executing the query, which would map to something like engine.query(): Query and query.execute(): Result.

Won't the fact that we have two distinct things (the query string, and the query future) that are both named "query" be too confusing then?

Binding the naming scheme of the spec to a paradigm makes for a slightly higher barrier of entry for no clear benefit IMHO.

I actually like it when that happens 😄 Makes the code more universal, as it can be explained in terms of language-independent patterns that are known by many people, and are extensively documented. (Promise in JS is also named like that for this reason)

I'm not married to "Future", definitely open to alternatives, but AFAICS it's the best name we have at the moment.

jacoscaz commented 2 years ago

Won't the fact that we have two distinct things (the query string, and the query future) that are both named "query" be too confusing then?

Good point. If we ever switched to what I proposed, that argument would definitely need to be renamed to something like queryString.

I actually like it when that happens 😄 Makes the code more universal, as it can be explained in terms of language-independent patterns that are known by many people, and are extensively documented. (Promise in JS is also named like that for this reason)

Aha, it's nice to disagree every now and then. Disagreement is where learning happens. My preference goes for naming schemes that try to stay close to how I would state something while discussing at a coffee machine. I think that makes for code that is easier to understand, although it might be harder to write or model initially. I don't think it's a purely personal thing but I am also not sure that it isn't. I interpret the idea of universality you mention as code that is more effective at describing how it does something than at describing what it does.

EDIT: not worthy of another comment but relevant to the discussion is the fact that a naming scheme that matches the paradigm (i.e. that expresses both the what and the how) makes it harder to support both promises and callbacks while keeping the same interfaces. To me that's an instinctive red flag about violating some sort of orthogonality principle.

Coming back to earth, I'm ok with keeping the naming as-is if that's where we stand collectively. Thank you for the lovely exchange!

gsvarovsky commented 2 years ago

Won't the fact that we have two distinct things (the query string, and the query future) that are both named "query" be too confusing then?

No.

Per my previous comment, it's natural to me that "query" means both the query string/algebra, and the query future. The engine.xxx call is a reification of an abstract query to an executable one.

I don't like the latter being called "result", because it definitely isn't one.

Personally I prefer to keep the naming short, so while I don't have a problem with the explicit referencing of the "future" pattern, I don't think it's necessary.

rubensworks commented 2 years ago

I don't like the latter being called "result", because it definitely isn't one.

Indeed, it's a "future to a query result".

Personally I prefer to keep the naming short, so while I don't have a problem with the explicit referencing of the "future" pattern, I don't think it's necessary.

Me too, as long as it is sufficiently descriptive.

So the options seem to be:

  1. QueryResultFutureBindings
  2. QueryBindings
  3. QueryResultBindings (this is what we have atm)

My personal preference is option 1, even though it is longer. But I'm open to option 2 if the consensus lands there. But I'm definitely against option 3.

jacoscaz commented 2 years ago

All right, seems we have a consensus on option 2. QueryBindings!

jacoscaz commented 2 years ago

There'a long conversation happening within the comments to https://github.com/rdfjs/query-spec/pull/7#discussion_r760185767 that has led us to realize the following issue:

const query = await engine.query('QUERY STR', { queryFormat: { language: 'someUnsupportedLanguage' } });
await query.isSupported(); // false
query.type; // ?

Basically, engine.query() may resolve to a query which the engine cannot parse, thus making it impossible to determine its result type, thus making it impossible to produce the return value of engine.query() itself unless we added a special type to the list of allowed result types.

We opted to drop isSupported() and to make the promise returned by engine.query() reject in case of unsupported queries.

Personal note: as an engine might go through multiple attempts at iteratively simplifying queries until it gets to a query that is supported by the underlying store (particularly in the case of Filterable), perhaps it'd be worth resolving to null rather than rejecting with an error, the idea being that an unsupported query might not be such an exceptional use case but rather a common occurrence.

jacoscaz commented 2 years ago

@rubensworks @gsvarovsky there are no pending / unaddressed comments ATM! The only thing I have not addressed in the spec, although we have commented upon it but then forgotten about it, is where to put limit and offset or start and length. I think these ought to go both into

  1. an opts object for high-level methods (queryBindings(), ...), which could possibly be used to request a specific order known a-priori
  2. the QueryExecuteOptions interface

Objections?

Other than the above, I think we should merge this and then continue the conversation in dedicated issues.

rubensworks commented 2 years ago

where to put limit and offset or start and length. I think these ought to go both into

We already have them in FilterableSource is seems, which should be sufficient I think? The queryable interface should not have them I think, since offset/limit can be passed via SPARQL queries (or the algebra).

Other than the above, I think we should merge this and then continue the conversation in dedicated issues.

Sounds good!

jacoscaz commented 2 years ago

We already have them in FilterableSource is seems, which should be sufficient I think? The queryable interface should not have them I think, since offset/limit can be passed via SPARQL queries (or the algebra).

Haha, I am an idiot. +1