wmde / Ask

DISCONTINUED: Library containing a PHP implementation of the Ask query language.

QueryContext #6

Open mwjames opened 10 years ago

mwjames commented 10 years ago

The ask query and its components are a means to express a natural language construct (or better, a question) like "Show me all cars that have green tires", but I would argue that the current ask query implementation is incomplete in the sense that it doesn't allow specifying the context in which a query (or the question) is embedded.

For example, the same query executed with a different context may or may not lead to a different result set.

A context can be a relevance factor, a specific environmental variable, authorization etc. and may be specific to the QueryEngine that tries to answer that query. Being able to inject a QueryContext would help to express such environmental factors.

[0] http://www.researchgate.net/publication/228697315_rdf_and_Contexts_Use_of_sparql_and_Named_Graphs_to_Achieve_Contextualization/links/0fcfd50be3562b1cee000000

[1] http://www.csee.umbc.edu/courses/graduate/691/spring14/01/examples/sesame/openrdf-sesame-2.6.10/docs/users/ch09.html#section-context

JeroenDeDauw commented 9 years ago

I just saw this now - thanks for bringing this up.

@mkroetzsch thoughts?

It probably makes sense to ask the same question on wikidata-tech for the query support WMF plans to provide. Assuming this has not already been considered. Ping @JanZerebecki

Thinking about how to deal with such contexts in implementation... Forcing answering mechanisms to handle all possible contexts is not nice, since most of them will likely simply not be relevant. If all context is dependent on the environment and not specified in the query, then it can be passed as config to the relevant answering mechanism implementation without changing the general interface or query language.
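To illustrate the last point, here is a minimal sketch of environment-dependent context being passed as constructor config to one answering mechanism while the general interface stays unchanged. All names here (`ExampleQuery`, `ExampleQueryEngine`, `InMemoryQueryEngine`, the `includeRestricted` flag) are hypothetical and not part of the Ask library; authorization is used only because it was mentioned above as one possible kind of context.

```php
<?php

// Hypothetical stand-in for a query object; not the Ask library's Query class.
final class ExampleQuery {
    private $conditionText;

    public function __construct( string $conditionText ) {
        $this->conditionText = $conditionText;
    }

    public function getConditionText(): string {
        return $this->conditionText;
    }
}

// General interface: it knows nothing about environment-specific context.
interface ExampleQueryEngine {
    /** @return string[] */
    public function runQuery( ExampleQuery $query ): array;
}

// One implementation that receives its environment as config.
final class InMemoryQueryEngine implements ExampleQueryEngine {
    private $rows;
    private $includeRestricted;

    /**
     * @param array<string,bool> $rows label => whether the row is restricted
     * @param bool $includeRestricted environment setting, e.g. derived from authorization
     */
    public function __construct( array $rows, bool $includeRestricted ) {
        $this->rows = $rows;
        $this->includeRestricted = $includeRestricted;
    }

    public function runQuery( ExampleQuery $query ): array {
        $needle = $query->getConditionText();
        $results = [];
        foreach ( $this->rows as $label => $restricted ) {
            if ( $restricted && !$this->includeRestricted ) {
                continue; // hidden by the environment, not by the query
            }
            if ( strpos( $label, $needle ) !== false ) {
                $results[] = $label;
            }
        }
        return $results;
    }
}

$rows = [ 'green car' => false, 'green tank' => true, 'red car' => false ];
$query = new ExampleQuery( 'green' );

// Same query, two differently configured engines.
$publicResults = ( new InMemoryQueryEngine( $rows, false ) )->runQuery( $query );
$privilegedResults = ( new InMemoryQueryEngine( $rows, true ) )->runQuery( $query );
```

The same query object yields different result sets depending only on how the engine was configured, so neither the interface nor the query language has to know about the context.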

mkroetzsch commented 9 years ago

The word "context" does not have an agreed upon technical or conceptual meaning in databases or semantic technologies. Many people have done many completely unrelated things, and called them "context". Whole research/technology areas, such as temporal databases, versioned databases, RDF quad stores, etc., can all be said to implement some form of "context". Interestingly, no single system implements all of these approaches, since they require very different technologies and serve different uses. Therefore, I don't think that this proposal is clear enough to be a starting point for much discussion.

In Wikidata, context is modelled by qualifiers, and they are supported by the query engine already. Another natural form of context would be the revision history ("find all statements about European cities entered last week by user GerardM"). This form of context we do not want to support in query answering in Wikidata (too expensive). Each kind of "context" needs its own technical approach and has to be discussed individually. Even if we were to support both kinds of context, we would probably not consider using the same format for adding context requirements to queries for these different cases (EDIT: I weakened this statement; maybe it could be useful to have a uniform representation based on qualifiers here too).

Side remark (@mwjames): ask was inspired by the class structure of OWL classes, which is based on description logics, which were (back in the 80s) conceived as a concept language for AI applications and which had in part been motivated by the ability to express English noun phrases. Therefore, one could argue that ask has some elements of natural language modelling in it, but this was certainly not the main motive/goal for its design. The main motivation for picking this query language was the well-known result from databases that tree-shaped queries can be answered more efficiently than arbitrary conjunctive queries, though this is ignored by MySQL as it later turned out ;-)

mwjames commented 9 years ago

> Interestingly, no single system implements all of these approaches, since they require very different technologies and serve different uses. Therefore, I don't think that this proposal is clear enough to be a starting point for much discussion.

Context, as the name suggests, describes a specific environmental concept that is not bound to a particular implementation. The proposal is therefore arbitrary in the sense that each implementation (Wikidata or SMW) has its own means to define such factors and to use them during query answering. For example, limits and offsets, while generally regarded as selection parameters, are in a broad sense contextual factors based on the intention of the requesting party (user or API) to narrow or broaden the likely result set. (A "find all" that may take infinite time to yield results is effectively an infinite context, depending on when the query is executed, how much data there is, how many constraints have to be evaluated, etc.)

> In Wikidata, context is modelled by qualifiers, and they are supported by the query engine already. Another natural form of context would be the revision history ("find all statements about European cities entered last week by user GerardM"). This form of context we do not want to support in query answering in Wikidata (too expensive).

The question was directed towards having a technical means to define such a context (which can vary with the implementation that chooses to use the concept), one that would allow an interface to describe factors specific to a QueryEngine implementation. Ask tries to describe a query with the help of natural language, so contextual assumptions [1] should at least be possible in Ask. Whether it is being used by Wikidata or not is secondary to the question of providing an interface (in technical terms).

[1] http://ihd.berkeley.edu/Erv-Tripp%20Pragmatics/Context.pdf

mkroetzsch commented 9 years ago

Technically, one could of course collect query settings like "limit" and "offset" in a context object. However, this is not what is called "context" in natural language. I am still wondering how the notion of context in natural language should be defined in a PHP object, and how a query engine should handle such an informal notion. On the technical side, I wonder what the advantage would be of having an extra object for limit and offset.

Another thing to note is that the "context" represented by quantifier information in Wikidata is not on the level of the query as a whole, but must be specified for each query condition. I think this is the case for many advanced notions of context: you would want to set them individually for each part of the query, not just once for all of the query.

mwjames commented 9 years ago

> On the technical side, I wonder what the advantage would be of having an extra object for limit and offset.

limit/offset was only used to demonstrate the nature of a contextual assumption (which in layman's terms are just selection parameters). I'm not proposing to put them into a separate object; they would remain an option (as a specific codified context).

> Another thing to note is that the "context" represented by quantifier information in Wikidata is not on the level of the query as a whole, but must be specified for each query condition.

I haven't looked at the "quantifier information in Wikidata" or how it translates into query conditions, so I can't comment on this.

> I think this is the case for many advanced notions of context: you would want to set them individually for each part of the query, not just once for all of the query.

If a query condition is understood as a compound of individual descriptions, then yes (where a description represents the smallest unit of a condition apart from conjunctions/disjunctions), but the query as a whole (with all its descriptions) is also subject to a context (see above).

mkroetzsch commented 9 years ago

> limit/offset was only used to demonstrate the nature of a contextual assumption

Ok, got it. Then what would be an actual example of the kind of data that one would store in such a context object?

P.S. I will fly from Chile to Germany in a few hours, and not be able to reply to anything for 24-32h.

mwjames commented 9 years ago

> Then what would be an actual example of the kind of data that one would store in such a context object?

One use case I came across was needing to know where an ask query is embedded (for example, the subject that includes the query, or a special page). If the ask query is page-included, the subject (page) may dictate certain conditions, those belonging to its surroundings (hence the context), for how and where the query can be executed or its results displayed.

QueryContext itself is only an interface. The implementation is left to the user of the Ask library and the implementer of the QueryEngine, who decide whether the context object can subsequently be used by the QueryEngine / QueryResultPrinter or not.

/**
 * Shell interface that defines a context object where implementation details are
 * left to the user of the Ask library
 */
interface QueryContext {}

class Query {

    // No context is set by default
    private $queryContext = null;

    public function getContext() {
        return $this->queryContext;
    }

    public function setContext( QueryContext $queryContext ) {
        $this->queryContext = $queryContext;
    }

}

For example, deciding whether to fetch data from cache or run a new query execution, or tracking queries across usages, requires information about where the query belongs.

Such reference information does not relate directly to a query condition but to a query context. From a technical perspective I just want to ensure that Ask can accommodate such use cases without many restrictions as to what a context object has to look like.
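As a rough sketch of how the cache use case could look with the interface above: `EmbeddingPageContext` and `cacheKeyFor()` are hypothetical names invented for this illustration, and the `Query` class is a reduced copy of the one shown above rather than the library's actual class.

```php
<?php

// Marker interface as proposed above.
interface QueryContext {}

// Hypothetical context implementation: records which page embeds the query.
final class EmbeddingPageContext implements QueryContext {
    private $pageTitle;

    public function __construct( string $pageTitle ) {
        $this->pageTitle = $pageTitle;
    }

    public function getPageTitle(): string {
        return $this->pageTitle;
    }
}

// Reduced copy of the Query class from the proposal.
class Query {
    private $queryContext = null;

    public function getContext(): ?QueryContext {
        return $this->queryContext;
    }

    public function setContext( QueryContext $queryContext ): void {
        $this->queryContext = $queryContext;
    }
}

/**
 * Hypothetical engine-side helper: build a cache key only when the engine
 * understands the attached context; otherwise signal "do not cache".
 */
function cacheKeyFor( Query $query, string $queryHash ): ?string {
    $context = $query->getContext();
    if ( $context instanceof EmbeddingPageContext ) {
        return $context->getPageTitle() . '#' . $queryHash;
    }
    // Missing or unknown context: this engine simply ignores it.
    return null;
}

$query = new Query();
$withoutContext = cacheKeyFor( $query, 'abc123' );

$query->setContext( new EmbeddingPageContext( 'Cars' ) );
$withContext = cacheKeyFor( $query, 'abc123' );
```

An engine that does not know `EmbeddingPageContext` can skip the `instanceof` check entirely, which is the point of keeping the interface empty.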

JanZerebecki commented 9 years ago

AFAIK for the Wikidata query stuff nobody has yet thought about making a query more specific in an automated way based on some context (like how or where the user specified the query). For example, asking for restaurants would give you those nearest to you if the context knows your current location. The assumption was that you would need to write that into the query explicitly.

In Wikidata the qualifiers (probably what was meant with quantifier above) of a statement sort of represent the context of that statement, which is added to the context that can be inferred from the Entity it is on. E.g. George Washington: position: President of the USA: start time: 30 April 1789, end time: 4 March 1797. Here start time and end time are qualifiers, so the statement is only true during a particular time period.

As far as I understand this issue, the context mentioned here could already be expressed by rewriting the query as a more specific Ask query. (More precisely: one specific Ask query for each possible combination of parts of the context, including the empty context, since for a normal question asked by a human it is not specified which of these contexts applies.) E.g. Context: Current time: 2014-05-23T23:23, Query: Who is currently the president of the USA. This could be rewritten as the Query: Who is the president of the USA, qualified with start and end time having 2014-05-23T23:23 between them.

If I understood this issue correctly, a query context is probably necessary to answer questions in a more common-sense way than in a strictly literal, logical, computer-like sense. But that also means a context can hold huge amounts of data and requires quite involved code to apply it to a query. And the result would be many possible query interpretations. Maybe that is too ambitious for us (as in Cyc), and whatever would pass a context should instead be responsible for modifying the query into one more specific query?
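A minimal sketch of that rewriting idea, using the president example from this comment. `TimeBoundCondition` and `rewriteWithContext()` are hypothetical and much simpler than real Ask descriptions or Wikidata statements; the context is assumed to be a plain array with an optional `currentTime` key, and timestamps are assumed to share one ISO-like format so that string comparison orders them correctly.

```php
<?php

// Hypothetical, simplified condition: "property = value", optionally valid at a point in time.
final class TimeBoundCondition {
    private $property;
    private $value;
    private $validAt;

    public function __construct( string $property, string $value, ?string $validAt = null ) {
        $this->property = $property;
        $this->value = $value;
        $this->validAt = $validAt;
    }

    // Returns a more specific copy; the original condition is left untouched.
    public function withValidAt( string $time ): self {
        return new self( $this->property, $this->value, $time );
    }

    public function getValidAt(): ?string {
        return $this->validAt;
    }

    /** @param array<string,string> $statement with keys property, value, start, end */
    public function matches( array $statement ): bool {
        if ( $statement['property'] !== $this->property || $statement['value'] !== $this->value ) {
            return false;
        }
        if ( $this->validAt === null ) {
            return true; // no time constraint in the query
        }
        // Timestamps share one format, so string comparison orders them correctly.
        return strcmp( $statement['start'], $this->validAt ) <= 0
            && strcmp( $this->validAt, $statement['end'] ) <= 0;
    }
}

// The caller that holds the context turns it into a more specific query.
function rewriteWithContext( TimeBoundCondition $condition, array $context ): TimeBoundCondition {
    if ( isset( $context['currentTime'] ) ) {
        return $condition->withValidAt( $context['currentTime'] );
    }
    return $condition; // empty context: query stays as written
}

$statement = [
    'subject' => 'George Washington',
    'property' => 'position',
    'value' => 'President of the USA',
    'start' => '1789-04-30T00:00',
    'end' => '1797-03-04T00:00',
];

$general = new TimeBoundCondition( 'position', 'President of the USA' );
$specific = rewriteWithContext( $general, [ 'currentTime' => '2014-05-23T23:23' ] );

$matchesGeneral = $general->matches( $statement );
$matchesSpecific = $specific->matches( $statement );
```

In this sketch the query engine never sees the context: it only receives the rewritten condition, which no longer matches the George Washington statement once the 2014 timestamp is applied.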