TypeDB 3.0: functions - Githubissues

cxdorn commented 2 months ago

Problem to Solve

We want to address several shortcomings in TypeQB at the same time:

Nesting of queries: call queries from other queries.
Using aggregates in queries: the callee query may compute an aggregate which can then be used by the caller query.
Control flow: let the user control the order and depth of nested query execution
Clarity through explicitness: replace the reasoner "blackbox" with a self-explanatory functional language
Groundwork for views and incremental view maintenance.

Current Workaround

Most of the above points are currently impossible to implement in TypeDB.

Proposed Solution

Introduce functions, which may be called from queries and from other functions. In the type-theoretic setting of TypeDB, functions are a natural counterpart to queries that can fulfill the above purposes.

How to define functions

Function keyword

Functions are introduced in define queries using the special syntax define fun, which is followed by the function declaration.

Function declaration

A function declaration comprises:

a function name, e.g. get_ancestors
a list of parameters in parentheses with optional types and default values, e.g. ($child, $depth : integer = 5). Default values can be given if a type is specified, and that type is either a value type or an attribute type. If no type is specified, it will be inferred from the function.
a return type which can be either
- be type of tuple streams, e.g. {person, name[]}
- be a type of single tuples, e.g. name[], int. In this case, we may include optionality with? (e.g.name, int?)
A match query (i.e. match <pattern>) with stream modifiers (e.g. limit, sort, filter, skip, take)
A return clause, which can be one of the three things:
- stream return via return {$x, $y}; this filters the stream of concept maps from the match clause to a depuplicated stream of tuples. For example, return {$p}; could be the return statement for a function with output type {person}.
- tuple return via return EXPR; where EXPR is a tuple of aggregation operations applied to the stream of concept maps (output by the query). For example, return count $x, sum $y, mean $z, could be the return statement for a function with output type int, double, double?.
  - As a special case,return check; which outputs a singletonbool tuple, reflecting whether or not stream of concept maps (output by the query) is non-empty.

Here is a full example of a function declaration.

define fun average_salary($company) -> double? :
  match
    $e (employee: $_, company: $company) isa employment;
    $e has base_salary $s;
  return reduce mean $s;

How to use functions

Calling functions in `match`

Functions can be called from within pattern of a match clause (note: since functions perform data retrievals, there is currently no plan to support functions in other clauses like insert or delete). Function calls pass their arguments in brackets (...) after the function name. By default, arguments are passed based on their position, but labels of variables the function's arguments can used to assign a parameter directly as well using = (e.g. if the signature is my_fun ($param1, $param2) -> output we may call my_fun(param2 = $x, param1 = $y)).

The precise way functions are used depends on their return type.

A function which returns schema-typed tuple stream (e.g. my_fun($param) -> {person,age}) will be called with a statement using the in keyword. For example, our match clause could contain the statement $p, $a in my_fun(param = $x);
A function which returns a value-typed tuple (e.g. my_fun($param) -> int, string) will be called with a statement using the assignment operator =. For example, our match clause could contain the statement $p, $a = my_fun(param = $x);. Note the this pattern may fail to hold if the return type of a function includes optional value type: a missing value for a variable is considered a non-satisfied pattern.
- Single value functions in expressions. Single value returning functions can be used in expressions. E.g. count_sth($param) > 0;.
A function which returns a boolean check (e.g. my_fun($param) -> bool) will be called just like a single value function. E.g., my_check($param) == true;.

For example, the avg_salary functions above could be used as follows:

match
  $msft isa company, has name "Microsoft";
  (employee: $e, company: $msft) isa employment_contract, 
     has base_salary > average_salary(company = $msft);
filter $e;

Calling functions in `fetch`

Functions can similarly be called in a fetch, as along as all arguments are bound:

match
  $x isa company;
fetch
  $c as company: name;
  average_salary : avg_sal($c);

The only functions allowed to be called are functions with printable return type:

Either a stream of printable types (i.e. attributes or value types, or lists thereof)
Or a tuple of printable types (i.e. attributes or value types, or lists thereof)

Note: the grammar is TBD. Potentially, the following could be allowed:

fetch
  $c as company: name, avg_sal($c);

or

fetch
  $c, $d as company_diff: diff($c, $d);

Calling functions in `insert`, `delete`, `put`

This will not be allowed. Instead, required variables may be assigned in preceding match clauses.

Calling functions recursively

Since in a match query we may call functions, and since functions include a match query, in functions, too, we can call functions. Importantly, functions can call themselves recursively, or can call/be called by other functions in a mutually recursive fashion. There will be guards on recursion for functions containing negations following the idea of stratifications.

Here is a simple example of a simple recursive function:

define fun ancestor($child, $depth : integer = 5) -> {person} :
  match
    $depth >= 0;
    ($child, $parent) isa parentship;
    $a in ancestor(child = $parent, depth = $depth - 1);
  return { $a };

Additional comments

Not addressed in the above are so far:

Discarding parts of the output of the functions, e.g. $x, _ = my_fun(param = $p); would only require $x to be assigned, and disregard the second argument.
Handling optional outputs directly, by providing default values in case (empty_ is returned by the function. An indirect way of handling this these cases could be achieved with or along the following lines:
```
match
... # pattern for $p;
{ $x, $y = my_fun(param = $p); } or { $x, _ = my_fun(param = $p); $y = 30; };
$z isa something, has age > $y, has name $x;
```

brettforbes commented 2 months ago

This is brilliant, but can it be extended so Fetch statements can also be bound to a single name and parametrised?

The aim is to collapse a Fetch statement into a single name, with parameters, so it can be used interactively!!!

REASON: Fields like cybersecurity have composite Entities with many optional sub-parts, for example consider the basic File object, with its 10 optional sub-objects.

The strength of TypeQL is that it can handle this intricacy with ease.

The weakness of TypeQL is that you can't easily handle all of these optional variations interactively. Class hierarchies can reduce query complexity in the match statement, but you need to know which optional variations your desired object contains upfront. Fetch can handle all of this, since it is quite clever, but the downside is the statement will be very long, as the variations are described sequentially.

Once query intricacy passes a certain point (e.g. 5-10 lines), it becomes impractical interactively for average users.

What we need is a means of treating the Fetch statement in the same ways as the function, so it can be bound to a single name, parametrised, and used within other statements, particularly if it could be case-specific. The ideal would be the use of the lower case for the top-level entity (e.g. file), and the title case for the composite object's pre-registered Fetch statement (e.g. File).

Then TypeQL could really become very powerful for interactive use, since currently it is too verbose to handle more intricate domains. Currently we have to use a higher level query language for our User Interfaces, and then down-transpile to TypeQL. It would be brilliant if we could surface TypeQL to users with composite objects, bound to single names.

lveillard commented 2 months ago

In case it helps, I packed some ideas around functions in the Functions section in my wishlist for 3.0 https://github.com/vaticle/typedb/issues/6945

vaticle / typedb

TypeDB 3.0: functions #7038

Problem to Solve

Current Workaround

Proposed Solution

How to define functions

Function keyword

Function declaration

How to use functions

Calling functions in `match`

Calling functions in `fetch`

Calling functions in `insert`, `delete`, `put`

Calling functions recursively

Additional comments

vaticle / typedb

TypeDB 3.0: functions #7038

Problem to Solve

Current Workaround

Proposed Solution

How to define functions

Function keyword

Function declaration

How to use functions

Calling functions in match

Calling functions in fetch

Calling functions in insert, delete, put

Calling functions recursively

Additional comments

Calling functions in `match`

Calling functions in `fetch`

Calling functions in `insert`, `delete`, `put`