Intersection types - Githubissues

nsbgn commented 2 years ago

Now that #74 has freed up the pipe operator |, it will now be possible to implement operator overloading with yet another mechanism, on top of subtype polymorphism and parametric polymorphism. That is, we could consider adding the following, with an intuitive reading:

f = Operator(type= A ** B | C ** D)

This is somewhat possible already using parametric polymorphism:

f = Operator(type=lambda x, y: {T(x, y)[T(A,B), T(C, D))} >> x ** y)

However, applying either an A or a C to this function will not result in, respectively, B and D, but rather in a variable y[B] or y[D]. This means that subsequent uses of the result will correctly fail where it should, but we obtain no fixed type to reason with. The above solution would address this.

In mild defiance of the YAGNI principle, I'm pre-emptively publishing this, as it's likely that this will pop up if the library is ever used by others, but it is not necessary for the CCT algebra and so I will not work on it right now.

nsbgn commented 2 years ago

The + notation might be preferable over | as its precedence is higher than >>'s.

Although this doesn't matter as >> only returns its right operand anyway, since it relies on side effects.

nsbgn commented 2 years ago

An example use case is for the apply operator: instead of having apply and apply2 you could just define apply = Operator(type=lambda x, y, z, c: (x ** y ** z) ** R(c, x) ** R(c, y) ** R(c, z) | (y ** z) ** R(c, y) ** R(c, z)).

nsbgn commented 2 years ago

This would also help in defining an unordered tuple type (see also #78 for the ordered variant). That is, a type that is a supertype of both A * B and B * A. This would be useful since we often won't care about the order in which multiple attributes of a relation will be presented - just that there are multiple.

There are probably a lot of problems with this (for one, the type taxonomy would be cyclic).

The + operator might be natural for this.

nsbgn commented 2 years ago

I gave it some more thought.

What should probably happen is the following:

We should have an Intersection type. We should have a Top and Bottom type; top representing a value of any type (thus a supertype of all types) and bottom representing a type which has no values (thus a subtype of all types). Then, in the CCT algebra, R(A * B, C & D) would make sense: it is the type of relations for which the key attributes have, respectively, type A and B, and for which the dependent attributes have at least types C and D. This is a subtype of R(A * B, C) and R(A * B, D), and also of R(A * B, Bottom). It is also a supertype of R(A * B, C & D & E). This is exactly the behaviour we want: we often don't care about the types of other independent attributes, so we don't want to define the relationships so rigidly using Product types.

Unfortunately, I think this will complicate the type inference a lot, and I need to give it some thought. ...And read papers, probably.

For consistency, Tuple types should be renamed Product types.

nsbgn commented 2 years ago

Top and bottom types should get a separate issue (EDIT: #94). We need them more urgently because it will help us in using types of the form F(_) in queries (since we can just search for F(Bottom) instead).

nsbgn commented 2 years ago

This is now leading to an issue: the question parser often returns R(Obj, x) where the manually constructed queries ("gold standard") would use R(Obj, Reg * x).

As indicated in https://github.com/quangis/transformation-algebra/issues/79#issuecomment-1106773887 I think the correct solution is to have intersection types in this library. However, as a temporary solution, in queries, we can convert R(Obj, Reg * x) to the union of R(Obj, Reg * x) and R(Obj, x). For this to work, however, we need to implement choice queries --- see issue #77.

nsbgn commented 1 year ago

The product type A * B should perhaps be a supertype of the intersection type A & B.

quangis / transforge

Intersection types #79