josefs / Gradualizer

A Gradual type system for Erlang
MIT License
613 stars 35 forks source link

Disjunctive normal form attempt #558

Open erszcz opened 1 year ago

erszcz commented 1 year ago

I'm creating this draft PR with an intention of closing it immediately afterwards. It's not a working solution, I just want to describe the problem and show an experiment aimed at solving the problem.

In https://github.com/josefs/Gradualizer/pull/524#discussion_r1220414681 I said I'd experiment with disjunctive normal form to address the following problem from Gradualizer's own code. The problem is quite prevalent - it occurs everywhere where we have a structure with an inner union. For example, pretty much every time we handle type() instances:

 compat_ty({type, _, tuple, any}, {type, _, tuple, _Args}, Seen, _Env) -> 
     ret(Seen); 
 compat_ty({type, _, tuple, _Args}, {type, _, tuple, any}, Seen, _Env) -> 
     ret(Seen); 
 compat_ty({type, _, tuple, Args1}, {type, _, tuple, Args2}, Seen, Env) -> 
     %% We can assert because we match out `any' in previous clauses. 
     %% TODO: it would be perfect if Gradualizer could refine this type automatically in such a case 
     compat_tys(?assert_type(Args1, [type()]), 
                ?assert_type(Args2, [type()]), Seen, Env); 

In the above example we see that even though there's a dedicated clause matching the case of Args1 = any (and symmetrically Args2 = any), we still have to assert that Args1 :: [type()] (respectively Args2 :: [type()]) in the clause which handles exactly that - Args1 and Args2 already refined to [type()]. We have to use a superfluous assertion, even though it's obvious to us and should be obvious to the typechecker.

I think it's caused by the following:

> typechecker:subtype(gradualizer:type("{t, a} | {t, b}"), gradualizer:type("{t, a | b}"), gradualizer:env()).
{true,{constraints,#{},#{},#{}}}
> typechecker:subtype(gradualizer:type("{t, a | b}"), gradualizer:type("{t, a} | {t, b}"), gradualizer:env()).
false

In other words, while Gradualizer understands that {t, a} | {t, b} :: {t, a | b}, it does not understand that {t, a | b} :: {t, a} | {t, b}. It seems to me that this property holding in both directions is crucial for our needs.

As you can see in notes.1.md in this PR, I contrasted it with the Elixir prototype based on the CDuce set-theoretic type checker. For Elixir/CDuce, the property holds in both directions.

To some extent it can be achieved by "expanding" unions as done in this PR - and it fixes some particular cases, especially the new tests in typechecker_tests - however, the general effect on tests is disastrous :(

erszcz commented 1 year ago

Hmm, now that I look at the results again, they don't look THAT bad:

Failed: 20. Skipped: 0. Passed: 708.

Maybe it's not such a bad idea after all... I remembered the result from June as much worse 🤔

zuiderkwast commented 1 year ago

Nice! I believe this is a good approach, especially since CDuce is the language where these theories by Castagna et.al. are tested. The two papers mentioned below are listed on the web site of CDuce. (I read them some years ago.)

In Gradual Typing with Union and Intersection Types (2017) they talk about DNF and Gradual-DNF and some other concepts and steps how to convert types into this form, compute subtyping, etc.

In Covariance and Contravariance: a fresh look at an old issue (a primer in advanced type systems for learning functional programmers) (2018), there's an efficient datastructure "binary decision diagram with lazy unions" to represent types in DNF and an algorithm how to transform types.

erszcz commented 1 year ago

Indeed, especially "Covariance and Contravariance" is very informative. It contains one more brilliant idea which I tested in Gradualizer, but with quite a dramatic impact on test results. The idea is defining subtyping by type difference.

Currently, in Gradualizer type_diff and subtype are defined separately. The paper proposes to define subtyping like this:

subtype(T1, T2) -> is_empty(type_diff(T1, T2)).

While this PR's approach to DNF doesn't seem to be as dramatic, I think it's not complete either. Together with subtyping redefinition, we'd be getting very close to the theory described there. The question, though, is whether it's worth to change fundamental parts of Gradualizer in the light of etylizer being developed from scratch according to that theory? As we speak it still has small coverage of the Erlang language features, but it seems to have more sound and better understood theoretical underpinning 🤔

zuiderkwast commented 1 year ago

Etilyzer is not gradual typing, is it?

Anyway, I don't have time and energy to do such major work in Gradualizer at this time.

erszcz commented 1 year ago

Etilyzer is not gradual typing, is it?

At the very least they accept the dynamic type on the syntax level, so I imagine they would like to expand in that direction.

time and energy

I'm afraid that's the main problem. There's a chance for some EEF (Erlang Ecosystem Foundation) funding (I spoke to one of the board members about it, nothing certain, but still), but apart from a budget, we would also need people with capacity. cc @xxdavid @josefs

xxdavid commented 1 year ago

Good idea! Thanks for experimenting with it. :)

20 failed tests don’t sound like that much. It seems to me that most of them are related to maps and glb, so maybe it would be sufficient to patch these places. But it might also require some deep refactoring, who knows…

Also, have you thought about the exponential explosion? Do you think it’s safe to use this expansion on type expressions of “practical size”?

Regarding the redefinition of subtype, what do you think would be the advantages (and possibly disadvantages) of it compared to the current implementation?

Funding for Gradualizer sounds really cool! I think it could help the project a lot. Not having anyone who would work on it regularly and as part of their job is one of the things that distinguishes Gradualizer from eqWAlizer and etylizer and probably one of the reasons why it doesn’t move very fast (compared to the other projects). Sadly, I don’t have much time for it either. I’m currently immersed in trying to improve the polymorphism support, and do not have much capacity left.

erszcz commented 1 year ago

20 failed tests don’t sound like that much.

Indeed, I might give it a go and try fixing them when I find a while.

Also, have you thought about the exponential explosion? Do you think it’s safe to use this expansion on type expressions of “practical size”?

I don't know :| I think it can be managed, given CDuce, etylizer, and upcoming Elixir systems all have to use this approach (because they follow the paper suggesting to use DNF).

Regarding the redefinition of subtype, what do you think would be the advantages (and possibly disadvantages) of it compared to the current implementation?

Currently we have two somewhat redundant and not completely compatible relations comparing types in Gradualizer: subtype and type_diff. refinable succinctly defines for which types type_diff is defined. This has led to some confusion a few times already, e.g. here - https://github.com/josefs/Gradualizer/pull/346#discussion_r719151072. To sum up, type_diff is not total on types (type pairs).

So type_diff "compares" some types to tell if one can be refined with the other and to tell what the result is. subtype compares types to tell if one is a subtype of another. This is quite a similar notion. It seems to me that disjoint is another way of saying "not a subtype or supertype". If one type is a subtype or supertype of another, then some refinement in that pair must be possible, either to a narrower type or, ultimately, to none()) - expressing subtype with type_diff seems natural in this light. However, subtype has to be a total relation - otherwise we couldn't use it on arbitrary types, but our type_diff isn't.

Practically, unifying them would mean we could have just one function implementing the not so trivial logic. I'm not sure if it would give any other significant benefits (like expressiveness), but it would help clarify the concept which currently seems to be a bit hazy in Gradualizer.

Funding for Gradualizer sounds really cool! [...]

Sadly, I don’t have much time for it either.

I think if the funding happened, it would be a fixed budget / one time grant, not a long-term cooperation / employment opportunity. However, it seems none of us, the latest contributors, can invest the time or considers it compelling :|