josefs / Gradualizer

A Gradual type system for Erlang
MIT License
613 stars 35 forks source link

Function guards as type constrains #203

Open NelsonVides opened 5 years ago

NelsonVides commented 5 years ago

How comes that this doesn't pass?

-type name() :: atom() | string().

-spec with_guards(name()) -> list().
with_guards(Name) when is_atom(Name) ->
    "pass_" ++ erlang:atom_to_list(Name);
with_guards(Name) when is_list(Name) ->
    Name.

I guess it's obvious what I mean. The first branch will never be taken if Name is anything else than an atom, so the call to atom_to_list is perfectly safe. The same for the second branch, what was given is for sure a list, so the return type is for sure matching the expected one.

My investigations trying to understand this problem took me to the following call path: type_check_forms/2 -> type_check_function/2 -> check_clauses_fun/3 (fun_ty branch) -> check_clauses/4 -> check_clause/4, ending with a throw on https://github.com/josefs/Gradualizer/blob/c31c2326ad6b9674cefdb5c202f036bd86a150ce/src/typechecker.erl#L3219

There, for the first branch for example, I would have expected Name to already be of type atom(), and hence pass successfully, but exploring the env, I can see that Name is still a union of atom and list, hence failing.

Now, I suppose that it is on https://github.com/josefs/Gradualizer/blob/c31c2326ad6b9674cefdb5c202f036bd86a150ce/src/typechecker.erl#L3217-L3218 where this narrowing would be done. check_guards would give a new β€”narrowerβ€” variable binding, which would then be added in add_var_binds to the new environment passed to type_check_block_in. But it is now, and I feel like I went really far with so much knowledge missing on my side. Can I ask you for help on how to continue? How do these two functions work (check_guards and add_var_binds) and what are all the TODO's inside of them? What's the difference between all those type_check_ functions suffixed with and without the _in?

PS: also, I've been exploring all this stuff by just putting TONS of ?verbose everywhere and compiling-executing again and again. Is there some better trick or that's the only way? :stuck_out_tongue:

zuiderkwast commented 5 years ago

This is what I want to solve next, whenever I get enough time. Also other very simple guards such as Var == atom1; Var == atom2.

My idea was not to use the constraints which are partially implemented, because I don't really know how constraint solving is supposed to work, so instead I wanted to just return map of Variable => Type for the guard and combine those with the variables bound in the corresponding pattern matching. But maybe actually fixing the constraints solving would be better? (@josefs?)

For exploring the source code, I guess putting printouts in there is a good way. Tracing is another possibility, but I'm not sure it's easier. Also, just reading source code is a good way, but this source code is getting a bit big now and some function names are not very obvious...

zuiderkwast commented 5 years ago

Feel free to implement this btw! You seem to be on the right track, at check_guards.

NelsonVides commented 5 years ago

@zuiderkwast reading the source code is getting increasingly complicated, specially for this typechecker.erl module, at >4k LOC, it's quite a cognitive overhead. I think that module should get some simple refactoring, in the style of dividing responsibilities into helper modules. No need to rewrite any function, just, putting all the functions and their helpers responsible for one thing in one module, and exporting only what is supposed to be the main function. Would be easier to see what each piece of code is doing, and which functions are supposed to be just helpers. But I'm afraid that would take me ages without a bit of guidance, I'm not familiar with this mastodontic module :sweat_smile:

About implementing this, I'm trying to pretty much what you said, making check_guards return the map of Variable => Type with the bindings I want, which are then added to the environment. So I'm pretty much rewriting check_guards to take into consideration these cases.

I'm also totally ignorant of how constraints work. Maybe an explanation on this matter would help.

josefs commented 5 years ago

I don't think constraints are going to help here. The plan you guys have to improve guards sounds good.

As for the complexity of the typechecker.erl, I'm not opposed to refactoring it. But personally I don't find splitting things up into different modules that helpful unless there is a clear abstraction that can be broken out. The key to navigating this kind of code base is to have good editor support for jumping between definitions, no matter if they are in the same module or not.

zuiderkwast commented 5 years ago

OK. One thing I think would be useful is to make add_types_pat return the variable binds rather than returning an updated VEnv, i.e. put the merging of varbinds outside. Then, we can do the guard checking for the varbinds from the corresponding pattern matching.

As for refactoring, I think a well documented interface is good, with good function names and parameters. If flatten_types would be an unexported helper function in some module, it would help, but also just adding a comment saying that it's a helper for normalize/2 would be helpful. Some renaming of functions would be useful too, e.g. all those type_check_in and add_types_pat are not very self-explanatory.

josefs commented 5 years ago

One thing I think would be useful is to make add_types_pat return the variable binds rather than returning an updated VEnv, i.e. put the merging of varbinds outside. Then, we can do the guard checking for the varbinds from the corresponding pattern matching.

I like this idea!

If flatten_types would be an unexported helper function in some module, it would help, but also just adding a comment saying that it's a helper for normalize/2 would be helpful. Some renaming of functions would be useful too, e.g. all those type_check_in and add_types_pat are not very self-explanatory.

Agreed

NelsonVides commented 5 years ago

I have a question, in line with the last things said last time I posted here. I have some prototype for a better check_guards that refines the types from the guards, but, when it then goes into add_var_binds/3 on the next line, it's all scrapped because these variables are often already assigned with type any. What is the best approach to indeed rewrite the value of this new binding inside of the environment? I can just do it the hard way with maps, but I don't know if there's some function, across the hundreds of functions inside typechecker, that already exposes the API I'm looking for.

And in response to the things you've said. @josefs a editor with good jumps is not enough for such a size of a module, and @zuiderkwast putting comments on all functions is still not enough for somebody who just arrived to this codebase (i.e., me). I'm now looking for a function that seems too useful not to have been implemented already, and for that, jumps or comments won't help me find anything, at the very best, they only help understand where I am, but not what is there outside my current focus. I'm a strong proponent of the single responsibility principle, it helps a lot to understand things and to arrive to places.

About add_types_pat, I don't really follow what you're discussing, can you give me a bit of background? I'd like to understand, and help :slightly_smiling_face:

PS: I'm experimenting my implementation of this in this branch on my fork, in case you're curious. Very dirty code, if it ever works I promise I'll delete all the printouts :stuck_out_tongue: https://github.com/NelsonVides/Gradualizer/tree/fun_guards

zuiderkwast commented 5 years ago

I'm all for lifting out stuff to different modules. I guess we can not keep the records in a single module, so we'll have to move them to include files. That's fine for me. Then we can have a module for the functions such as normalize, glb, subtype, etc. and keep the core AST traversal functions where they are. OK?

About add_types_pat, I don't really follow what you're discussing, can you give me a bit of background? I'd like to understand, and help πŸ™‚

What I was referring to is union_var_binds, which is merging two (or a list of) maps of variable bindings into one. Rather than doing it inside add_types_pat, in can be done in check_clause, etc. Then you can know which var binds are new for the clause and which are from before.

union_var_binds combines the variable bindings, but if a variable is bound in two sets of varbinds (i.e. two maps), the type is actually merged using greatest lower bound (glb), i.e. intersection (so maybe the function name isn't all that intuitive). I guess in this case, where the pattern binds one type (e.g. X :: term()) and the guard binds another type (X :: integer(), e.g. from is_integer(X)), the resulting type should be integer(). If an unbound variable gets bound to any() though, we have a problem, because any() is the unknown type which isn't good here. Probably we can just instead bind it to term() in add_types_pat. Hopefully, it can just be changed without problems...

NelsonVides commented 5 years ago

Indeed, this is giving me trouble now. When some variable was given the type any(), then whatever refinement I do when checking the guards is ignored when doing union_var_binds or add_var_binds, and I'm not sure now how to fix that... πŸ€” My only idea is to make a different function, very similar to add_var_binds, but that on the function given to merge_with, I put another couple of function heads for the cases where one of the two variables is any.

Have a look at this: https://github.com/josefs/Gradualizer/compare/master...NelsonVides:fun_guards I'm thinking about making a PR for that soon πŸ™‚

PS: "greatest lower bound" is a fantastic name for me. "glb" is yet another cryptic acronym I had to check several times what was the meaning of it, but it's fine, I just suck at acronyms πŸ‘

NelsonVides commented 5 years ago

I have another question for a function I cannot find. union_var_binds does a glb over all the variables given, but what if I want a lup (least upper bound, i.e., most likely a union)? Do we have some helper function defined somewhere I can use for that? I fear not and it's going to be quite a big deal to implement πŸ˜…

zuiderkwast commented 5 years ago

You're thinking is correct, LUP is a union. :-)

LUP = normalize({type, erl_anno:new(0), union, [T1, T2]}, TEnv).

Similarly, GLB is intersection.

As for any(), we don't do much about it, which is kind of the point of gradual typing. Any is the dynamic type and for this type, we don't want to detect any type errors. In a test case where the expected type is something "known", union_var_binds should work, e.g. a union of integer() and atom():

-spec f(integer() | atom()) -> integer().
f(X) ->
    case X of
        N when is_integer(N) -> N; % Varbind in the clause
        A when is_atom(A)    -> 42 % Exhaustiveness
    end.

Note also that literals have the type any(), to make sure there are no type errors for code without specs. (This can be changed using the --infer or {infer, true} option.)

josefs commented 5 years ago

@zuiderkwast I assume you mean "LUB" rather than "LUP" :-)

I just want to highlight one thing that is important to consider when refactoring add_types_pat. In a pattern in a case expression we might use a variable that it already bound and it's important to get both the scoping and the types right in this case. I'm sure it's doable but it requires some care to get it right.

NelsonVides commented 5 years ago

@zuiderkwast when it comes to

This is what I want to solve next, whenever I get enough time. Also other very simple guards such as Var == atom1; Var == atom2.

This would only work if the --infer flag is given, as only then, Gradualizer will decide that atom2 has type {type,_,atom2}, and then we can do the refinement of the type of Var. Without the --infer flag, I have no idea how much complex it could get, but for the flag enabled I have working code for it.

So my question is, how to test it? How can I tell gradualizer to run this test with the infer enabled and check it passes and with the flag disabled and check it fails?

PS: normalize vs glb, that was damn eye-opening, THANK YOU! 😍

NelsonVides commented 5 years ago

@josefs can you explain me, or link me to some good read, what are the constrains solving, and how is gradualizer planning to solve this? I'm trying to digest this TODO: https://github.com/josefs/Gradualizer/blob/c31c2326ad6b9674cefdb5c202f036bd86a150ce/src/typechecker.erl#L1709-L1717

zuiderkwast commented 5 years ago

This would only work if the --infer flag is given, as only then, Gradualizer will decide that atom2 has type {type,_,atom2}

Oh, you're right, unfortunately. :-/ But if we don't infer it but rather have an explicit case for <var> == <atom literal>, it can work I think.

So my question is, how to test it? How can I tell gradualizer to run this test with the infer enabled and check it passes and with the flag disabled and check it fails?

We don't have a good way to do this. There are some test cases for the infer option in some test suite, but not the once in should_pass and should_fail. We have been talking about accepting options in attributes in each module, e.g. -gradualizer(infer). or -gradualizer([{infer, true}])., similar to how compiler options can be given. Maybe it's worth implementing it now to be able to test this in a nice way.

zuiderkwast commented 5 years ago

Again:

This would only work if the --infer flag is given, as only then, Gradualizer will decide that atom2 has type {type,_,atom2}

How about this trick: Set infer to true in Env locally, just during refinement of a known type.

NelsonVides commented 5 years ago

Again:

This would only work if the --infer flag is given, as only then, Gradualizer will decide that atom2 has type {type,_,atom2}

How about this trick: Set infer to true in Env locally, just during refinement of a known type.

I was thinking of precisely doing that, I think that I'll do that and make the next PR or push it together with my current one on guards refinement. I think that's a case where we're allowed to infer types, as guards are precisely a place where the runtime refines its information and does the dynamic typechecking.

In any case, pity not to have a ready way of testing with and without that flag. I kinda like the -gradualizer({infer, true}]), I could check out that for my next case in the near future.

josefs commented 5 years ago

@josefs can you explain me, or link me to some good read, what are the constrains solving, and how is gradualizer planning to solve this?

How much prior experience do you have with type systems research? I'm afraid the only things I know that I can point you to are fairly technical. This paper is a nice read, though the details of it doesn't match exactly what I have in mind for gradualizer it should give you some kind of idea: http://gallium.inria.fr/~fpottier/publis/fpottier-njc-2000.pdf

NelsonVides commented 5 years ago

Right. My experience is the wonderful book "Types and programming languages" by B. Pierce, years of template metaprogramming in C++, half of the programming language course on Agda by Phillip Wadler, and plenty of hobby coding on Haskell. But those three last ones had to be abandoned when I got a job in Erlang :stuck_out_tongue: In short, sufficient not to be scared of anything, but not enough to actually know stuff fluently. I'm studying your recommendation now.

PD: I now realise we really need some way of testing infers in gradualizer :cry:

josefs commented 5 years ago

| I'm studying your recommendation now.

Cool!

| PD: I now realise we really need some way of testing infers in gradualizer

We have that now πŸ™‚. See e.g. test/should_fail/infer_enabled.erl

tsloughter commented 2 years ago

I think this is the same as this I got today. For the code:

-spec to_string(atom() | binary() | list()) -> binary().
to_string(K) when is_atom(K) ->
    atom_to_binary(K, utf8);
to_string(K) when is_list(K) ->
    list_to_binary(K);
to_string(K) ->
    K.

Given the failure:

otel_resource_app_env.beam: The variable on line 76 at column 5 is expected to have type binary() but it has type atom() | <<_:_*8>> | [any()]

to_string(K) when is_atom(K) -> atom_to_binary(K, utf8);
to_string(K) when is_list(K) -> list_to_binary(K);
to_string(K) -> K.
                ^
zuiderkwast commented 2 years ago

It seems so. There are two things Gradualizer needs to understand here:

  1. K has type atom() inside body of the first clause and list() inside the second clause. I think this is fixed since you didn't get an error on atom_to_binary(K, utf8) and list_to_binary(K).
  2. K can't be an atom or a list in the 3rd clause, because those types are exhausted by the previous clauses.

    This is trickier, especially it doesn't work if the guards are more complex, but if each clause has a single type guard like in this example, I think we should be able to handle it.

    It'd be harder (let's not even try) in an example like this though:


-spec foo(integer() | atom()) -> atom().
foo(N) when is_integer(N), N > 1 orelse N*N*N =< 0 ->
    ok;
foo(A) ->
    A.
erszcz commented 2 years ago

Yes, I think with simple guards like these it's possible to handle it. TBH I'm surprised it's not handled already, since we definitely have code for type refinement stepping through clauses. Maybe the devil's in the details, but so far it seems doable.