Oh, I forgot to run tests. Good that we have the CI. 🙂 But I think that the problem is hidden somewhere else, and this commit just revealed it. I've already seen some weird behaviour when typechecking maps in the past.
Good catch 🙂 But there's a bigger problem here.
I think that the problem is hidden somewhere else, and this commit just revealed it.
Indeed. For as long as some component type of a bigger type is not refinable, the bigger type is not refinable either, so exhaustiveness checking doesn't kick in - there's an explicit check disabling it. Now that all field types of t() are refinable:
-type t() :: #{apa := integer(), bepa := boolean()}.
-spec f(t()) -> boolean().
f(#{bepa := Bepa}) ->
    Bepa.
f/1 is checked for exhaustiveness, and we get the following error:
output:<<"test/should_pass/map_pattern.erl: Nonexhaustive patterns on line 13 at column 1
Example values which are not covered:
#{apa := 0, bepa := false}
">>
In order to solve it properly, we would have to assume that there's a catch-all clause for the apa field of t() which we don't care about. The catch-all would lead to refining apa := integer() to apa := none(). However, the question is what a reasonable heuristic for telling that we're not interested in apa would be.
We could try checking if any clause mentions apa at all - if none does, assuming a catch-all is likely fine. However, if one clause does but others don't, is it a programmer's error or a deliberate decision? An explicit catch-all match on the apa value would resolve the issue, but it's only reasonable to expect with a map of two fields. What if there are 10 or 20 fields in the map?
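For illustration, the explicit per-field catch-all mentioned above could look like this in the two-field case (a sketch reusing t() from the example):

-spec f(t()) -> boolean().
f(#{apa := _, bepa := Bepa}) ->  %% explicit catch-all on apa
    Bepa.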
Another option is disabling exhaustiveness checks on non-union types to silence false positives like this one. The idea of exhaustiveness checking is most useful when working with algebraic data types. In Erlang there's no such first-class construct, but we simulate them with tagged tuples. So maybe exhaustiveness checking only makes sense for unions of tuples whose first element is an atom?
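For example, such a simulated ADT, where exhaustiveness checking clearly pays off, could look like this (a sketch with made-up type and function names):

-type shape() :: {circle, number()} | {rect, number(), number()}.

-spec area(shape()) -> number().
area({circle, R}) -> 3.14159 * R * R;
area({rect, W, H}) -> W * H.

Leaving out either clause here would be a genuine omission worth reporting.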
For comparison:
-spec key_subtype(#{atom() => integer()}) -> integer().
key_subtype(#{banana := N}) ->
    N.
does not raise an error since atom() is not refinable either. A finite union of atoms is perfectly refinable, but the infinite atom() type cannot be refined other than directly to none() by a catch-all clause. In general, refinable() is a filter for things which we know how to handle reasonably.
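To illustrate the finite-union case (a sketch; h is a made-up name):

-spec h(a | b) -> ok.
h(a) -> ok.
%% a | b is a finite union of atoms, so it is refinable: after the
%% first clause the remaining type is b, and the missing clause for b
%% can be reported as nonexhaustive.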
Hi @erszcz, thanks for your detailed reply (as always). 🙂
However, I still don't understand why we cannot act like there was a catch-all association for the apa field (e.g. apa := _Apa) in the #{bepa := Bepa} pattern. I believe that the semantics of the patterns #{a := A} and #{a := A, b := _} are the same if the type of the value that is matched on is #{a := some_type(), b := some_other_type()}.
I also don't understand how matching on different fields (of the same map) in different clauses (of the same function) would cause any trouble. For example, in a gen_server, I think it is quite a common pattern to model the state as a map (at least in Elixir, where maps are more commonly used) and then in each handle_call clause match only those fields that are really needed for that clause.
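For illustration, a sketch of that style (with made-up state fields and messages):

-type state() :: #{user := binary(), count := non_neg_integer()}.

handle_call(get_user, _From, #{user := User} = State) ->
    {reply, User, State};
handle_call(get_count, _From, #{count := Count} = State) ->
    {reply, Count, State}.

Each clause matches only the field it needs; the other field is left to an implicit wildcard.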
And why isn't it reasonable to add catch-all associations for maps with 10 or 20 keys?
I also don't understand how matching on different fields (of the same map) in different clauses (of the same function) would cause any trouble.
Because we wouldn't know if the non-matching clauses do not match by intention or by mistake. In other words, we wouldn't know whether to throw the nonexhaustive error or not.
Maybe I didn't manage to make my reasoning clear. I'll try to come up with an example or a better description tomorrow.
Hmmm, I've sketched a longer answer using the following example:
-type data() :: #{a := 1..3, b := 1..3, c := 1..3}.
-spec validate(data()) -> ok.
validate(#{a := 1}) -> ok;
validate(#{b := 2}) -> ok;
validate(#{c := 3}) -> ok.
But apparently this example type checks as is, with no nonexhaustive error, which makes me a bit puzzled about my assumptions. I was thinking it wouldn't. So maybe we can make boolean() work (i.e. be refinable), but clearly some work is needed around the f(#{bepa := Bepa}) example.
It's never a mistake that #{bepa := Bepa} matches all values for field apa. That's what I think we should assume. It's predictable and clear. I suggest we refine accordingly.
Besides, I think we should normalize the type before calling refinable/3 on it.
I think it's the same situation for records, e.g. the pattern #r{bepa = Bepa}.
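For illustration, the record analogue of the map example might look like this (a sketch reusing the apa/bepa field names):

-record(r, {apa :: integer(), bepa :: boolean()}).

-spec g(#r{}) -> boolean().
g(#r{bepa = Bepa}) ->
    Bepa.

The unmentioned apa field behaves like an implicit wildcard, just as in the map pattern.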
It's never a mistake that #{bepa := Bepa} matches all values for field apa. That's what I think we should assume. It's predictable and clear. I suggest we refine accordingly.
The question is when we should not assume that anymore. For example, if one clause matches on #{bepa := false} and other clauses don't mention bepa at all, what then? A nonexhaustive error? Or does it count as "match(ing) only those fields that are really needed for that clause" (to quote @xxdavid)?
Ok, I think I finally have a good example. Let's consider this:
-type state() :: #{completed := boolean(), error := boolean()}.
-spec result(state()) -> completed | error.
result(#{completed := true}) -> completed;
result(#{error := true}) -> error.
On one hand, such a style of partial functions is common in Erlang / Elixir. On the other, this code raises a nonexhaustive error. I think the error is valuable, as it points at another logical case that's not covered, but it might lead to a lot of warnings on real-world code "match(ing) only those fields that are really needed for a clause".
We could fix it like this:
-spec result(state()) -> completed | error | not_ready.
result(#{completed := true}) -> completed;
result(#{error := true}) -> error;
result(_) -> not_ready.
And that's what handle_call clauses usually do in practice - there's usually a catch-all at the bottom handling all unhandled/unexpected messages.
Or like this:
-spec result(state()) -> completed | error | not_ready.
result(#{completed := true}) -> completed;
result(#{error := true}) -> error;
result(_) -> erlang:error(not_ready).
If we prefer the function to remain partial.
That's what I had in mind when calling it an explicit catch-all.
Now, getting back to:
-type t() :: #{apa := integer(), bepa := boolean()}.
-spec f(t()) -> boolean().
f(#{bepa := Bepa}) ->
    Bepa.
This fails as there's no match on the apa value at all. We could assume that there's a catch-all for the apa value there. I'd call it an implicit catch-all. We could assume it only for as long as no clause mentions apa at all, since if we assume an implicit catch-all even when some clause mentions apa, we effectively disable nonexhaustive checking on apa.
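To make the distinction concrete, here's a sketch of the heuristic just described (both functions take t() from above):

%% No clause mentions apa, so an implicit catch-all for it may be assumed:
f(#{bepa := Bepa}) -> Bepa.

%% One clause mentions apa, so no implicit catch-all would be assumed,
%% and values such as #{apa := 1, bepa := false} would be reported as uncovered:
g(#{apa := 0, bepa := Bepa}) -> Bepa;
g(#{bepa := true}) -> true.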
And why isn't it reasonable to add catch-all associations for maps with 10 or 20 keys?
With the distinction of explicit and implicit catch-alls now established - I was thinking of per-field explicit catch-alls. These would be really cumbersome with that number of fields to ignore.
Whew, I hope I managed to make my reasoning clear this time 🙂
This fails as there's no match on the apa value at all.
Apparently, there's more to it than that... This fails, too, and I would've assumed it to pass:
-type t() :: #{apa := integer(), bepa := boolean()}.
-spec f(t()) -> boolean().
f(#{apa := _, bepa := true}) ->
    true;
f(#{apa := _, bepa := false}) ->
    false.
Well, it seems there's more work to do to get maps checked properly.
I think the example with -type state() :: #{completed := boolean(), error := boolean()} is typed in a way that doesn't imply that any of the fields is true, neither intuitively nor logically. It's just like a tuple of two booleans. I think we should treat it in the same way regarding refinement and exhaustiveness checking.
If at least one of the fields is true, it needs to be written something like:
-type state() :: #{completed := true, error := boolean()} |
                 #{completed := boolean(), error := true}.
Now, getting back to:
-type t() :: #{apa := integer(), bepa := boolean()}.
-spec f(t()) -> boolean().
f(#{bepa := Bepa}) -> Bepa.

This fails as there's no match on the apa value at all. We could assume that there's a catch-all for the apa value there. I'd call it an implicit catch-all. We could assume it only for as long as no clause mentions apa at all, since if we assume an implicit catch-all even when some clause mentions apa, we effectively disable nonexhaustive checking on apa.
I disagree. The type system shouldn't assume anything other than what the language actually does. The pattern #{} matches all maps, regardless of keys. If a pattern doesn't mention a key, it's a catch-all for that key. It doesn't matter whether other clauses mention that key or not.
It might help to think about #{apa := integer(), bepa := boolean()} as equivalent to {integer(), boolean()}. IMO we should treat them equivalently.
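Under that reading, these two functions are morally the same (a sketch):

-spec f(#{apa := integer(), bepa := boolean()}) -> boolean().
f(#{bepa := Bepa}) -> Bepa.

-spec g({integer(), boolean()}) -> boolean().
g({_, Bepa}) -> Bepa.

The unmentioned apa field plays the same role as the _ wildcard in the tuple pattern.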
The type system shouldn't assume anything other than what the language actually does. The pattern #{} matches all maps, regardless of keys. If a pattern doesn't mention a key, it's a catch-all for that key.
Sounds convincing, indeed. I'm not going to argue as my idea was just that - an idea up for discussion.
If at least one of the fields is true, it needs to be written something like: ...
That's correct, but it ignores the fact that partial functions are used and are a common style in Erlang. I intentionally reached for such an example to show what implications this style has for the type checker design, and what the type checker design choices mean for real-world code out there: the character and number of warnings raised.
In general I agree, the type description was imprecise. We can either modify the type or modify the code to have a catch-all.
Yeah, I was skeptical about exhaustiveness checking even before we started adding it, because Erlang's let-it-crash style suggests you just don't implement cases that shouldn't happen. But then if you want typed code, you should define the types narrow enough that they only describe the cases that should be able to happen. And then exhaustiveness checking makes sense.
But perhaps not always... Do we want to require type assertion here?
%% This intentionally crashes on TCP errors.
%% Exhaustiveness error?
{ok, Bytes} = gen_tcp:recv(Socket, N, Timeout),
foo(Bytes),
%% This is equivalent. Exhaustiveness error?
case gen_tcp:recv(Socket, N, Timeout) of
    {ok, Bytes} ->
        foo(Bytes)
end,
I'd say this pattern is so common that we shouldn't require an assertion in the assignment case. After all, this is a dynamic assertion, and gradual typing is about blending both styles to get the best of both worlds. I don't think an error would be raised anyway, as exhaustiveness is only checked when going over clauses.
In the second case, though... Why use a case if not for matching multiple cases? I think it's fine to require an explicit catch-all that would crash, if that's the intended behaviour.
I'm a bit late to the party, but I have many thoughts regarding your discussion.
Exhaustiveness checking is indeed a little controversial in Erlang. Like @erszcz already said, it can be most useful when working with ADTs (i.e., atoms and tagged tuples in the context of Erlang). But as @zuiderkwast mentioned, even in the case of ADTs, it goes somewhat against the let-it-crash style in Erlang. I recently stumbled upon some Elixir code like this:
case File.write(file, data, [:append, {:encoding, :utf8}]) do
  :ok ->
    Logger.info("Successfully appended to the file.")
  {:error, :enoent} ->
    Logger.error("File #{file} can't be written.")
  {:error, :enospc} ->
    Logger.error("There isn't enough space on the disk.")
end
The author probably designed the code such that enoent and enospc are acceptable errors but the others are not, and it should crash in those cases. Gradualizer throws a nonexhaustiveness error here. But this could be easily fixed by adding an explicit catch-all clause with an explicit raise:
{:error, error} ->
raise("Cannot write to file #{file}, got #{inspect(error)}")
And I think every warning for a clause that is intentionally left out (to let it crash when that case occurs) can be fixed by this approach. From my point of view, this is an acceptable change in the code compared to the benefits it brings. If anyone does not want to change their code, they are free to disable exhaustiveness checking. So I would not ditch exhaustiveness checking and would just try to make it work better. And maybe we can turn it off by default if many people dislike it or it does not work well on real-world code.
Now to the gen_tcp example. The following code
-module(tcp_test).
-export([a/3, b/3]).
-spec foo(binary()) -> binary().
foo(Bytes) -> Bytes.
-spec a(gen_tcp:socket(), non_neg_integer(), non_neg_integer()) -> binary().
a(Socket, N, Timeout) ->
    {ok, Bytes} = gen_tcp:recv(Socket, N, Timeout),
    foo(Bytes).
-spec b(gen_tcp:socket(), non_neg_integer(), non_neg_integer()) -> binary().
b(Socket, N, Timeout) ->
    case gen_tcp:recv(Socket, N, Timeout) of
        {ok, Bytes} ->
            foo(Bytes)
    end.
does not throw any exhaustiveness errors. I think it's because the return value of recv can be {ok, binary()} and binary() is not refinable. Here I would like to note that this seems too strict to me. If a type has infinitely many values (like binary() does), it can only be covered by a catch-all clause (i.e., it is covered iff there is a catch-all clause for it). So I think we can do exhaustiveness checking even on such types.
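For illustration, under that rule the first function below would count as exhaustive and the second would not (a sketch with made-up names):

-spec classify(binary()) -> empty | non_empty.
classify(<<>>) -> empty;
classify(_) -> non_empty.  %% the catch-all covers the infinitely many remaining values

-spec bad(binary()) -> empty.
bad(<<>>) -> empty.  %% no catch-all, so nonexhaustive under the proposed rule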
But back to our code. It does not throw any exhaustiveness errors, but it does cause these:
tcp_test.erl: The variable on line 11 at column 9 is expected to have type binary() but it has type HttpPacket_typechecker_3534_1 | <<_:_*8>> | [char()]
a(Socket, N, Timeout) ->
{ok, Bytes} = gen_tcp:recv(Socket, N, Timeout),
foo(Bytes).
^^^^^
tcp_test.erl: The variable on line 17 at column 17 is expected to have type binary() but it has type HttpPacket_typechecker_3534_2 | <<_:_*8>> | [char()]
case gen_tcp:recv(Socket, N, Timeout) of
{ok, Bytes} ->
foo(Bytes)
^^^^^
The return value of a successful gen_tcp:recv can be either a binary() or a string() (or some mysterious HttpPacket), and it depends on how the Socket was set up, so it cannot be described by the type system! To be honest, this particular function has been a reason why I was sceptical about Gradualizer for quite a long time. Only like four months ago, I realised that there is the assert_type macro, and that it is completely justifiable for me to use it in this case (because usually you know how the Socket was set up), given the benefits Gradualizer brings (like in the case of exhaustiveness checking).
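For example, the a/3 function above could then be written like this (a sketch, assuming the assert_type macro from Gradualizer's gradualizer.hrl include):

-include_lib("gradualizer/include/gradualizer.hrl").

-spec a(gen_tcp:socket(), non_neg_integer(), non_neg_integer()) -> binary().
a(Socket, N, Timeout) ->
    {ok, Bytes} = gen_tcp:recv(Socket, N, Timeout),
    %% We know how the socket was opened, so we assert the packet type.
    foo(?assert_type(Bytes, binary())).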
Regarding the apa/bepa example. Thanks, @erszcz, for a more detailed explanation of your reasoning, but I am probably still missing something and don't completely understand… I agree with what @zuiderkwast has said, and I would still treat #{bepa := Bepa} as a catch-all for the apa field, regardless of other clauses. The comparison with tuples seems right to me, provided we have some ordering of keys that could appear in the map (based on the type of the value we are matching on). It is not a 1:1 correspondence though, as for example the map() type has infinitely many possible keys and tuples cannot have infinite length.
Also, as @zuiderkwast proposed, calling refinable/3 with a normalised type seems more reasonable to me. But maybe there's a reason why it's not so?
Finally, I also think that the typechecking of maps is somehow broken and it needs some more work/investigation. I'm happy to look at it (although my tempo is not very high), and I'll report if I find anything weird.
Hey, @xxdavid! Great analysis 🙂
Exhaustiveness checking is indeed a little controversial in Erlang. [...] I think every warning for a clause that is intentionally left out can be fixed by this approach. [...] If anyone does not want to change their code, they are free to disable exhaustiveness checking. So I would not ditch exhaustiveness checking and would just try to make it work better.
Totally! Opting into exhaustiveness checking means we agree to write code in a certain style because we believe it brings benefits (in readability, early feedback, ease of refactoring). Gradualizer is not unique in that regard - using Credo or even a code formatter in a project also means we opt in to some style enforced by those tools.
If one doesn't see or need these benefits, they should just be able to opt out and it's fine.
If a type has infinitely many values (like binary() does), it can only be covered by a catch-all clause (i.e., it is covered iff there is a catch-all clause for it). So I think we can do exhaustiveness checking even on such types.
👍
I remember discussing with @zuiderkwast whether we should make all types refinable, and the intention was there. refinable/3 as it is now is a filter on types for which we can already do it reasonably well.
The return value of a successful gen_tcp:recv can be either a binary() or a string() [...] [It] is completely justifiable for me to use [the assert macro] in this case [...]
A similar case is re:run() which also takes the intended return type as an argument. I think an assertion or, if possible, a type-safe wrapper (hardcoding the option and the respective return type) are the right ways to go.
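For instance, a type-safe wrapper over gen_tcp:recv/3 for binary-mode sockets might look like this (a hypothetical helper, not an existing API; the error type follows the OTP spec for gen_tcp:recv/3):

-spec recv_binary(gen_tcp:socket(), non_neg_integer(), timeout()) ->
          {ok, binary()} | {error, closed | timeout | inet:posix()}.
recv_binary(Socket, N, Timeout) ->
    case gen_tcp:recv(Socket, N, Timeout) of
        %% If the socket wasn't opened in binary mode, no clause
        %% matches and we crash (let it crash).
        {ok, Data} when is_binary(Data) -> {ok, Data};
        {error, Reason} -> {error, Reason}
    end.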
This is also a good example of why I'm quite stubborn about making Gradualizer self-check. re:run() is used in its code base, exhaustiveness errors do pop up, filename:* functions which also determine the output type based on input types are used, etc. This makes for an excellent exercise in determining which of its features are useful and practical and which are cumbersome (uncovering, for example, that the type representation inherited from erl_parse is a nightmare to work with, but is quite a good benchmark of the inconsistencies and usability of the type checker). I believe the Gradualizer source itself should be a reference on how to write Erlang in a Gradualizer-friendly style.
I agree with what @zuiderkwast has said, and I would still treat #{bepa := Bepa} as a catch-all for the apa field, regardless of other clauses.
I'm convinced 🙂 As I said before, my idea was just an idea up for debate.
Also, as @zuiderkwast proposed, calling refinable/3 with a normalised type seems more reasonable to me. But maybe there's a reason why it's not so?
I'm not sure there's a fundamental reason why it can't be done, but it's tricky because user types are complex, even more so if we consider the expansion of builtin types. Because of that we had trouble with infinite loops in the type checker and it took me a lot of time to finally fix them. Consider:
19> typechecker:normalize(gradualizer:type("string()"), gradualizer:env()).
{type,0,list,[{type,0,char,[]}]}
24> typechecker:normalize(gradualizer:type("[char()]"), gradualizer:env()).
{type,0,list,[{type,0,char,[]}]}
but
25> typechecker:normalize(gradualizer:type("char()"), gradualizer:env()).
{type,0,range,[{integer,0,0},{integer,0,1114111}]}
So not only would we have to call refinable on normalised types, but we'd also have to call normalize within refinable when we recurse down. It can most likely be sorted out, but I don't see much benefit in it - it doesn't unlock any new possibilities, whereas the current state works.
refinable also short-circuits opaque types (it just returns true), whereas normalize translates them to file-annotated user types, so further recognition in refinable would be less convenient (though likely still possible).
Finally, I also think that the typechecking of maps is somehow broken and it needs some more work/investigation. I'm happy to look at it (although my tempo is not very high), and I'll report if I find anything weird.
That's great! I'm happy to help if I'm able to :)
@xxdavid I understand that this is completely included in #524, so I'll close this PR. Please let me know if that's not the case.
This clause was probably forgotten.
Because of this, the following function yielded no error:
Now it warns about non-exhaustive patterns.