keean / zenscript

A trait based language that compiles to JavaScript
MIT License

Principal Typings type inference #21

Open shelby3 opened 7 years ago

shelby3 commented 7 years ago

@keean has mentioned wanting principal typings in modules and I am asking him to explain.

He has continued to allude to this:

The key for me is to facilitate generic programming. I want it to feel like you are programming a dynamic language like Python, but have the safety of a strongly typed language, and the ability to specialise on type where you need it. So type inference is central to getting the type system out of the way, so it feels dynamically typed. It should be as close to Python duck typing as possible so that people don't feel constrained by the type system, yet it should step in to prevent errors when you make a mistake.

And I thought no principal typings because afaik the meaning of the function definition can morph, which afaics is the antithesis of stable APIs.

keean commented 7 years ago

And I thought no principal typings because afaik the meaning of the function definition can morph, which afaics is the antithesis of stable APIs.

If you have a type signature it cannot morph. So the idea is the language can infer everything so it feels like a dynamic language, but you can choose to put type signatures where you need them to improve readability and safety, and at a minimum all exported functions must have a signature. As a module is the compilation unit, all the compulsory signatures will be in the module definition (which is itself a type). So when you create an implementation of a module (one module can have multiple implementations) you don't have to give any signatures, but you can if you want to. Unlike Rust, it will be capable of inferring all typeclass requirements; we are only forcing you to give them explicitly in the module definition to prevent the kind of morphing you are referring to.
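
To make that concrete, here is a rough analogue in Haskell (only an illustration; ZenScript's actual module syntax will differ): the module definition plays the role of a type carrying the compulsory signatures, and an implementation can omit its own signatures because they are checked against that definition.

```haskell
-- Hypothetical illustration in Haskell, not ZenScript syntax.
-- The "module definition" carries the compulsory signatures (it is itself a type).
class StackModule s where
  empty :: s a
  push  :: a -> s a -> s a
  pop   :: s a -> Maybe (a, s a)

-- One implementation of the module: no signatures are needed in the bodies,
-- because they are inferred and checked against the definition above.
newtype ListStack a = ListStack [a]

instance StackModule ListStack where
  empty                  = ListStack []
  push x (ListStack xs)  = ListStack (x : xs)
  pop (ListStack [])     = Nothing
  pop (ListStack (x:xs)) = Just (x, ListStack xs)
```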

shelby3 commented 7 years ago

@keean wrote:

If you have a type signature it cannot morph.

You should really unpack that for readers.

What you apparently mean is that as long as every input argument (and any external reference in the containing lexical scope) has a type specified, even if it might be a type variable, then the meaning of the types within the function body can't change.

But afaics, that is an incorrect assumption. If, for example, the body of the function accesses a show method on an input argument x: A which has a generic type A, then A has only a structural type and not a nominal one. Thus the meaning of the body of the function can change depending on the context in which A is inferred external to the function body.
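
To illustrate with Haskell (only as an analogy), an unannotated function that uses show picks up a constraint whose implementation is chosen by each call site, not by the definition:

```haskell
-- Illustration in Haskell: no signature is given, so the inferred type is
--   describe :: Show a => a -> String
-- The body only requires that *some* show exists for the argument's type;
-- which show actually runs depends on the call site.
describe x = "value: " ++ show x

callers :: (String, String)
callers = (describe (42 :: Int), describe True)  -- Show Int here, Show Bool there
```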

As a module is the compilation unit, all the compulsory signatures will be in the module definition (which is itself a type). So when you create an implementation of a module (one module can have multiple implementations) you don't have to give any signatures, but you can if you want to. Unlike Rust, it will be capable of inferring all typeclass requirements; we are only forcing you to give them explicitly in the module definition to prevent the kind of morphing you are referring to.

Examples please. I don't want to try to unpack all of that without some guiding examples to refer to.

Also I then need to consider how your ideas interplay with the soundness of inference, given the higher-order features we want and may want. As you know, moving away from the origin on the Lambda cube can make principal typings inference fail due to the introduction of expansion variables; then two higher-order constructs interplay like two mirrors facing each other and the inference diverges into infinite recursion. You had illustrated this to me during our private discussions in May.

Please put some effort into making this clear when you have time.

skaller commented 7 years ago

Ok, so here is my take on inference. Don't! Type inference is a very bad idea. It looks cool but it isn't.

Inference makes it hard to understand code because the types are not specified. Type annotations provide redundancy, which is essential to comprehension.

The difficulty is not just for the human. The compiler also has difficulty. It is very hard, I believe, to keep track of how a type is inferred; even if the information is available, it may be rather difficult to report it in a sane way. Having the type of a function parameter depend on non-local usage is insanity.

In addition, it is known that HM inference is intractable: exponential or worse performance is hard to obtain and unlikely, but possible. Worse than that, it does not extend to second-order polymorphism, and it does not account for other kinds of typing. For example, with OCaml polymorphic variants, HM inference fails: you have to provide annotations anyhow.
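
A standard illustration of the second-order point, in Haskell: a function whose argument must itself be polymorphic cannot be inferred and needs an explicit rank-2 annotation.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Without the signature, inference fails: HM would force g to a single
-- monomorphic type, but the body uses it at Bool and at Char.
both :: (forall a. a -> a) -> (Bool, Char)
both g = (g True, g 'x')

demo :: (Bool, Char)
demo = both id   -- (True, 'x')
```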

Furthermore, inference stands opposed to overloading, which is much more useful.

So I would say, give up on inference at the moment. Mandate type annotations. Think about relieving that requirement LATER, not now. You will never get any code written without first fixing some parts of the system to something simple. Once you have sufficient annotations, deduction and unification will provide type checking; there is a HUGE amount of work in tracing source references in a coherent way so as to make it possible to report type errors.

Do not be confused. The hardest part of a compiler is error handling. MOST of the code will be doing error handling.

keean commented 7 years ago

given the higher-order features we want and may want

Higher-order functions will need type annotations. Type inference will only ever infer monomorphic types for arguments.
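
A small Haskell illustration (not ZenScript syntax) of what that means in practice: the higher-order argument is inferred at a single monomorphic type per call, which covers the common cases.

```haskell
-- The argument f is inferred at a plain (monomorphic) type variable:
--   twice :: (a -> a) -> a -> a
twice f x = f (f x)

ok :: Int
ok = twice (+ 1) 3   -- here f is instantiated at Int -> Int

-- Using f at two different types inside the body would need a rank-2 annotation.
```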

keean commented 7 years ago

@skaller wrote:

Inference makes it hard to understand code because the types are not specified. Type annotations provide redundancy, which is essential to comprehension.

I agree, but languages like Python are popular for a reason. Even C++ does function and assignment type inference.

My conclusion is as long as module boundaries have type signatures we are okay. Nothing is stopping you putting more type signatures in. Maybe all top-level functions should have signatures; this can be determined by compiler options.

What inference does is give a unified compositional method for type checking.

Having the type of a function parameter depend on non-local usage is insanity.

My compositional inference algorithm determines principal typings using local knowledge only. All inference is done using an empty environment. The type of two fragments of code when combined depends only on their types and how they are combined, and not on the code inside the fragments.
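
A minimal sketch of that idea in Haskell, heavily simplified and only illustrative (not the actual algorithm on GitHub): a typing is a pair of inferred assumptions plus a type, and two fragments combine using nothing but their typings.

```haskell
import           Control.Monad (foldM)
import qualified Data.Map      as M

-- A typing pairs the assumptions a fragment makes about its free variables
-- with the fragment's type; the assumptions are inferred, not supplied.
data Ty = TVar Int | TInt | Fun Ty Ty deriving (Eq, Show)

type Subst  = M.Map Int Ty
type Assume = M.Map String Ty
type Typing = (Assume, Ty)

-- Apply a substitution, chasing chains of bound variables.
apply :: Subst -> Ty -> Ty
apply s (TVar v)  = maybe (TVar v) (apply s) (M.lookup v s)
apply _ TInt      = TInt
apply s (Fun a b) = Fun (apply s a) (apply s b)

-- Naive unification (no occurs check, for brevity of the sketch).
unify :: Ty -> Ty -> Subst -> Maybe Subst
unify a b s = go (apply s a) (apply s b)
  where
    go (TVar v) (TVar w) | v == w = Just s
    go (TVar v) t                 = Just (M.insert v t s)
    go t (TVar v)                 = Just (M.insert v t s)
    go TInt TInt                  = Just s
    go (Fun a1 b1) (Fun a2 b2)    = unify a1 a2 s >>= unify b1 b2
    go _ _                        = Nothing

-- Combine the typings of a function fragment and an argument fragment
-- (the application rule). Only the typings are consulted, never the code
-- inside the fragments, and no outer environment is needed.
combineApp :: Int -> Typing -> Typing -> Maybe Typing
combineApp fresh (envF, tF) (envX, tX) = do
  let result = TVar fresh
  s0 <- unify tF (Fun tX result) M.empty
  -- assumptions about variables free in both fragments must agree
  s  <- foldM (\acc (a, b) -> unify a b acc) s0
              (M.elems (M.intersectionWith (,) envF envX))
  pure (M.map (apply s) (M.union envF envX), apply s result)

-- The fragments `f` and `x` are each typed against an empty environment, so
-- their typings record assumptions; combining them as the application `f x`
-- yields ({f : t1 -> t2, x : t1}, t2).
example :: Maybe Typing
example = combineApp 2 (M.singleton "f" (TVar 0), TVar 0)
                       (M.singleton "x" (TVar 1), TVar 1)
```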

Furthermore, inference stands opposed to overloading, which is much more useful.

Type classes provide overloading, although you cannot just overload random type signatures. This is good. You should be able to overload things like addition '+', but it should always take two arguments and be associative and commutative, or programs become unreadable. If '+' is overloaded to act as 'print', what you have is obfuscated code.
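
In Haskell terms (illustrative only), the overload is pinned to a single class signature, so every instance keeps the same shape:

```haskell
-- The class fixes the shape of the overload; each instance must match it,
-- so '+.' can never turn into a one-argument 'print'.
class Add a where
  (+.) :: a -> a -> a

instance Add Int    where x +. y = x + y
instance Add Double where x +. y = x + y

sumInts :: Int
sumInts = 1 +. 2   -- resolves to the Int instance
```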

In addition, it is known that HM inference is intractable: exponential or worse performance is hard to obtain

You obviously haven't seen my compositional typing algorithm (on GitHub) which avoids these problems. It's very fast and does not diverge.

So I would say, give up on inference at the moment. Mandate type annotations

I have been working on type systems and inference since about 2004 so it's not like I am just starting with this compiler. I also wrote my first compiler around the same time.

Do not be confused. The hardest part of a compiler is error handling. MOST of the code will be doing error handling.

That depends on how good you want your error messages to be :-)

First you have parser errors - that's not too hard.

Then you have typing errors; again, that's not too hard.

After that, if the type system is sound, there should be no need for any more error reporting because the type system will only permit well-behaved programs.

skaller commented 7 years ago

C++ does not do any type inference. What it does is type deduction. Deduction is synthetic, that is, it is entirely a bottom up calculation. Inference means that inherited attributes contribute type information, that is, code outside the definition of the term to be typed.

BTW: above you said HOFs always need annotation. This is not correct; they do not in OCaml. What you probably should have said, and may have meant, is that if you want second-order polymorphism THEN you need annotations.

Type classes do not provide overloading. In principle a function in a type class appears to be polymorphic to the type system during binding (lookup). It is an evil trick, because your code can type check, but fail later in the compilation process, or even at run time (depending on the timing of the second-stage binding, which could even be done at run time using plugins, in which case the failure can't occur until run time: Fortress does that).

Type classes could be said to provide ad-hoc polymorphism, but that's not the same as overloading, which involves selection of a function based on the type during binding.

BTW: I don't agree about +, it's really hard. Many + are not associative or symmetric. Floating point addition is symmetric but not associative. Non-throwing integer addition is neither. Unsigned integer addition is both. There is a lack of symbols in ASCII. Felix also allows TeX/LaTeX/AMSTeX symbols but there are still not enough; on the other hand, if there were any more, you'd not remember what they meant :-)
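
For example, with IEEE-754 doubles (Haskell shown, but any language behaves the same):

```haskell
-- Double '+' is commutative, but rounding breaks associativity.
lhs, rhs :: Double
lhs = (0.1 + 0.2) + 0.3   -- 0.6000000000000001
rhs = 0.1 + (0.2 + 0.3)   -- 0.6

associative :: Bool
associative = lhs == rhs  -- False
```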

skaller commented 7 years ago

BTW: if you think the only errors you can get in a compiler are parse errors or type errors... you have no idea. Such a compiler would be useless. For a start, you want type classes but you missed out on the fact that a required instance may not be available. Or there may be two instances. You only learn that well after type checking.

There are also a lot of non-type checks that can be done. The idea that a well-typed program must be well behaved is ridiculous. Never deref a NULL pointer in C? How many errors can be detected early depends on how expressive the type system is. For example, without some dependent typing you can get a run-time error zipping two lists. Without proper array types, such as both Felix and C++ have, a similar result for zipping arrays.
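
A sketch of the zip point in Haskell: with plain lists the length mismatch is only caught (if at all) at run time, whereas length-indexed vectors move the check into the types.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- With plain lists, zip either truncates silently or fails at run time,
-- depending on the language. With length-indexed vectors the mismatch is
-- rejected at compile time instead.
data Nat = Z | S Nat

data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Both arguments are forced to share the same length index n.
vzip :: Vec n a -> Vec n b -> Vec n (a, b)
vzip VNil         VNil         = VNil
vzip (VCons x xs) (VCons y ys) = VCons (x, y) (vzip xs ys)
```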

You can also check things that are not encoded in the type system, like purity of functions, if there is a specifier saying to do it or a language rule.

keean commented 7 years ago

BTW: if you think the only errors you can get in a compiler are parse errors or type errors... you have no idea. Such a compiler would be useless. For a start, you want type classes but you missed out on the fact that a required instance may not be available. Or there may be two instances. You only learn that well after type checking.

The type checker catches this.

BTW: above you said HOFs always need annotation. This is not correct; they do not in OCaml. What you probably should have said, and may have meant, is that if you want second-order polymorphism THEN you need annotations.

Well I did say inferred types will be monomorphic, so yes this is right.

There are also a lot of non-type checks that can be done. The idea that a well-typed program must be well behaved is ridiculous. Never deref a NULL pointer in C? How many errors can be detected early depends on how expressive the type system is. For example, without some dependent typing you can get a run-time error zipping two lists. Without proper array types, such as both Felix and C++ have, a similar result for zipping arrays.

A well-designed type system does all the checks you need. It is true many languages like C++ do not have a powerful enough type system to do this, but that is not the kind of language I want to design.

Type classes do not provide overloading. In principle a function in a type class appears to be polymorphic to the type system during binding (lookup). It is an evil trick, because your code can type check, but fail later in the compilation process, or even at run time

I disagree with this. There is no evil trick; specialisation means the instance implements the type-class signature.

You can also check things that are not encoded in the type system, like purity of functions, if there is a specifier saying to do it or a language rule.

If you want this the type system can enforce purity.
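
For example, in Haskell purity is visible in the type itself, so the checker enforces it without a separate specifier:

```haskell
-- The type Int -> Int promises purity: the body cannot perform I/O.
double :: Int -> Int
double x = x * 2

-- An effectful version must say so in its type, and pure code cannot
-- call it without going through IO.
doubleAndLog :: Int -> IO Int
doubleAndLog x = do
  putStrLn ("doubling " ++ show x)
  pure (x * 2)
```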

shelby3 commented 7 years ago

@keean wrote:

My conclusion is as long as module boundaries have type signatures we are okay.

So you are proposing principal typings only within modules and not between modules?

shelby3 commented 7 years ago

@skaller wrote:

BTW: above you said HOFs always need annotation. This is not correct; they do not in OCaml. What you probably should have said, and may have meant, is that if you want second-order polymorphism THEN you need annotations.

Which is what I wrote in a more general way...

As you know, moving away from the origin on the Lambda cube can make principal typings inference fail due

@skaller wrote:

Type classes could be said to provide ad-hoc polymorphism, but that's not the same as overloading, which involves selection of a function based on the type during binding.

Afaik, that is what typeclasses do: select a set of implementation functions based on the type. Perhaps you are conflating this with an existential forall a, which is runtime dynamic lookup.
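
In Haskell terms (my illustration): a plain class constraint is resolved statically at each call site, while an existential packages the dictionary with the value so the selection happens through the value at run time.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Static selection: the Show instance is picked at each call site, from the
-- concrete type of the argument.
label :: Show a => a -> String
label x = "value: " ++ show x

-- Existential: the instance dictionary is packaged with the value, so the
-- selection travels with the data instead of being fixed at the call site.
data AnyShow = forall a. Show a => AnyShow a

labelAny :: AnyShow -> String
labelAny (AnyShow x) = "value: " ++ show x

mixed :: [String]
mixed = map labelAny [AnyShow (1 :: Int), AnyShow True, AnyShow "hi"]
```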

BTW: I don't agree about +, it's really hard. Many + are not associative or symmetric. Floating point addition is symmetric but not associative. Non-throwing integer addition is neither. Unsigned integer addition is both

I am discussing that point with you in the other issue thread #22.

keean commented 7 years ago

So you are proposing principal typings only within modules and not between modules?

First, principal typings and principal types are different things.
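
Roughly, as a gloss (my wording, not Trevor Jim's notation):

```haskell
-- Rough gloss of the distinction, in Haskell.

x :: Int
x = 1                 -- pretend this definition lives in a different fragment

inc :: Int -> Int     -- the principal *type* of inc, relative to a given
inc y = y + x         -- environment that already supplies x :: Int

-- A principal *typing*, by contrast, is inferred from the `inc` fragment
-- alone, against an empty environment: it is the pair
--     ( assumptions = { x :: Int },  type = Int -> Int )
-- where the assumptions are an output of inference, not an input.
-- HM-style systems have principal types but not principal typings in
-- general, which is what makes fully fragment-by-fragment inference harder.
```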

shelby3 commented 7 years ago

@keean wrote:

First, principal typings and principal types are different things.

Why are you writing this as if you think I don't know that?

Did you forget that 12 days ago I cited Trevor Jim's research paper, which explains the difference between principal types and typings?

keean commented 7 years ago

Why are you writing this as if you think I don't know that?

Because you thought that type inference of principal typings would result in types morphing, or in an inability to give function signatures, which demonstrated a lack of understanding.

shelby3 commented 7 years ago

@keean wrote:

Because you thought that type inference of principal typings would result in types morphing,

It will if you do principal typings external to modules, and you keep making this same mistake. So please don't present your incorrect thinking as an indictment of me.

or in an inability to give function signatures, which demonstrated a lack of understanding.

I never thought there was an inability to combine some function signatures with principal typings (although I am not speaking to whether it will diverge). I was asking you to clarify what you intended to do. Apparently now you have clarified that you are proposing only to allow the elision of types within modules and not at module boundaries.

keean commented 7 years ago

Apparently now you have clarified that you are proposing only to allow the elision of types within modules and not at module boundaries.

Yes I said that days ago.

shelby3 commented 7 years ago

@keean wrote:

Yes I said that days ago.

Buried as a single sentence in a discussion on another topic in another thread. I started this thread to have you on record with clarity.

Also, when you mentioned it as an aside days ago, you did not say that all types would be explicit at the module "boundary". You used that key word "boundaries" in this thread.

keean commented 7 years ago

So do we agree it is okay to infer principal typings, so that code can be reasoned about locally, with compulsory type annotations at module boundaries?

shelby3 commented 7 years ago

@keean as a goal I agree, but it depends on how difficult that goal is to achieve and what we have to sacrifice to get it. I'd be willing to backtrack to only inference within function bodies if the module boundary becomes too onerous in some way.

keean commented 7 years ago

@shelby3 I agree. If there turns out to be some problem due to the combination of types, then we can go to "all top-level functions must have a type", but I don't see any problems so far, as top-level functions will just be local functions to some exported function in the module interface. We are more likely either to be able to infer types for all non-exported functions, or to require type signatures for all functions, I think.

shelby3 commented 7 years ago

So I agree you can close this as "Future Work".

shelby3 commented 7 years ago

@keean again I am growing concerned that you may be trying to shoehorn too many expectations into what inference will be able to ultimately do, especially after we add higher-order features (or you may block those features on grounds that they break inference).

I see inference without principal typings (what @skaller calls "deduction") as essential inside function bodies, and I would like principal typings for functions local to a function body, but outside of that I am not sure it is desirable if it causes other significant tradeoffs. But I am open-minded. I just don't want to place unnecessary expectations on inference.

keean commented 7 years ago

@shelby3 your concerns are noted, however if inference can't do something you can always add annotations. I'm not worried, I have done this before :-)

shelby3 commented 7 years ago

@shelby3 wrote:

@keean again I am growing concerned that you may be trying to shoehorn too many expectations into what inference will be able to ultimately do, especially after we add higher-order features (or you may block those features on grounds that they break inference).

Example of my expectation possibly coming true.