keean / zenscript

A trait based language that compiles to JavaScript
MIT License
42 stars 7 forks

OCaml's functors vs. typeclasses #22

Open shelby3 opened 8 years ago

shelby3 commented 8 years ago

@skaller wrote:

[There is an unrelated question to the syntax: do you REALLY want type classes? Ocaml style modules are more powerful but harder to use. Of course you can have both. Type classes are probably OK for scripting languages, that's why I chose them for Felix, but they have limitations.]

I wrote about this 6 days ago, and I think you are referring to the fact that OCaml recently added implicit selection to its unique first-class functors, so combined with modules this can apparently emulate typeclasses and more.

Could you explain about the limitations?

shelby3 commented 8 years ago

@keean wrote:

sort_n(f : I, n : DistanceType<I>, r : R)
    requires Mutable<I>, ForwardIterator<I>, Relation<R>,
    ValueType<I> == Domain<R>

With your way of structuring sort, my idea still applies: for example, being able to specify the implementation of Relation orthogonally, at an earlier binding in different modules than the other typeclasses, which could be bound later. I believe your structuring avoids the T<A> that I had in mine, but otherwise the same concept applies. Your structuring seems to bind the chosen algorithm to the abstraction, whereas mine doesn't (i.e. mine could be implemented with enumerables or iterators, but yours commits to a mutable form of bubblesort instead of an immutable mergesort).

skaller commented 8 years ago

The categorical view of the world doesn't impact how you do type checking as such; it influences your language design. It tells you, for example, that having "interfaces for types" is a wrong idea. It tells you a type class has some types A, B, C ... and functions between them, and that no types are special, because the types only serve to specify function domains and codomains.

It tells you C++ classes and virtual functions and that kind of object oriented design is a special case, not a general paradigm, even before you see the proof that it cannot work in general due to type violations.

It also hints that applicative programming is suspect. The idea is based on set theory: a type is a set of values, and a function is applied to a value to yield another value. Category theory says no: there are no values, only functions; there is no application, only composition. In Haskell they say "point free", meaning "without variables", or "entirely in terms of functions".

You can recover application. For example, take the type 1, also called unit, whose only value is the empty tuple (), and consider the type int. Where previously you had a value 42 in the set int, a function sqr to square integers, and incr to add 1, you would write

42 . sqr . incr 

in reverse application notation, you can eliminate the value terms except () by

() . (fourtytwo |-> sqr |-> incr)

where |-> means reverse composition. (Just inventing some notation for demo). So here

fourtytwo: 1 -> int

is the unique function from unit to int that "picks out" the value 42, i.e. fourtytwo () = 42. A constant function of "no arguments", if you like. The point is that once you have such a function you can forget the unit value too, and now your combinator is associative and your domain of operation consists entirely of functions. (A well known example language: Forth.)
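Skaller's demo can be reproduced in Haskell, where reverse composition is spelled `>>>` (a sketch; the names `fourtytwo`, `sqr`, `incr` follow his notation):

```haskell
import Control.Arrow ((>>>))

-- "values" become functions out of the unit type
fourtytwo :: () -> Int
fourtytwo () = 42

sqr, incr :: Int -> Int
sqr x = x * x
incr x = x + 1

-- reverse composition: no value terms appear, only functions
pipeline :: () -> Int
pipeline = fourtytwo >>> sqr >>> incr

-- pipeline () evaluates to 1765, i.e. 42 squared plus one
```

The only remaining value term is the final `()` used to kick off the pipeline.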

The point of seeing the idea is that, since you only have functions and do composition instead of application, there are no values. So what role do types have?

In fact, you do not need types at all! They're just convenient labels of no intrinsic significance: it is convenient to write

f: A -> B
g: B -> C

so now you know that f |-> g is well defined, because the codomain of f is the same type as the domain of g. Clearly you don't actually need any types to check that the composition is well defined; you could use a table of well defined function compositions instead. The type notation is just familiar.

The point is, sure, yes, you still do type checking. But you do not think about types as having any meaning. They're just labels on the "joining points" of functions. A product type such as A * B tells you there are two functions which satisfy the universal law for products. You probably already understand cartesian product in terms of the idea that the "projections are orthogonal", but now you completely drop the idea that a product has meaning as a pair of values.

Similarly, you may think that in functional programming, functions can be values. This is utterly wrong. It is not true. If you think that way you get a serious design fault, and Ocaml has that design fault because of wrong thinking. Functions are arrows in a category. What you think of as a function value, or closure, is not the same thing at all.

You can see this writing ordinary code! Even in C. The expression f (x) means to apply a known function to x. Your compiler will definitely have a special instruction for this, in Felix it is called "apply_direct". If you only have a closure, it is "apply_indirect" and the first argument is an expression not a function. Totally different: can't be optimised, you only know the type. And you will have another operator "make_closure" which converts a known function to a closure.

Why do these operators exist? Because functions are not the same as closures, even though the distinction is glossed over in many concrete syntaxes. It tells you that in Ocaml the construction:

let f x  = x * x;;

is a disaster, because f is now a function value, not a function. Which is why Ocaml is hopeless at dealing with cross module recursion, whereas C can do it in its sleep. Design fault. Functions are NOT the same as function closures. In Felix the distinction is extremely clear: a function in Felix is a C++ class; a closure is a value of that C++ class. Big difference between a class and a value, eh?

So roughly, it's a conceptual change: stop thinking about values, start thinking about functions. Stop thinking types mean anything more than which functions they connect. This is not to say don't use types; just think about them as what they really are: meaningless labels.

So when you think about "type classes" and "interfaces" and stuff think that the "type parameters" are just an arbitrary collection of labels used to specify how the functions connect:

class ForwardIterator[I, V]
   virtual fun next: I -> I
   virtual fun get: I -> V
   virtual fun atend: I -> bool

This is a purely functional forward iterator. It is clearly an abstraction of a list:

instance[V] ForwardIterator[list[V], V]
  fun next (x: list[V]) => match x with
    | Cons (head, tail) => tail
    | Empty => halt "Empty List!"
  fun get (x: list[V]) => match x with
    | Cons (head, tail) => head
    | Empty => halt "Empty List!"
  fun atend (x: list[V]) => match x with
    | Cons _ => false
    | Empty => true

Note the instance is still polymorphic in the value type, but the iterator type is now fixed to be the list functor applied to the value type. The point is that the abstraction admits other functions than the three I gave for list; obviously you can do it for arrays and such like. The critical thing is that you're talking about three functions. The type class isn't specifying "interfaces for types"; the types just say how the functions compose.
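This class and its list instance map almost directly onto a Haskell multi-parameter type class (my rendering; note that Haskell needs the functional dependency `i -> v` to make calls unambiguous, which anticipates the "output type" discussion later in the thread):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- skaller's ForwardIterator as a Haskell class; the fundep i -> v is a
-- Haskell necessity for unambiguous use, not part of his original sketch
class ForwardIterator i v | i -> v where
  next  :: i -> i
  get   :: i -> v
  atend :: i -> Bool

instance ForwardIterator [v] v where
  next (_:t) = t
  next []    = error "Empty List!"
  get (h:_)  = h
  get []     = error "Empty List!"
  atend      = null
```

As in the Felix version, the instance stays polymorphic in the value type while fixing the iterator type to lists.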

This particular type class is not good, because it does not specify any semantics. There should REALLY be axioms that say that applying next or get is only meaningful if atend is not true.

Although commonly the value type of an iterator depends on the iterator, it does not have to depend in any well defined way. This is wrong thinking again. The types are just labels. I can easily change the get function to contain this:

| Cons (head, tail) => head.str

Now the value type is string. I could also put this:

| Cons (head, tail) => head.str + "!!!!"

and now type classes are screwed, because that is a distinct get function even though the type is the same. So technically you should label the instances so you can choose one of the many possible instances for a given iterator/value type pair. As I mentioned earlier, the common workaround for this design fault is to label one of the types instead (remember Down, to label integers as sorting descending instead of ascending?)
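The Down workaround is standard Haskell: since Int can have only one Ord instance, you relabel the element type rather than labeling the instance (a small sketch):

```haskell
import Data.List (sort)
import Data.Ord (Down(..))

-- sort descending by wrapping each element in Down, whose Ord
-- instance reverses comparisons, then unwrapping again
descending :: [Int] -> [Int]
descending = map getDown . sort . map Down

-- descending [3,1,2] gives [3,2,1]
```

The type label Down selects the alternative ordering; the instance itself is never named.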

Really at this point, you see you need modules and functors like in Ocaml, because the application of the functor to a module, yielding a module, has to be named, which solves the problem.

In practice you do type classes ANYHOW, because they're easier to use. But you have to recognise they're a feature for representing concepts that YOU, the language designer, feel are fairly fundamental, such as iterators, and accept that you're fixing a particular way of doing things: type classes basically let you invent systematic notation systems, but you only get to use each notation one way, so use them with care and pick the right uses for your language.

skaller commented 8 years ago

@keean Let me see if i understand what you mean by an output type: you would write something like

class ForwardIterator [I => V]
   virtual fun get: I -> V

to say that V is a type label to use in function signatures, but it is "dependent on I". Then you have to specify it in your instance:

instance ForwardIterator[list[int] => int]
   fun get (x: list[int]) : int => ....

Apart from my notation, is that the idea? The instance supplies, not only the functions, but it also supplies output types as well?

Obviously, this is even more constrained than if the type V above were a parameter, but as we know type classes are even more constrained than modules and functors because you cannot index them on functions, whereas you can do that with modules/functors. So the additional constraint weakens the definition even further, but such weakening also has the advantage of reducing even more the amount of specification required to use them.

I hope I understand. Felix doesn't have such dependent types, and not having them means some constructions with type classes require manually adding the dependent type to the input list, which destroys the convenience of the notation.

Do I understand correctly?

keean commented 8 years ago

@skaller wrote:

Do I understand correctly?

Yes, the output type is functionally dependent on the input types. It is weaker than giving both as parameters, but giving both is not what you want when programming. Given a collection of type 'x', I want to derive an iterator from the collection that iterates over the type of objects in the collection. Hence the iterator should determine the type of object.

It's not just about ease of use; it's about getting the algebra of collections and iterators to have the desired properties.

skaller commented 8 years ago

Let me try to understand how the typing works. Consider first (Felix again sorry):

class Fwd[I,V] { virtual fun get: I -> V; }
fun f[V, I with Fwd[I,V]] (x: I) : V = { var y : V = get x; return y; }

Since V is specified as a parameter, we can type check the assignment and the function return type.

However this code has a serious problem in Felix! The procedure argument has type I: we cannot deduce the type V. So this use:

... f it ...

will fail because V cannot be deduced. We have to write instead

.... f[int] it ...

for example, to specify the value type (the iterator type is deduced). We have to do this because there is no constraint on the value type, it could be anything. Now clearly the dependent output type solves that problem, because you can now call f, and the iterator type I is deduced from the iterator argument.
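The same deduction failure can be reproduced in Haskell with a multi-parameter class and no functional dependency: nothing determines the value type from the argument, so the call site has to pin it down, playing the role of the explicit f[int] (a sketch; names are mine):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- no dependency between i and v: v is unconstrained by the argument
class Fwd i v where
  get' :: i -> v

instance Fwd [a] a where
  get' = head

-- get' [1,2,3] alone is ambiguous in v; the annotation on x is the
-- Haskell analogue of specifying f[int] at the call site
x :: Int
x = get' [1, 2, 3 :: Int]
```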

But what is the return type of f? That's a problem! It's not enough to know that it depends on I.

So this is a real problem as I see it. We only know the type when monomorphisation selects the instance which then specifies the type. It's not good enough, because we bind the code polymorphically first. We can't type check. Felix does polymorphic type checking, instance binding happens later during monomorphisation. That is, the type checker just assumes the type class functions are parametrically polymorphic, even though this is a lie because only selected specialisations are actually defined in instances.

I'm confused; I do not see how this can work. The problem is that the "functional dependence" of V on I is not accessible in the type class: the type function that calculates the type of V does not exist there. That function is the map from each concrete I to the instance's setting of V for that I. But when binding we do not consider instances; that happens later, when the type checking is already complete.

I have to know the type of every expression in Felix, because Felix has overloading. For any expression e, and a set of overloaded functions f, the choice of which f is made in an expression f e based on the type of e. The choice cannot be deferred. We have to make the choice immediately, to determine the type of f e.

This is not true if you do not have overloading: you can defer the choice and use type inference to fill in the type variables later, because you only have one function f. You can just give the most general possible type and specialise it as you get more information. But my algorithm for overloading, at least, cannot do that.

shelby3 commented 8 years ago

@skaller wrote:

that no types are special, because the types only serve to specify function domains and codomains.

It seems to me we use types because it is a model in which we want to think when doing composition. We wish to think in terms of data, not in terms of relations of morphisms.

Because it seems that without types we have a huge amorphous system of simultaneous equations of domains and codomains. Instead, we create a model with the rigidity of types to provide comprehensible shapes, so we may apply local reasoning. This has the disadvantage of reducing the degrees-of-freedom, but the advantages of approachable reasoning and feasible type inference. Category theory can only take us so far, because humans can't program coherently in that highly abstract model of multiple orders of lifting.

Sorry but if only 0.1% of the programmers can understand your code, then it is useless on any scale of mainstream open source (although might be acceptable for an open source niche such as mathematics).

keean commented 8 years ago

@skaller The functional dependence is on the type. If we have a collection of type Array<Int>, then the iterator for this collection has a type that is parametric on the type contained in the collection. The instance of iterator for the collection has a free type variable. Something like this:

typeclass Iterator
    data ValueType
    ...

Array<T> implements Iterator
    data ValueType = T
    ...

So the type of ValueType only depends on the type of the collection being iterated over, but that does not need to be a parameter of the Iterator type class.
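This sketch corresponds to Haskell's associated type families (my rendering, with invented member names):

```haskell
{-# LANGUAGE TypeFamilies #-}

-- the value type is an output supplied by the instance,
-- not a parameter of the class
class Iterator c where
  type ValueType c
  firstOf :: c -> ValueType c

instance Iterator [a] where
  type ValueType [a] = a
  firstOf = head
```

The collection type alone selects the instance, and the instance supplies ValueType: exactly the functional dependence described above.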

shelby3 commented 8 years ago

@keean I think @skaller is coming at this from the model perspective of wanting types to not provide any rigidity, and instead be entirely flexible to whatever the function compositions prescribe.

keean commented 8 years ago

@skaller wrote:

for example, to specify the value type (the iterator type is deduced). We have to do this because there is no constraint on the value type, it could be anything.

Maybe I am misunderstanding, but this seems like a problem to me. If I have a collection of Int and I want to run some function on an iterated value, what happens if I put f[float]? Presumably that's a type error. If you can statically detect the type error, then clearly you already know that the allowable type is f[int], so why not infer it? Unless you can only detect that error at runtime, which would make the type system unsound.

shelby3 commented 8 years ago

@shelby3 wrote:

@skaller math doesn't come in only one conceptual abstraction. There is more than one way to skin a cat. Type theory is not the same as category theory. We create type systems and syntactic conventions that optimize the programmer experience, not the arbitrary chosen mathematical abstraction. That is not to claim that there isn't more degrees-of-freedom in a category theory model with only functions as nodes on a graph and types as graph edges. @keean's typing model is types as nodes on the graph and edges between them for inference.

Edit: I would like to make the analogy to building houses with grains of sand or with windows, bricks, doors, etc.. The most elemental model @skaller prefers is also very amorphous.

shelby3 commented 8 years ago

I feel I don't understand OCaml's functors well enough to understand what @skaller thinks we could do better? And I didn't get any feedback on my idea upthread.

Could we please get some clear elucidation from you experts on whether we need some form of later or variable selectable binding of implementation to data type?

I want to nail down the syntax and it seems the choice of syntax for typeclasses may depend on whether we need such a feature.

keean commented 8 years ago

@shelby3 Did you see my post about records providing some of the functionality? If you use records, functors become ordinary functions (which is good in my opinion: why create a new category of functions that are somehow different from ordinary functions?). Records have type parameters like modules.

If records have associated types and private functions in the implementations (can easily be done by returning a record instance from a closure) we just need to make opaque types a global feature and I think we can do anything the OCaml module system can do with records. It would also remove the separate syntax for modules, as they would simply be records.

And guess what, union types solve the module typing issue too, as the type of an associated type would be the union of all the different associated type definitions in the record values. I am not yet sure whether we can infer this type, or if you would have to list the possible types in a union in the record type.

shelby3 commented 8 years ago

@keean I don't know what "associated type" is? Need to see code.

I don't understand how records are applicable to late or selectable binding of implementations of typeclasses?

If records have associated types and private functions in the implementations (can easily be done by returning a record instance from a closure) we just need to make opaque types a global feature

I have no idea what all that means. Could you translate it to a code example or otherwise explain it for a mere n00b like myself?

Basically eveything you wrote is meaningless to me because I can't relate the terminology to anything concrete.

keean commented 8 years ago

@shelby3 An associated type is one whose type depends on the type parameters of the record, module or type class. Consider:

data Rec A = Rec ...
    type ValueType : * 

The type of ValueType is completely determined by the type of a

This means we might do:

let x : Rec Int = Rec ...
    type ValueType = String

So what this means is that the record has a property called ValueType which is a type, not a value.

Regarding late binding, we can have a record like this:

data Number<A> = Number ...
    (+) (A, A) : A
    (-) (A, A) : A
    (*) (A, A) : A
    (/) (A, A) : A

We can define a value of this record type:

let integer : Number<Int> = Number ...
    (+) (x, y) = ...
    (-) (x, y) = ...
    (*) (x, y) = ...
    (/) (x, y) = ...

We can then pass this to a function as a first class value:

f(z : Number<A>) : Number<A> =>
    5 z.+ 6

So we can pass in any record instance that provides functions matching the record type, and can then call them from within the function.
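In Haskell this record is just an explicit dictionary: a record of functions passed around as an ordinary first-class value (a sketch; field names are mine, since Haskell record fields can't be bare operators):

```haskell
-- an explicit dictionary: the record carries the operations
data Number a = Number
  { plus  :: a -> a -> a
  , minus :: a -> a -> a
  , times :: a -> a -> a
  }

intNumber :: Number Int
intNumber = Number { plus = (+), minus = (-), times = (*) }

-- the dictionary is an ordinary runtime argument, chosen by the caller
f :: Number a -> a -> a -> a
f z x y = plus z x y

-- f intNumber 5 6 gives 11
```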

shelby3 commented 8 years ago

@keean wrote:

data Rec A = Rec ...
    type ValueType : * 

The type of ValueType is completely determined by the type of a

Do you have a typo? Where is a defined?

shelby3 commented 8 years ago

@keean wrote:

Regarding late binding, we can have a record like this:

data Number<A> = Number ...
    (+) (A, A) : A
    (-) (A, A) : A
    (*) (A, A) : A
    (/) (A, A) : A

That binds interface type to data type which breaks any solution to the Expression Problem.

We can then pass this to a function as a first class value:

f(z : Number<A>) : Number<A> =>
    5 z.+ 6

So we can pass in any record instance that provides functions that match the record type, can then call them from within the function.

That does not provide late binding of selecting which one of multiple implementations of an interface we want for a particular data type.

keean commented 8 years ago

That does not provide late binding of selecting which one of multiple implementations of an interface we want for a particular data type.

The function called is determined by the record passed at runtime, which is as late as you can get.

Do you have a typo? Where is a defined?

No, the type of ValueType is defined in the implementation. Different implementations for different types can provide different definitions for ValueType.

In the lambda cube this is a type dependent type. We want all corners of the lambda cube except types that depend on values.

shelby3 commented 8 years ago

@keean wrote:

No the type of ValueType is defined in the implementation

I know that the type unification is in the implementation. I meant: where is a defined in your definition? You had instead A and *.

shelby3 commented 8 years ago

@keean wrote:

The function called is determined by the record passed at runtime, which is as late as you can get.

The selection of the data type is late, but the binding of the interface to the data type is early. And I am repeating myself.

Fundamentally you do not seem to maintain a conceptualization of the Expression Problem, and I often have to reexplain it. I will be making a long comment to explain all of this including what I think @skaller wants.

keean commented 8 years ago

The selection of the data type is late, but the binding of the interface to the data type is early. And I am repeating myself.

This is true, but this is how modules and functors in OCaml work.

I believe I understand what you have in mind for the expression problem. If we have a polymorphic function on type A like

f(x : A) where Show<A>

and we call this from something where the type is unknown until runtime like:

var x
if random ...
    x = "Hello"
else ...
    x = 43
f(x)

Then although the type of x is not determined until runtime, we can construct a union type for x, which would be Int | String. So when typing f we know that Int | String must implement Show, which means both Int and String must implement Show, and at runtime we must pass both dictionaries. Finally, when the call to show happens in f at runtime, the correct dictionary is chosen based on the type argument passed to f. As we know all the types in the union type, the compiler can emit a complete type-case statement that is type-safe, because it cannot go wrong (receive a type it is not expecting) at runtime.

However we will need run-time-type-information for the types in the union to implement the type-case (I know we are both aware of that, but I am putting it in for completeness).

The above could work with modules, but you would have to pass in a set of modules that match the types in the union and do the type-case manually, which is lots of extra boilerplate; you can already do this with the records we have.
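The Int | String dispatch above can be modeled in Haskell with a closed sum type, whose constructor tag plays the role of the run-time type information (a sketch):

```haskell
-- a closed "union" of the two types the compiler inferred
data U = UInt Int | UString String

-- the hand-written analogue of the compiler-emitted type-case:
-- each branch picks the appropriate Show dictionary, and no
-- unexpected type can arrive at runtime
showU :: U -> String
showU (UInt n)    = show n
showU (UString s) = show s

-- stands in for the runtime-dependent choice of x in the example
pick :: Bool -> U
pick b = if b then UString "Hello" else UInt 43
```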

keean commented 8 years ago

refresh

shelby3 commented 8 years ago

@shelby3 wrote:

Fundamentally you do not seem to maintain a conceptualization of the Expression Problem, and I often have to reexplain it. I will be making a long comment to explain all of this including what I think @skaller wants.

Done.

shelby3 commented 8 years ago

@shelby3 wrote:

module First {
   typeclass LessThan<A>
      less(A, A): Bool
   typeclass Sort<T<A>> extends LessThan<A>
      sort(T<A>): T<A>

   g(x: T<A>): T<A> where Sort<T<A>> => x.sort()
}

I realize the above is not @keean's preferred form. Anyway, that could be written instead as follows (with a correction included as well):

module First {
   typeclass LessThan<A>
      less(A, A): Bool
   typeclass Sort<T<A: LessThan>>
      sort(T<A>): T<A>

   g(x: T<A>): T<A> where T<A>:Sort => x.sort()
}

I was trying to decide if I had to abandon the OOPish form of typeclass specification and I don't think so.
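A hedged Haskell reading of the sketch above (class, instance, and method names are mine; the list instance uses a simple insertion sort purely for illustration):

```haskell
class LessThan a where
  less :: a -> a -> Bool

-- Sort over a container constructor, with LessThan required on elements
class Sortable t where
  sortIt :: LessThan a => t a -> t a

instance LessThan Int where
  less = (<)

instance Sortable [] where
  sortIt = foldr ins []
    where
      ins x [] = [x]
      ins x (y:ys)
        | less x y  = x : y : ys
        | otherwise = y : ins x ys

-- the analogue of g: constrained only by the typeclasses
g :: (Sortable t, LessThan a) => t a -> t a
g = sortIt
```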

shelby3 commented 7 years ago

@skaller wrote:

The idea is based on set theory: a type is a set of values, a function is applied to a value to yield another values. Category theory says no, there are no values, only functions, there is no application, only composition.

But without types we do not have the foundational axioms for logic. I quote from Wikipedia:

The heart of Curry's paradox is that untyped lambda calculus is unsound as a deductive system, and the Y combinator demonstrates that by allowing an anonymous expression to represent zero, or even many values. This is inconsistent in mathematical logic.
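This point can be seen concretely in Haskell: the untyped Y combinator's self-application does not typecheck, and recovering it requires explicitly introducing a recursive type (a standard construction, sketched here):

```haskell
-- the self-application (x x) is ill-typed in plain Haskell; a recursive
-- newtype is needed to smuggle it past the type checker
newtype Rec a = Rec { unRec :: Rec a -> a }

y :: (a -> a) -> a
y f = (\x -> f (unRec x x)) (Rec (\x -> f (unRec x x)))

-- laziness lets y build infinite structures, e.g. take 3 (y (1:)) is [1,1,1]
```

So the types do act as the deductive guardrail the quote describes: general recursion has to be opted into via an explicitly recursive type.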

sighoya commented 6 years ago

@skaller wrote:

This particular type class is not good, because it does not specify any semantics. There should REALLY be axioms that say that applying next or get is only meaningful if atend is not true.

I'am with you with semantic specifications. But are semantic specifications easily verifiable? IMHO, if you define laws for an operator/function which operates on the instance of a typeclass, then the law has to be checked for all values of that instance. For example: Int implementing Eq[Int], then reflexivity, symmetry and transitivity has to be verified/checked for all values in Int for each function/operator parameter which should slow down the compiler, right?