Orthogonal concepts - Githubissues

shelby3 commented 7 years ago

I am trying to nail down in my mind, the fundamental orthogonal semantic concepts of our proposed programming language.

Concept	Description
`data`	Unified sum, product, recursive, and record data types. No early bound operations other than construction and access. Record types may optionally be mutable.
`typeclass`	Delayed binding of operations to types at the use-site. Contrast to OOP (subclassing) which binds operations at construction of objects. For polymorphic types, typeclass objects delays binding operations until construction of the typeclass object (instead of prematurely binding operations at the construction of an instance of the implementing type); whereas, my posited and proposed solution to the Expression Problem employing unions, in theory further delays binding to the use-site for polymorphic types. Note that delaying is a positive attribute in this context, because it increases degrees-of-freedom.
`module`	Encapsulating `data` with compile-time (statically) bound operations and access control, enabling more sophisticated types.
monadic effect system	https://github.com/keean/zenscript/issues/4#issuecomment-248569044

Thus I am wondering if module is the most apt name for the concept? I think of a module as a unit of code that is imported separately. We still need to import data and typeclass, so is this done by files? Isn't each file then a module? If so, what is the apt name for the concept of module above? I am thinking the above concept for module is really a class (without the subclassing inheritance anti-pattern). Can't modules extend other modules?

Edit: note also OCaml functors and modules.

Edit#2: on typeclasses, I replied to @keean:

Optional implicit parameters (which Scala has) provide a way of unifying runtime interfaces (passed by value) and monomorphisable interfaces (passed implicitly by type).

I like that summary (in the context of what I had written). We should make sure we frame that and use it in our documentation.

Edit#3: @shelby3 wrote:

I think it is important to remember that we want to keep implementation of interface (i.e. typeclass) orthogonal to instantiation of data, otherwise we end up with the early binding of subclassing. For example, in subclassing both a rectangle and a square would be subclasses of an interface which provides width and height. Yet a square only has one dimension, not two. Any interfaces which need to operate on the dimensions of rectange and square need to be customized for each, which is what typeclasses do. We will still have subtyping such as when an interface extends another interface then we can subsume to the supertype (which is why we need read-only references and for supersuming then write-only references). Yet references will not have the type of an interface if we don't provide typeclass objects, so then the only case of subtyping will be the subset relationships created by conjunctions and disjunctions of data types.

keean commented 7 years ago

Yes, I think that is correct. We basically split objects into three orthogonal concepts, data are effectively object properties, typeclass are inferfaces, an modules control data hiding (privite data etc).

Note: Elsewhere I proposed that we can combine the three into a single syntax and generic concept, but lets leave that to one side for now.

shelby3 commented 7 years ago

Did you see my edits pointing out that 'module' has two meanings? Should we use another keyword than module?

keean commented 7 years ago

Extension and functors are something ML supports on modules, at the moment I don't think we should. I like the ability to have multiple modules in one file, but effectively modules are able to be separately compiled and linked later. In the simplest form they are like namespaces where you cannot access their private data from outside. A module has to export things to be visible outside, and import the external functions and data in order for them to be visible inside.

shelby3 commented 7 years ago

Note I added the idea for a monadic effect system. Well I guess it isn't orthogonal in some sense as its implementation relies I suppose on the other 2 or 3 concepts, but the systemic model for handling of impurity is an orthogonal concept.

shelby3 commented 7 years ago

@shelby3 wrote:

Concept Description

typeclass Delayed binding of operations to types at the use-site. Contrast to OOP (subclassing) which binds operations at construction of objects. For polymorphic types, typeclass objects delays binding operations until construction of the typeclass object (instead of prematurely binding operations at the construction of an instance of the implementing type); whereas, my posited and proposed solution to the Expression Problem employing unions, in theory further delays binding to the use-site for polymorphic types. Note that delaying is a positive attribute in this context, because it increases degrees-of-freedom.

Concept	Description
`typeclass`	Delayed binding of operations to types at the use-site. Contrast to OOP (subclassing) which binds operations at construction of objects. For polymorphic types, typeclass objects delays binding operations until construction of the typeclass object (instead of prematurely binding operations at the construction of an instance of the implementing type); whereas, my posited and proposed solution to the Expression Problem employing unions, in theory further delays binding to the use-site for polymorphic types. Note that delaying is a positive attribute in this context, because it increases degrees-of-freedom.

I want to raise this conceptualization of solving the Expression Problem to a matrix, so that we can how this plays out on more levels.

Background

Dynamic polymorphism is instituted whenever we bind the interface to an object and refer to the type of that object as the interface; Because whether it be at data type object instantiation for subclassing or typeclass object instantiation (aka Rust trait objects), we then must incur dynamic dispatch because when holding a reference of said interface type at compile-time, there is no static (compile-time) knowledge of the types of data that can be referenced with the said interface type (rather the runtime dynamic dispatch handles the dynamic polymorphism). In subclassing the interface is a supertype and in typeclass objects, the interface is the typeclass bound (constraint) on data types that can be assigned to a reference with said type.

So the lesson that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function. More importantly, we lose the axis of the Expression Problem to add new interfaces (operations) to the data types referenced via an interface type. We can add new data types but can't add new operations.

Subclassing binds the interface very early at data type instantiation. Typeclass objects bind later at typeclass object instantiation. And typeclass bounds bind even later at the function call site.

So ideally, we wanted to always delay to typeclass bounds, but this conflicts with polymorphism of adding data types orthogonally when dealing with heterogeneous union types. Thus I pointed out that by retaining the union of data types instead of prematurely subsuming them to the typeclass bound of a typeclass object, then we could delay the binding to the function call site. Thus we can continue to add new orthogonal operations to the union type, unlike for typeclass objects. The trade-off is that adding a new type to a union type requires invariant collections (such as a List); whereas, typeclass objects will interopt fine with arrays because the type of the array doesn't need to change when adding new data types to an exist typeclass object type. I explained this is more detail in prior linked discussion. Readers can refer to the prior explanation, so I don't need to repeat it here.

Matrix Conceptualization

So what we can see is there is a tension with trade-offs between the competing choices of carrying around the specific data types and subsuming to the interface. So both are needed for different situations.

Recently @skaller pointed out another axis of this tension. That is in the case where don't want to discard the data types because we want to leave them open to new operations, but we want to dictate that a specific implementation is used for a specific typeclass where ever it might be needed. Apparently OCaml's functors accomplish this by being first-class and one can pass these along with a module and they will morph the module with the specific constraints desired, while leaving the rest of the module unspecialized, and thus through the chaining of functors, one can get the same effective result as early binding for one typeclass but remaining open to extension for other typeclasses.

So really what we need is to being able to attaching a typeclass object to a data type instance and pass them around as a pair every where a data type is expected. So in other words, dynamic patching of the dictionary for the dynamic dispatch, wherein if that data type is then applied to a trait object at a call site or trait object assignment, then the prior binding is not overwritten and any additional dynamic dispatch bindings are augmented.

So then we no longer view data types and typeclass objects separately. A data type always has a dynamic dictionary attached to it, yet it is augmented at runtime, not with compile-time knowledge. The compiler can only guarantee the required typeclass operations will be available at runtime, but it can't guarantee which implementation was selected, because at runtime when it writes the implementation to the dynamic dictionary of the typeclass object, it can't know if a preexisting one will already exist.

So we have lifted dynamic polymorphism to a 2nd order. We have dynamism of dynamic dispatch.

The matrix is we have two axes of data type and interface, and a third axis of implementations that cojoin the two. This third axis can't be tracked at compile-time without dependent typing, but the compiler only needs to insure that the required interfaces are present in order to insure soundness. So this works.

What this enables is for example @skaller's example of wanting to set the ordering of comparison operations, while leaving the data type open to extension with other typeclasses. @keean's proposed solution also achieves it, but it lacks atomicity of retaining this with the data type. For example, @keean's solution is not generic in type of algorithm that can be applied ex post facto and thus it fails another axis of the tension of solving the Expression Problem.

What we will see generally is that higher order dynamism is required to solve higher order tensions in the Expression Problem.

keean commented 7 years ago

So the lesson that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.

This is not quite right. We can still mono-morphise even with compile-time interface binding - in fact for the traditional requirement this is a requirement. The point is we can statically know both the interface and the exact type passed, and therefore chose and inline the exact function. Rust does this in all cases exept when dynamic polymorphism in involved.

So we can inline and monomorphise the functions when the interface is statically bound and the type is statically know. If either are false (so dynamic binding to the interface, or dynamic typing) we cannot monomorphise in languages like Rust.

keean commented 7 years ago

So really what we need is to being able to attaching a typeclass object to a data type instance and pass them around as a pair every where a data type is expected.

This is exactly what adding functions to an objects prototype does in JavaScript.

shelby3 commented 7 years ago

@keean wrote:

So really what we need is to being able to attaching a typeclass object to a data type instance and pass them around as a pair every where a data type is expected.

This is exactly what adding functions to an objects prototype does in JavaScript.

No it is not the same. The JavaScript prototype impacts all instances. I am referring to impacting only instances which are assigned to the trait bound.

@keean I have decided that it is nearly impossible to try to explain these issues like this. I will need to make detailed code examples and write a white paper. You don't seem to understand me.

keean commented 7 years ago

No it is not the same. The prototype impacts all instances. I am referring to impacting only instances which are assigned to the trait bound.

So you want the implementation used to depend on the value of the object not just its type.

This is monkey-patching, adding a method directly to the object itself.

shelby3 commented 7 years ago

@shelby3 wrote:

So the lesson that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.

This is not quite right.

It is right. I just haven't been able to communicate my points to you.

keean commented 7 years ago

You have written it badly then because

once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.

This is exactly the opposite of what is true.

We must bind the interface statically in order to be able to monomorphise and we must statically know the type at the call site as well.

shelby3 commented 7 years ago

@keean wrote:

The point is we can statically know both the interface and the exact type passed

If you had understood me, you would understand that I already explained that we don't know the exact type passed, that was entirely the point of binding early erasing the data types from further compile-time knowledge and you only have the trait bound remaining. That you didn't even get that point from what I wrote seems really bizarre.

keean commented 7 years ago

If you had understood me, you would understand that I already explained that you don't know the exact type passed, that was entirely the point. That you didn't even get that point from what I wrote seems really bizarre.

We always know the exact type passed, that is the whole point of parametric polymorphism. The only time we do not is if there is what Rust calls a trait-object or what Haskell calls an existential-type involved.

shelby3 commented 7 years ago

@keean wrote:

The only time we do not is if there is what Rust calls a trait-object

Bingo. Re-read my long comment again.

keean commented 7 years ago

So in a traditional system like Rust, if there is a trait object we can never monomorphise, it doesn't matter how early or late the type-class is bound.

shelby3 commented 7 years ago

@keean wrote:

once we bind an interface at compile-time _before_ the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.

This is exactly the opposite of what is true.

The word 'before' does not mean 'at'. It means 'before'. I am referring to subsuming to the trait bound before the call site, such as a trait object. But then I generalize it to partial trait objects and 2nd order dynamism.

shelby3 commented 7 years ago

@keean wrote:

So in a traditional system like Rust, if there is a trait object we can never monomorphise, it doesn't matter how early or late the type-class is bound.

In a system where I have generalized it to partial trait bounds and 2nd order dynamism then we still can't monomorphise but we gain the ability to add operations ex post facto.

Your reading comprehension my long comment was really not adequate. Was it really that badly explained or is it your mind is ossified and not prepared for a generalization?

keean commented 7 years ago

The word 'before' does not mean 'at'. It means 'before'. I am referring to subsuming to the trait bound before the call site, such as a trait object. But then I generalize it to partial trait objects and 2nd order dynamism.

But if you have dynamic polymorphism you cannot mono-morphise. It doesnt matter whether the interface is bound before or at the call site.

shelby3 commented 7 years ago

@keean wrote:

But if you have dynamic polymorphism you cannot mono-morphise. It doesnt matter whether the interface is bound before or at the call site.

Precisely. Early binding of interface requires dynamic polymorphism in order to open to adding new data types to the interface bound, e.g. for collections.

Also, you are only considering one of the tradeoffs. You forgot the other one which my long comments states and I have reiterated:

In a system where I have generalized it to partial trait bounds and 2nd order dynamism then we still can't monomorphise but we gain the ability to add operations ex post facto.

keean commented 7 years ago

But that is not what this sentence says:

So the lesson that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.

This says we cannot mono-morphise if we bind the interface at compile time, when there is dynamic polymorphism. But we cannot mono-morphise at all with dynamic polymorphism, so the logic is wrong.

shelby3 commented 7 years ago

@keean wrote:

But that is not what this sentence says:

So the lesson that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.

This says we cannot mono-morphise if we bind the interface at compile time, when there is dynamic polymorphism.

Correct.

But we cannot mono-morphise at all with dynamic polymorphism,

Correct.

so the logic is wrong.

What is wrong? You just didn't apparently understand that the goal is not to achieve monomorphism but rather to solve the Expression Problem and keep the extension open to new trait bounds even though we've partially bound to some typeclasses:

@shelby3 wrote:

Precisely. Early binding of interface requires dynamic polymorphism in order to open to adding new data types to the interface bound, e.g. for collections.

shelby3 commented 7 years ago

The point is that we always pass a data type along with a dictionary, but that dictionary can be incomplete and we don't track at compile-time what may already be in that dictionary. When we are ready to bind some trait bounds at a call site, then we write some code which will add these implementations to the dictionary at runtime (insuring they will be available as required for compile-time soundness), but will not overwrite any that already existed in the dictionary.

The point of this is that our caller can set preferential implementations (such as which direction of ordering for lessThan operation), yet the code that is called doesn't have to prematurely specialize on any trait bound signatures. Premature specialization is the antithesis of combinatorial composition as per @skaller's excellent point.

shelby3 commented 7 years ago

I think we aren't actually attaching the dictionaries to the data objects. Rather the data objects are tagged (instanceof) and the dictionaries are passed in as arguments to each function. Perhaps we should model it with a monad?

keean commented 7 years ago

When we are reading to bind some trait bounds at a call site, then we write some code which will add these implementations to the dictionary at runtime.

This is the same as asking if the implementations exist from the call-site. In order to add them, they must first exist in the source code, so the compiler can statically check they exist.

We can go further, we can ask, from the call site, get a list of all types for which there is an implementation of this function. Now we could put this in the type signature so that the linker knows statically what types it is safe to send to that function.

shelby3 commented 7 years ago

Wow you really aren't getting it.

When we are reading to bind some trait bounds at a call site, then we write some code which will add these implementations to the dictionary at runtime.

This is the same as asking if the implementations exist from the call-site.

Read again:

When we are ready to bind some trait bounds at a call site, then we write some code which will add these implementations to the dictionary at runtime (insuring they will be available as required for compile-time soundness), but will not overwrite any that already existed in the dictionary.

The point of this is that our caller can set preferential implementations (such as which direction of ordering for lessThan operation), yet the code that is called doesn't have to prematurely specialize on any trait bound signatures. Premature specialization is the antithesis of combinatorial composition as per @skaller's excellent point.

keean commented 7 years ago

There is still something odd about the way your description comes across to me, which might be my misunderstanding...

If I have a function that calls methods on something:

f(x) =>
    x.call_widget()

We can say with 100% certainty whatever type whether dynamic or static polymorphism that is passed to f must implement call_widget. Do you agree?

shelby3 commented 7 years ago

@keean wrote:

We can say with 100% certainty whatever type whether dynamic or static polymorphism that is passed to f must implement call_widget. Do you agree?

Agreed. And the mistake in your thinking is you are conflating not knowing if an preexisting implementation will not be replaced at runtime, with knowing at compile-time that all required implementations will be placed into the dictionary as required. But I am repeating myself.

I understand the concept is a bit confusing at first for someone not expecting it. I do tend to come of out no where with mind twisting new things.

keean commented 7 years ago

And the mistake in your thinking is you are conflating not knowing if an preexisting implementation will not be replaced at runtime, with knowing at compile-time that all required implementations will be placed into the dictionary as required. But I am repeating myself.

That is irrelevant, I am talking about at the call-site.

So when call_widget is called on x there must be an implementation for it available. Do you agree?

keean commented 7 years ago

So then when we have:

f(x) =>
    x.call_widget()

We can clearly see no code in f is inserting any implementations into anything. Sorry to have to take it this slowly, but I am feeling particularly stupid today. Are we agreed?

shelby3 commented 7 years ago

I thought you were inferring the trait bound on x. Yes you need a trait bound there. So what is your point?

keean commented 7 years ago

Sorry, I really need to take this one step at a time to see where our understanding diverges. I want to get to the bottom of this once and forall :-)

So can you agree we can be sure f is not dynamically changing the implementations available for `x whatever type it is?

keean commented 7 years ago

So we should agree my original statetment:

We can say with 100% certainty whatever type whether dynamic or static polymorphism that is passed to f must implement call_widget. Do you agree?

So there was no mistake in my thinking, based on the assumptions stated.

keean commented 7 years ago

Ah, but one of my assumptions could be wrong - you may want to have a method directly implemented on 'x' that does not depend on the type of x, but depends on the value. (so you may have been objecting to the 'type' implementing call_widget, when it might be the value?)

Of course then we are no longer talking about type-classes, but objects with virtual methods.

keean commented 7 years ago

Of course, if we cannot statically determine that x has an implementation of call_widget then we no longer have type-safety, and we are simply creating a dynamically-typed language like Python and JavaScript already are.

shelby3 commented 7 years ago

@keean the point you are missing is that some function g can call your f providing an implementation of the necessary typeclass bound for call_widget, but that does not preclude my point. My point is that some other function h which attaches an implementation to the said runtime dictionary for the typeclass for call_widget then calls g, but g does NOT know it at compile-time (only g will know at runtime). So g knows at compile-time that x is dynamically polymorphic and it knows the data type of x, so g writes some runtime code to attach the typeclass bound for call_widget and at runtime it will set this if there isn't already one attached at runtime. In other words, g supplies a default choice for the implementation, but the caller of g can override it. So that the caller could for example change the ordering of lessThan without g needing to specialize and know which typeclasses its caller wants to specialize. This is dynamic dependency injection specialization. It requires that typeclasses have invariant semantics and that only what can change between implementations is what is allowed by the typeclass semantics (which must be the case any way, otherwise typeclasses aren't generic).

keean commented 7 years ago

Okay, but the we can put the requirement that 'x' must provide an implementation in the signature of 'f', something like:

f(x : A) requires HasCallWidget
    x.call_widget().

Now if the constraint 'HasCallWidget' is on the type A then we have a type-class, if it is on the value x then we have subtyping.

shelby3 commented 7 years ago

One way to do this perhaps is every where we passed a data type into a function, we also pass a dictionary for that data type. We don't bind the dictionaries to the instances of the data type.

You might think this is useless, because you might think a function will know which trait bounds it needs and can allow its caller to specify them at the call site, but that requires the caller to always specify them, which becomes very onerous for composition, although it is more performant given it can be monomophized. And it doesn't allow defaults with overrides.

Afaics the above is not entirely open to extension, because a function for example might save the instances to a collection and later return that data to another caller. So alternative way to do it (and perhaps need both ways) is to bind the dictionaries to the instances of the data type, so then the overrides follow those instances until they are explicitly requested to be erased (overridden).

keean commented 7 years ago

One way to do this perhaps is every where we passed a data type into a function, we also pass a dictionary for that data type. We don't bind the dictionaries to the instances of the data type.

So we can have first-class dictionaries?

We already have that, a Record is a first class dictionary.

keean commented 7 years ago

One way to do this perhaps is every where we passed a data type into a function, we also pass a dictionary for that data type

Are you saying you want to pass a dictionary (by value) into a function?

shelby3 commented 7 years ago

@shelby3 wrote:

You might think this is useless, because you might think a function will know which trait bounds it needs and can allow its caller to specify them at the call site, but that requires the caller to always specify them, which becomes very onerous for composition, although it is more performant given it can be monomophized. And it doesn't allow defaults with overrides.

On further thought most often we want to write functions to be generic on type, so thus the function must require all typeclass bounds it requires and can't offer its own defaults. @skaller was operating on non-polymorphic data types such as Int.

So I am still not quite sure I understand what @skaller's issue is. With generics, we must pass in all the typeclass implementations required by the algorithm.

Edit: the issue might be when we want to be able to accept many different algorithms which have differing requirements for typeclasses, in which case we would need to keep the data types around. So in that case we also wouldn't know ahead of time which typeclass bounds we need. This is a deeper issue I need to think about after I sleep.

keean commented 7 years ago

@skaller was talking about ML style modules. With an ML style module you can pass the module you wist to use at runtime. So there can be two different implementations for a type.

So if we have Int we might have a module for sorting that does ascending order and one that does descending order. We cannot determine which to use based only on the type of Int.

So ML would pass a dictionary with the comparison operator you want as a parameter to the module.

keean commented 7 years ago

@shelby3 I think ML style modules are a different issue to the solving of the expression problem, and I think we are over complicating things by trying to discuss both at the same time.

Nobody has claimed ML style modules are a solution to the expression problem. If is more that @skaller does not like being limited to one implementation of an interface per type.

shelby3 commented 7 years ago

I don't understand why @skaller didn't do the following for his Felix language.

typeclass LessThan<A>
   lessThan(A): Bool

f(x: A, y: A) where A: LessThan => x < y // x.lessThan(y)

typeclass ForwardOrder<A> extends LessThan<A>
typeclass ReverseOrder<A> extends LessThan<A>
Int implements ForwardOrder
   lessThan(x) => this <= x
Int implements ReverseOrder
   lessThan(x) => this >= x

f(0, 1: ForwardOrder)
f(0, 1: ReverseOrder)

P.S. I think your idea of enforcing one implementation per data type is rubbish, because then I can't do the above.

keean commented 7 years ago

Int implements ForwardOrder
   lessThan(x) => this <= x
Int implements ReverseOrder
   lessThan(x) => this >= x

Because you can't do that with typeclasses.

lessThan can only be declared in one typeclass.

keean commented 7 years ago

typeclass ForwardOrder<A> extends LessThan<A>
typeclass ReverseOrder<A> extends LessThan<A>
Int implements ForwardOrder
   lessThan(x) => this <= x
Int implements ReverseOrder
   lessThan(x) => this >= x

Again not allowed because:

lessThan can only have one definition for Int

keean commented 7 years ago

This would be allowed:

data LessThan<A> = LessThan {
     lessThan(A, A) : Bool
}

let forwardOrder : LessThan<Int> = LessThan {
    lessThan(x, y) = x < y
}

let reverseOrder : LessThan<Int> = LessThan {
    lessThan(x, y) = x > y
}

f([0, 1], forwardOrder)
f([0, 1], reverseOrder)

Because the ordering is a value you can pass it in.

keean commented 7 years ago

A type class must only depend on the type of the arguments, so with a type class:

lessThan(x, y)

only the type of x and y can be used to determine which instance to run. If x and y are Int there can only be one implementation for lessThan on Int no matter how you structure things. This is because the only information available is the name of the function and the type of its arguments (and the type of its result), nothing else can be used.

keean commented 7 years ago

A datatype is first-class, this means it can be passed as an argument to functions like any other first-class object. So we can define:

data LessThan<A> = LessThan {
     lessThan(A, A) : Bool
}

f(x : A, y : A, order : LessThan<A>) => order.lessThan(x, y)

Then you can pass in a value of the 'record-type' LessThan, and it will use exactly that function to do the comparison. So a Record of functions is a first class dictionary.

And we can implement type-classes using records, as we are just adding a hidden 'implicit' argument for each type-class and selecting them based on the types at the call site.

keean commented 7 years ago

P.S. I think your idea of enforcing one implementation per data type is rubbish, because then I can't do the above.

This is not my idea, this is how type-classes work!

shelby3 commented 7 years ago

@keean wrote:

This is not my idea, this is how type-classes work!

Afaik I know that is a limitation of Haskell.

Feel free to explain why you think the syntax I wrote won't work.

What I see is I can select which implementation I want to supply to the trait bound if I so desire. I have no idea what reason you think otherwise.

keean / zenscript

Orthogonal concepts #14

Background

Matrix Conceptualization