Type classes - Githubissues

jckarter commented 12 years ago

Design and implement a type class feature to formalize the definition and implementation of protocols in a more formally sound and potentially more performant way than the current predication-based system. Goals:

improve clarity
improve overload semantics—type class hierarchies provide a natural linearization order for overloads, which predicates don't
improve compiler performance
support a wide range of generic programming techniques. ideally type classes should be the tool you reach for 90% of the time, with ad-hoc overloads and predicates available as a backup
maintain the dynamic flexibility of Clay code—ideally it should still be possible to partially or informally implement interfaces for prototyping or exploratory programming

jckarter commented 12 years ago

Some discussion already occurring in #375

ghost commented 11 years ago

A possible solution may be to use boolean operators e.g.

[T and U:Foo, U:Bar, C:Foo or Bar]
define C as Corduroy[T, U] {
    Plaid?(x:C, y:T, z:U) : Bool;
}

stepancheg commented 11 years ago

Are you considering no type classes at all? For me two ways to express the same thing is confusing: programmer can choose both:

define Sequence?(S) = ...

[S when Sequence?(S)]
serialize(s: S) { ... }

and

define Sequence[T] as { ... }

[T]
serialize(s: Sequence[T]) { ... }

Too many ways to do the same thing is a bad thing.

Error reporting

I've read the motivation part of Protocols proposal, it is not convincing. It tells that for predicate:

ForwardCoordinate?(T) = CallDefined?(dereference, T) and CallDefined?(inc, T);

While this works, it leaves the compiler with very little semantic information for error reporting or debugging—it can only report that the pattern match failed, not why.

This seems to be a compiler implementation issue, not a language limitation. Compiler can report that:

Predicate ForwardCoordinate?(Int) failed because CallDefined?(dereference, Int) is false.

or even

Predicate ForwardCoordinate?(Int) failed because dereference function is not defined for Int parameter.

Performance

Compilation with type classes can be faster than predicate evaluation. Although this is true, this is not a real problem. Real problem is lack of incremental compilation, that is painful for large projects, and type classes won't help much with this problem.

Overload linearization

type class hierarchies provide a natural linearization order for overloads

Probably it is not better than current "later wins" rule.

I am satisfied with current type system and overload resolution design. Type classes will make Clay a very different language, and I doubt this new language is better.

My 2 cents.

ghost commented 11 years ago

@stepancheg my comment above is a continuation of the discussion in #375. It refers to a possible type-class syntax i.e. T and U are members of Foo, U is member of Bar, C is member of Foo or Bar. Where C is the pattern for the Corduroy type-class etc.

The discussion started when I found the predicate evaluator to be quite slow in comparison to overload lookup & pattern matching, hence the performance issue. No doubt the evaluator implementation could be significantly improved, however @jckarter suggested taking another look at type-classes as a possible feature.

jckarter commented 11 years ago

@stepancheg Error resolution in that fashion isn't practical. For more complex predicates, how do you decide which nodes of the call graph to show? You can't report every false or true evaluation between the primitives and the top level expression; that would be too much information.

The current overload resolution order is untenable across modules. It's impossible to predict which modules get loaded first, and imports can potentially have subtle effects on seemingly unrelated code. Type class based overloading would allow a no-ambiguous-matches rule to be practical, because refinement could happen through type class refinement rather than delicate ordering of overloads.

Whilew it's undesirable to have multiple ways to do something, building protocols piecemeal out of CallDefined queries is awkward, and it's easy to get wrong in the face of lvalue/rvalue overloads or ref/value returns. You should of course be able to query class membership from a predicate and use a predicate as a type class constraint, so that both features are interoperable and the best tool for the job can be used.

jckarter commented 11 years ago

@agemogolk @stepancheg It is true that a subset of predicate expressions, such as comparing types or using CallDefined?, could be mapped to constraints and evaluated by constraint solving. Perhaps instead of type classes, the evaluator could attempt to transform an expression to a constraint problem and fall back to linear interpretation if it cannot.

Another way to improve clarity of protocols would possibly be to have a form that tests the validity of an expression, similar to D's is(typeof(expr)) idiom, returning true if the expression is compilable and false otherwise. For example:

[Seq, Iter]
Sequence?(#S, #Iter) = Can?((s:Seq) : Iter -> { return iterator(s); });

stepancheg commented 11 years ago

@jckarter

Error resolution in that fashion isn't practical. For more complex predicates, how do you decide which nodes of the call graph to show?

Probably in the most situations predicates are simple function calls or simple "and" expressions, thus they can be nicely printed. If expression is complex, compiler would print simple message as it does now.

The current overload resolution order is untenable across modules. It's impossible to predict which modules get loaded first, and imports can potentially have subtle effects on seemingly unrelated code

This is a price for flexibility. However, this problem can be reduced with three measures:

Prohibit cyclic module imports by the compiler
Avoid reusing symbol for different operations (for instance, I think stdlib may split push operation into push element and pushAll sequence). This cannot be enforced by the compiler and requires discipline from library designers
Implement open/final overloads. With final overloads order of import doesn't matter, because conflicting overloads result in compilation failure no matter which module imported last

Note that C++ also has a problem of overload resolution order: project may have overloads in different header files. Compiled code depends on which header files are #include'd in particular .cpp file. For instance operator << to stream can be implemented differently in different parts of project. Moreover, include order is also important, because in C++ functions only see declarations declared above in the current compilation unit. But nobody complains, because developers don't usually abuse this property.

Another way to improve clarity of protocols would possibly be to have a form that tests the validity of an expression, similar to D's is(typeof(expr)) idiom

I'm not familiar with D, could you please explain the idiom or share a link?

ghost commented 11 years ago

@stepancheg I think @jckarter is referring to D template constraints. Actually, looking at this is(...) just seems to be a static assertion, how is this different from the predicate based system we currently use?

jckarter commented 11 years ago

@stepancheg @agemogolk In D, typeof(expr) is the inferred type of expr. is expressions evaluate to false if there is a substitution or inference failure, so is(typeof()) can be used to test whether a sequence of operations is valid. For example, is(typeof(foo(a, b))) would be true if foo is defined for the types of a and b. Clay could benefit from having a similar feature as an alternative to CallDefined?.

ghost commented 11 years ago

OK, I see. Yes, that would be a nice addition.

jckarter commented 11 years ago

@stepancheg

Prohibit cyclic module imports by the compiler

This is appealing, but cyclic imports are occasionally useful, because they allow you to categorize public interfaces into modules in a way that makes logical sense without being constrained by implementation details.

Avoid reusing symbol for different operations (for instance, I think stdlib may split push operation into push element and pushAll sequence). This cannot be enforced by the compiler and requires discipline from library designers

I agree. There's too much over-overloading of symbols in the library, and it would be good for families of operations like push-element vs. push-sequence and insert-element vs. insert-range vs. insert-sequence to be broken up into distinct symbols.

Implement open/final overloads. With final overloads order of import doesn't matter, because conflicting overloads result in compilation failure no matter which module imported last

This would be a big help. Ideally overloads should be final by default, with refinable overloads limited to where they make sense. While it's true that C++ includes can have overload-related side effects, the potential for an extra operator<< to have unintended consequences is reduced by the fact that C++ has well-defined (albeit hard-to-understand) linearization rules for overloads and disallows ambiguous overloads that score the same on the linearization scale.

jckarter commented 11 years ago

@agemogolk It would be invalid for multiple overloads to match the same call site. In your example, foo(x) would be invalid because it matches both overloads, whereas foo(x, y) would only match the second and would be OK. You would need to use predicates or more specific patterns to explicitly separate the common case (which you probably intended to do anyway, since currently your foo(x) overload will just silently never get used under the current rules).

stepancheg commented 11 years ago

@jckarter About cyclic imports. Because of public imports, cycles can always be broken without affecting API.

There's a trivial mechanical algorithm to break cycle. Suppose you have two modules: aaa and bbb, they need to import some stuff from each other.

create a module aaabbb.internal
move all code from both aaa and bbb into a single aaabbb.internal module
add public imports from aaabbb.internal to aaa and to bbb to declaration previously lived in these modules
cycle is now broken, and old declarations from aaa and bbb are still available by their old names

Of course, it would be better to do more sophisticated job (for example, what I did in 961314e470fc7568aad1516363903dca9480845a), it is just a proof that cycle break is always possible. I want to emphasize that prohibition of cycles add some trouble to library developers, but does not affect library users (or, in another words, public API) at all.

jckarter commented 11 years ago

@stepancheg that is true, though it is convenient that an additional module is currently unnecessary. There are however other benefits to disallowing cyclic imports; it would for instance be nice to be able to support import predicates (such as [when Windows?] import foo.windows;), which would rely on a well-defined ordering of loads and evaluations.

ghost commented 11 years ago

I'm not shure if I like type classes. but that's mainly because i can't imagine how well things play together. specifically type classes, predicates, records, tuples and other statics in combination. possibly you can imagine full examples for me?

say, I want to write a labyrinth game with walls, keys and stuff. say, I want the development most intuitive so that I could possibly write things like:

[O where isWall?(O)] overload isMovable?(O) = false;

(...)

if ( isMovable?(obj) ) { moveOn(obj, directionOf(p1)); moveOn(p1, directionOf(p1)); } else beep();

silly example, I know! the point is that I'd like to have an interface very native to the toy I am programming. Below that interface, the situation is very different. Because the labyrint is a grid and all graphics share the same size and basic logics, there could be only one record named gameObject (or so) that can represent all the different objects. basically, it holds a pointer to a set of different pixmap objects for different animations, if any, an ID field to identify the type of that object (wall, key...) and some state. besides there are some generic interfaces like above, which are all the same for all objects i.e. isDeadly?, isFetchable?, destroy, moveOn, triggerAnim, whatever...

currently I'd have a problem to distinct the objects if the record is the same. records are also not clone'able or subtype'able. there are no predicates for checking aliases and so forth. type classes now seem to solve the issue. but how do they if the only suitable difference for a check is the ID in the record? how would I subtype the objects to make them distinct though their memory layout is the same? got the point?

can someone please show me how type classes play in here and how i should lay out the code?

galchinsky commented 11 years ago

You are able to do everything with predicates that you could do with typeclasses. I think the problem is that you have mixed compile-time and run-time wrong way. If you write [M when isMovable?(M)] moveOn(obj : M, ...) you don't need if ( isMovable?(obj) ) condition and runtime beeping, the compiler will check everything for you. If you use ID field to distinct records, you can't use any compile time feaures, because ID is a runtime value. Try to use variants instead of godObject + ID fields. Then the labyrinth would be a grid of variants.

ghost commented 11 years ago

@galchinsky, thanks for the correction. I'd have tried to keep tuples around like context objects in C. Stupid habits...

however, the issue that drove me to this example is still unsharp to me. you write You are able to do everything with predicates that you could do with typeclasses. I agree, and I find predicates far more natural in clay. but the idea of implementing type classes is to deprecate the use of predicates, mainly for compilation issues (that I understand and agree on.) I'm not so shure if type classes help the programmer much, though. they rather seem to draw up a very excessive parallel world, only for contracts. while other parts of the language are kept the C'ish way. I'm not shure if i'd find type classes rather in the way and leave them aside. that's why i tried to confront mainly jckarter with a quasi-real-life example of combined language-element use. shurely, I can't see how real code will look like and would like to see an example code of something small that's ought to be compile'able and doing something useful.

jckarter / clay

Type classes #382

Error reporting

Performance

Overload linearization