microsoft / knossos-ksc

Compiler with automatic differentiation

Polymorphic declarations #260

Open awf opened 4 years ago

awf commented 4 years ago

  We should sketch out a design for polymorphic functions and rules.

If we take the example of to_device, it should allow rewrites as follows:

    (to_device N (f x)) -> (f (to_device N x))

which requires to_device to have signature something like

    to_device :: (Int, 't) -> 't

and today that essentially means edefing it on all types it might see

    (edef to_device Float (Integer Float))
    (edef to_device String (Integer String))
    …
    (edef to_device (Tuple Float (Vec ...)) (Integer (Tuple Float (Vec ...))))

This might be fine, but it might be nicer to say something like

    (forall (Type 't)
        (edef to_device 't (Integer 't)))

Of course this is somewhat all-or-nothing: I don't have a way to say forall t in some subset of all types. But maybe it covers an important common case just well enough. And if we're only doing this unconstrained polymorphism, then we could do it without introducing forall and Type, by letting an apostrophe introduce a new type variable.
Sort of ideologically ugly, but I like the practice of visually tagging type variables anyway:

    (edef to_device 't (Integer 't))

This would allow function declarations of e.g. build to live outside Prim.hs

    (edef build (Vec 't) (Integer (Lam Integer 't)))
    (edef size Integer (Vec 't))

and if we introduced Tangent as a type keyword, we could declare

    (edef ts_add 't ('t (Tangent 't)))

Note that this is not a proposal to introduce generics a la .NET, it's just a parameterized source of method declarations. Method lookup first looks up the function name, then unifies against the set of accepted argument types.
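The lookup-then-unify step can be sketched concretely. In this hypothetical Python model (none of these names come from the actual compiler), types are strings or nested tuples, and apostrophe-prefixed strings are type variables:

```python
# Hypothetical sketch of method lookup: after finding the function name,
# unify the call's argument types against the declared pattern.
# Types are modelled as strings ("Float") or nested tuples
# (("Vec", "Float")); strings starting with an apostrophe are type
# variables.

def unify(pattern, concrete, subst):
    """Return an extended substitution binding type variables in
    `pattern` so that it equals `concrete`, or None on mismatch."""
    if isinstance(pattern, str) and pattern.startswith("'"):
        bound = subst.get(pattern)
        if bound is None:
            return {**subst, pattern: concrete}
        return subst if bound == concrete else None
    if isinstance(pattern, tuple) and isinstance(concrete, tuple):
        if len(pattern) != len(concrete):
            return None
        for p, c in zip(pattern, concrete):
            subst = unify(p, c, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == concrete else None

# (edef to_device 't (Integer 't)): the argument pattern is (Integer, 't).
to_device_args = ("Integer", "'t")

# A call site passing (Integer, Vec Float) binds 't to (Vec Float),
# which is then also the result type.
subst = unify(to_device_args, ("Integer", ("Vec", "Float")), {})
```

Because a variable must bind consistently, the same machinery rejects calls where two occurrences of 't would need different types.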

Polymorphic rules

I can imagine some form of polymorphic rule definition of the form

    (rule "to_device.sink" ((x : 't) (f : (Lam 't 's)) (N : Integer)) 
        (to_device N (f x)) 
        (f (to_device N x)))

This doesn't feel like it needs horrendous unification, but maybe there are cans of worms here?
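Leaving the type-variable unification aside, the purely syntactic part of such a rule application is small. A minimal Python sketch, with expressions modelled as nested tuples, "?"-prefixed strings as rule variables, and a made-up (relu v) expression:

```python
# Sketch of applying the "to_device.sink" rewrite syntactically; the
# types of 't and 's are deliberately ignored here.

def match(pattern, expr, binding):
    """Match `expr` against `pattern`, extending `binding`; None on failure."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in binding:
            return binding if binding[pattern] == expr else None
        return {**binding, pattern: expr}
    if isinstance(pattern, tuple) and isinstance(expr, tuple) \
            and len(pattern) == len(expr):
        for p, e in zip(pattern, expr):
            binding = match(p, e, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == expr else None

def substitute(template, binding):
    """Instantiate the right-hand side with the matched bindings."""
    if isinstance(template, str):
        return binding.get(template, template)
    return tuple(substitute(t, binding) for t in template)

lhs = ("to_device", "?N", ("?f", "?x"))   # (to_device N (f x))
rhs = ("?f", ("to_device", "?N", "?x"))   # (f (to_device N x))

binding = match(lhs, ("to_device", "0", ("relu", "v")), {})
rewritten = substitute(rhs, binding)      # (relu (to_device 0 v))
```

The open question is only whether binding 't and 's alongside the term variables stays this simple, or whether constraints between them open the cans of worms.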

ghost commented 4 years ago

If you use a generic type like 't, you'll have to define its semantics strictly. Does it mean any type? Any possible type? And how do you represent, or even define, possible?

If you mean any type, then you'll either have to implement every possible combination (in case the optimiser finds itself using those variants) or guarantee in the language and rewrite rules that it is impossible to reach those states. With dynamic rewrite rules, this becomes harder.

If the semantics is any possible or even any meaningful type, then it's easier to implement, but harder to convey the semantics to the user. Compiler errors will be weird things like "invalid token" or "can't link object", because it won't be easy (or even possible) to provide a runtime check for all combinations of rules, implementations and their valid types.

In MLIR's declarative rewrites, the types can be grouped together (def scalar : AnyTypeOf<i64, f64>) and operations have their own supported types specified, too.

So when you define a rewrite on an operand, the rule restricts (i.e. validates) to only the types accepted in the implemented interfaces (and traits). That code is auto-generated, much like C++ templates, so only the valid alternatives are actually defined.

Knossos has implicit polymorphic operations (in prim.h), and so would only work on the types for which those operations were declared (which is not the same set for every one). However, the list of types implemented for each operation is not declared descriptively but by implementation, and you'll need some C++ template magic to check whether a type is valid for a given op, plus a runtime assert if not. The distinction between scalars, tuples and vectors is also important, and in some cases the function name is different (like dotv).

"to_device" is a special case: it can operate on all types, but how it operates differs. For example, moving a constant weight to another device means creating a new copy operation and changing the "always live" rate of the "to" device, while moving an operation to another device means moving its code and execution and copying its operands (which could be always-live constants or outputs from other operations).

Implementation and costs will vary widely for different hardware. For example, moving a node to another thread on the same core will be a NOP and may not even invalidate L2 while moving to IPU will need explicit synchronous copies using Graphcore's proprietary API calls.

Other potential toolchain problems...

awf commented 4 years ago

I think there are two concepts here.

Template polymorphism

One is a bit like C++ templates; a macro language to lay down many function declarations in one. For example, the function

    map :: (Lambda S T, Vec S) -> Vec T

cannot be declared today in KS, but with #266 could be declared for every overload used in a file, so the users would feel as if they had polymorphism

    (edef map (Vec Float) ((f : Lambda Float Float) (v : Vec Float)))
    (edef map (Vec (Vec Float)) ((f : Lambda (Vec Float) (Vec Float)) (v : Vec (Vec Float))))
    (edef map (Vec Float) ((f : Lambda (Vec Float) Float) (v : Vec (Vec Float))))
    ...
    (edef map (Vec (Tuple String Float)) ((f : Lambda (Vec Float) (Tuple String Float)) (v : Vec (Vec Float))))

etc. For defs, the bodies would need to be duplicated:

    (def map (Vec Float) ((f : Lambda Float Float) (v : Vec Float))
         (build (size v) (lam (i : Integer) (call f (index i v)))))

    (def map (Vec (Vec Float)) ((f : Lambda (Vec Float) (Vec Float)) (v : Vec (Vec Float)))
         (build (size v) (lam (i : Integer) (call f (index i v)))))

But we are still doing no more than saving bytes in the file, and memory in the parser. These might be useful goals some day, but given that C++ templates operate in a similar way, it will probably not be a real problem. The worry might be that if we emitted 1000 versions of to_device, rule matching would become slow. That is naively true if rule matching is implemented as a linear scan, but if matching were slow we would compile the rule set into a finite state machine to make it fast, which is probably as fast as any other generic implementation.
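As a first step toward such a machine, one can index rules by the head symbol of their left-hand side, so matching a to_device expression touches only the to_device bucket rather than every emitted rule. A Python sketch with illustrative rules (this is not the actual Knossos matcher):

```python
# Bucket rules by the head symbol of their LHS so that matching against
# many instantiations is one dict lookup plus a scan of a small bucket,
# not a scan of the whole rule set. Rules are (name, lhs, rhs) triples
# over nested-tuple expressions; the rules shown are made up.
from collections import defaultdict

rules = [
    ("to_device.sink",
     ("to_device", "?N", ("?f", "?x")),
     ("?f", ("to_device", "?N", "?x"))),
    ("add.zero", ("add", "?a", "0"), "?a"),
]

index = defaultdict(list)
for name, lhs, rhs in rules:
    index[lhs[0]].append((name, lhs, rhs))

# Only the to_device bucket is consulted for a to_device expression.
candidates = index["to_device"]
```

A full discrimination tree would extend this from the head symbol to deeper positions in the pattern, but the head-symbol bucket already removes the linear dependence on unrelated rules.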

Infinite templates

One could go further, and macro-generate the body

    ;;FOR 's in (Float, Vec Float, ..)
    ;;FOR 't in (Float, Vec Float, ..)
    (def map (Vec 't) ((f : Lambda 's 't) (v : Vec 's))
         (build (size v) (lam (i : Integer) (call f (index i v)))))
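Read as a preprocessor directive, the ;;FOR lines could be expanded over a finite type list by a pass like this hypothetical Python sketch (the directive handling and the two-type list are assumptions for illustration):

```python
# Expand a templated def over all ('s, 't) combinations from a finite
# type list, emitting one monomorphic def per pair as ksc text.
from itertools import product

TEMPLATE = """(def map (Vec {t}) ((f : Lambda {s} {t}) (v : Vec {s}))
     (build (size v) (lam (i : Integer) (call f (index i v)))))"""

TYPES = ["Float", "(Vec Float)"]

defs = [TEMPLATE.format(s=s, t=t) for s, t in product(TYPES, TYPES)]
# One instantiation per combination; with 2 types, 4 defs are emitted.
```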

Which, as is, is no better than the above. The only thing that makes it really different is to allow infinite sets of types. One option is to adapt the recursive type ideas in #247. A simpler alternative is to allow just one infinite set: the set of all types. Yes, this means that the best type we can specify might be too wide, but this is just in the IR -- we can still narrow by using the function name, just as we would without any ad hoc overloading (#276). So I can write

    (edef add_int_int 't ((x : 't) (y : 't)))

and sadly the typing can't tell us that 't should be restricted to ints.
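To make the "too wide" point concrete: the declared pattern ((x : 't) (y : 't)) only forces the two argument types to be equal. A toy acceptance check (hypothetical, not the real type checker) therefore admits String arguments just as happily as Integer ones:

```python
# The signature ((x : 't) (y : 't)) constrains the arguments to share a
# type, and nothing more; this toy check mirrors exactly that.
def signature_accepts(arg_types):
    first = arg_types[0]
    return all(t == first for t in arg_types)

ok_int = signature_accepts(("Integer", "Integer"))  # the intended use
ok_str = signature_accepts(("String", "String"))    # also accepted, sadly
bad = signature_accepts(("Integer", "Float"))       # mixed types rejected
```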

Proper polymorphism with type variables

OK, so three concepts; but a key advantage of an IR is that it can leave the high-level machinery to the parent language.

ghost commented 4 years ago

I'm a strong believer in good quality syntax sugar, but I never underestimate the cost of compiling large files nor type checking complexity, which even when carefully defined by a large committee over multiple decades can go wrong in so many ways (C++ or Java automatic type promotion, for example).

I think the 't syntax is a good idea. Like C++ templates, the only guarantees you have are the ones you make. So, if you want mixed types, you'd have to use different names: for example, 't and 's are two generic types that are possibly, but not necessarily, different.

Mixing generic types with specific ones guarantees that you get your types correct: for example, mandating Integer for index variables while leaving the element type of the vector free for the user to choose.

However, like C++, templates only make sense as long as the implementation makes sense on those types. You will get a compiler error if you try to + objects that don't have that operator overloaded.

In Knossos, if you use (get) on a vector, or an (add) on a String, you should get a compiler error. The get clearly needs a Tuple, but the add could still be polymorphic.

Robust C++ libraries that use this kind of composition also create specialisations for particular types, and if we allow mixing concrete and generic types, we can do that too. Then the problem of matching goes from concrete to abstract implementations, for the same signature.

My point is: I think generic types like 't and 's, when mixed with concrete types and a consistent type-checking mechanism, followed by specialisation and type matching, can go a long way toward making Knossos programs easier to write. But the compiler complexity will increase substantially and proportionately.

The alternative is to have weak types and let the original source files (C#, Julia, Haskell, Python, Tensorflow, ONNX) dictate what's possible and what's not. This becomes a serious problem, however, in making sure the transformations are valid and the output, when converted back to the source language, will still be semantically valid. A whole new can of worms. :)

I think it depends if we want Knossos to be mostly a conversion and search IR, or a proper language that people will write code in.

simonpj commented 4 years ago

> I think it depends if we want Knossos to be mostly a conversion and search IR, or a proper language that people will write code in.

Good question! At the moment it is definitely the former. KSC is not intended to be a human-writable language, with all the syntactic sugar and attention to usability details that implies. No, our project is primarily about learning to optimise. That is a hard task in a simple, monomorphic language. If we try to make the language more complicated, I think we'll flounder.

Let's keep it simple! Separately, we can have complicated front ends that translate user-writable languages into KSC.

In a separate MR Tom has allowed us to write (add a b) rather than add@ff a b, but that's a simple, local, almost notational change. It just means that the function called (among a fixed family of top-level function definitions) is uniquely identified by its name and argument types. No polymorphism here, just notational abbreviation.
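This name-plus-argument-types lookup can be pictured as a finite table keyed by (function name, argument types). A Python sketch: add@ff appears above, while add@ii and the table contents are invented for illustration:

```python
# Monomorphic dispatch: every definition is identified by its name and
# argument types, so lookup is a plain table access with no unification.
# The mangled name add@ii is hypothetical.
table = {
    ("add", ("Float", "Float")): "add@ff",
    ("add", ("Integer", "Integer")): "add@ii",
}

def resolve(name, arg_types):
    """Return the unique definition for this call, or raise KeyError."""
    return table[(name, arg_types)]

mangled = resolve("add", ("Float", "Float"))
```

The contrast with the polymorphic proposals earlier in the thread is that here the set of keys is closed: nothing is instantiated at lookup time.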

awf commented 3 years ago

@simonpj, we mentioned this in passing recently