microsoft / knossos-ksc

Compiler with automatic differentiation
Other
45 stars 10 forks source link

The Great Renaming (includes Vec -> Tensor) #274

Open awf opened 4 years ago

awf commented 4 years ago

Our names are pretty nasty at the moment. "build" is not special, it's just a way to construct an object of type Tensor.
We should fix these when we get a good window.

Here's a list of suggested rewrites. Some are polymorphic so currently need to be in Prim.hs, others should move to prelude.ks

build -> Tensor.init    ; `.` is allowed in identifiers, so this is not special, just a convention
rand -> (Tensor.rand  sizes)
to_float -> Integer.init  ; Integer.init might also be overloaded from bool
get$m$n -> get$m   ; No need for $n, as names need not be unique
etc..

Other renamings: ks types in C++ to ks::Float, ks::Vec, or, uglier but safer, to ks_Float, ks_Vec to prepare for C backend. Note that or->ks_or and abs->ks_abs are better than ks::or etc because of reserved words and clashes with global namespace names like abs.

ghost commented 4 years ago

I'd like to request a syntax change for get. Having the $m doesn't add anything special to it and complicates parsing, as we have to make get a special case, with the semantics in the name, rather than operands.

Can we just have:

(get m tuple)

instead?

toelli-msft commented 4 years ago

What behaviour should it have when m is out of range?

ghost commented 4 years ago

This is an orthogonal issue, and trivially fixed.

Where m comes from can be determined at compile time even as an operand. If m is a literal, then the case is identical to get$m.

If not, then it has the same behaviour as other runtime offset errors. If there are no runtime offset errors, then we should just force m to be a literal and issue a compiler error if it's not.

ghost commented 4 years ago

For comparison, (index m vector) has the same problem. If it is legal to pass dynamic vectors to functions that will use (index), then we must have runtime offset checks and/or errors, or forbid dynamic vectors in the first place and restrict (build) to only allow literals in its first operand.

ghost commented 4 years ago

Note: my MLIR implementation of (get) does a compile-time check to make sure $m is in range, as I only allow statically declared tuples. It's a compile error to have an out of bounds $m.

toelli-msft commented 4 years ago

Runtime error is certainly a valid behaviour for an out-of-range index. This happens when indexing vectors, as you point out. On the other hand, avoiding runtime errors by forcing the index to be a literal seems to remove one special case but add another. We don't have other constructions where an argument must be a literal integer, as far as I recall. (Currently the function argument to build must be a literal lambda, but we are aiming to remove that restriction.)

If the index argument were not required to be a literal integer then what should the typechecking rule be?

Perhaps you could say more about how MLIR represents indexing into tuples so that we can understand more about the incompatibility.

ghost commented 4 years ago

Get is a special case in that its name requires special parsing. Having an argument that needs to be literal is just a type restriction which is a lot less special. Plenty of languages have such kind of restriction.

There are other examples in Knossos where the shape of parameters (ex. the lambda in fold, build) needs to follow a special rule, or where the operand type defines if it's implemented polymorphically in the library, headers or explicit declarations.

In MLIR, index is a special integral type of the size of a native scalar register. Accessing elements in composite structures can only be done via the index type, which means you have an explicit conversion between whatever type you had before (int, float, long, double) into the native type, with all the restrictions that that imposes (truncation, extension, conversion, wrap semantics, etc). It does not require the index to be literal, but it also doesn't define strict tuple semantics either.

The problem with tuples is that it depends how you interpret it. If tuples are opaque types with an unknown list of (maybe nested) elements, then an accessor needs runtime checks. On the other side of the spectrum, tuples are uniqued by their types, and all checks happen at compile time.

Polymorphism brings a dynamic component and may require additional compile checks on their own. Runtime composition (like Scala) may be even more complicated. I don't think this is true for Knossos.

For example, in C++, tuples (like pair) is an anonymous collection of types to ease argument marshalling, and is therefore, usually well defined at compile time. I assume the same is true with Knossos, so the only problem is the indexing.

If the index is a compile time constant (literal) or const-expr (derived exclusively from literals), then the bounds check is purely static. If the index is dynamic, the check is simple, as the tuple shape is static, and therefore, the runtime check is also simple (1 <= m <= N).

All in all, replacing a special case in the parser with a special case in the formal definition of the language (especially one that is quite common) is usually a good thing, as it simplifies the code, makes assumptions more explicit and allows for more flexible extensions to the language.

awf commented 4 years ago

The real crux is not the out-of-range check, it's the static type.

One essentially needs dependent types, or types in counting in order to declare get, while ad-hoc overloading and polymorphism (#260) is enough to allow us to declare get$m.

With those, we can declare

(edef get_0 't0 (Tuple 't0))
(edef get_0 't0 (Tuple 't0 't1))
(edef get_0 't0 (Tuple 't0 't1 't2))
(edef get_0 't0 (Tuple 't0 't1 't2 't3))
(edef get_0 't0 (Tuple 't0 't1 't2 't3 't4))
...
(edef get_1 't1 (Tuple 't0 't1))
(edef get_1 't1 (Tuple 't0 't1 't2))
(edef get_1 't1 (Tuple 't0 't1 't2 't3))
(edef get_1 't1 (Tuple 't0 't1 't2 't3 't4))
...
(edef get_2 't2 (Tuple 't0 't1 't2))
(edef get_2 't2 (Tuple 't0 't1 't2 't3))
(edef get_2 't2 (Tuple 't0 't1 't2 't3 't4))
...
(edef get_3 't3 (Tuple 't0 't1 't2 't3))
(edef get_3 't3 (Tuple 't0 't1 't2 't3 't4))
...
(edef get_4 't4 (Tuple 't0 't1 't2 't3 't4))

Etc, and lots of early C++ implementations of tuples did pretty much that. Modern C++ implementations are still a little tricky, see https://codereview.stackexchange.com/questions/44546/very-basic-tuple-implementation

The reasoning behind the ugly get$m is essentially to automate the above declarations, and the ugly dollar is to hint to the reader that they should look somewhere other than prelude.ks in order to find the definition.

Now, we could easily add special-case code to our parsers etc to parse and typecheck (get m t), but to me that's a brand new concept in the IR, whereas get$m is just a shorthand for the (n*(n-1)/2) declarations that we would need to handle n-tuples.

ghost commented 4 years ago

The complexity of C++ tuple implementations is that it is a standard library container, not a language feature. It has to be implemented using language features (like variadic templates) and that's not trivial.

What looks to me is that (tuple) is a core concept of Knossos, but (get) is not. We have a native type that can't be natively accessed, and that is why we have this issue.

In C-like languages, a closer concept would be a struct. Even anonymous ones have names for the fields, and the access is defined in the language as obj.field, which is enough to collapse all complexity into a simple compile-time evaluation.

Though, bear in mind, I don't think we should change the representation solely based on this one issue. It's a small issue and it if will create a big chain reaction, by all means, let parsers be slightly more complicated. My own parser is not even that more complicated, so it's perfectly fine the way it is.

I'm only writing long replies because I like to be thorough in my explanations, so that we can make sure it makes sense in the language. :)

awf commented 4 years ago

What looks to me is that (tuple) is a core concept of Knossos, but (get) is not. We have a native type that can't be natively accessed, and that is why we have this issue.

This is compelling, and the tradeoff is between:

I'm leaning toward the latter, but am going to award myself a weekend to let it simmer.

I'm only writing long replies because I like to be thorough in my explanations, so that we can make sure it makes sense in the language. :)

And it's very much appreciated :)

toelli-msft commented 4 years ago

What looks to me is that (tuple) is a core concept of Knossos, but (get) is not. We have a native type that can't be natively accessed, and that is why we have this issue.

I don't understand this. In what sense is the former core but the latter not?

awf commented 4 years ago

What looks to me is that (tuple) is a core concept of Knossos, but (get) is not. We have a native type that can't be natively accessed, and that is why we have this issue.

I don't understand this. In what sense is the former core but the latter not?

I guess in the sense that Tuple is a built-in type so needs syntax to construct it. We have a special function tuple in many parsers, but I guess we could declare that too, e.g. calling it Tuple.init under the new consistent naming convention above.

(edef Tuple.init (Tuple) ())
(edef Tuple.init (Tuple 't0) ('t0))
(edef Tuple.init (Tuple 't0 't1) ('t0 't1))
...

And fall back conceptually to the definitions of get_N above.

The thing that might tip our thinking slightly toward "tuple is core" is our use of one-arg in AD, but of course that is not visible in the surface syntax we're talking about here.

toelli-msft commented 4 years ago

I see. tuple is a keyword and (tuple ...) is special syntax, whereas we want get$m to be a "normal" function (even though at the moment it has to be a primitive due to lack of polymorphism). If we wanted to increase consistency in this regard we would have to have a "core" pattern matching syntax for tuples. Then gets could be normal functions written in terms of pattern matching. I don't know if that improves anything vis-a-vis MLIR though.

ghost commented 4 years ago

This has gone far from the syntax of (get), which was the original intent, but I'll continue the tangent for now. Feel free to ignore if I go too far. :)

Knossos doesn't have named types, so it's hard to compare. But in C-like languages, it would be the difference between struct and struct foo { int a; }. The former is a core concept of the language that cannot be operated via normal functions and needs another core concept . to access named elements. The latter can very well be used as arguments and return values from functions and operators and access to its members can be composed by the core accessor ..

The type (tuple) (not the constructor function) is a core concept, but the accessor wants to be a regular function. But without a core accessor concept, to create a truly generic accessor, one needs variadic generics, which is what @awf has written above.

Languages are usually constructed in layers. Core concepts should only depend on other core concepts and be consistent in that way. If a core concept depends on a higher-level construct, then not only you need to support more complex features to allow simpler ones to exist, but you will also have a harder time proving consistency. It's a lot easier to prove the core language, and use that to prove the higher features, than put all in one bundle and try to reason about it later.

Even if we don't want to prove anything, it still makes it harder to develop more complex features in the language, if your core language is ill defined. Exhibit A: Java Generics. The behaviour of the core language depends a lot on the implementation of the standard library (class Object) and the JVM, which makes for horrendous side-effects.

ghost commented 4 years ago

About the original request, (get N tuple), I personally think it should be a keyword, defined as:

(get <literal> tuple<?>)

with static out-of-bounds checks and compile-time defined tuple types, including ones directly from Tuple.init.

My preference is based on two things:

  1. The core concept explained above, for language consistency
  2. Parsing keywords is simpler, no need for variadic generics
toelli-msft commented 4 years ago

I'm aware of three distinct possibilities:

  1. get$m, the status quo. Each is essentially a special form and requires a special case in the parser.
  2. get, a special form that takes a literal integer as its first "argument". This is a special case as we don't have any other such constructions.
  3. A pattern matching special form, such as

    (let-tuple ((a1, a2, ..., an) t) body)

I guess all are feasible. @rengolin-ms, if you really want to make the case for 2 I guess Simon should be involved.

awf commented 4 years ago

3a. We can, without much trouble, just allow existing let to pattern match:

(let ((a1 a2 a3 ... an) t) body)

And it would make some of the tupling/detupling in our AD look better.

ghost commented 4 years ago

I don't really mind either way. It was just a suggestion.

toelli-msft commented 4 years ago

Yes, actually 3a is better. It avoids any new syntax.

dcrc2 commented 4 years ago

Are we happy with the name unzip for the function which converts a vector of tuples to a tuple of vectors?

ksc uses the name unzip but in other discussions we've called this function vot2tov, which may be better as it is more specific.

On the other hand, if vector -> tensor, then tot2tot doesn't really work!

ghost commented 4 years ago

As long as Knossos only has one vector type, then zip and unzip are clear.

If/when Knossos implements more list types, like tensor, it could "get away" with keeping an overloaded zip/unzip terminology with generics and type inference. Or it could use the existing name mangling to have unzip@tv and unzip@tt.

But it's unclear to me what adding tensor would accomplish at the Knossos level, that vector doesn't do.

awf commented 4 years ago

Ah yes, tot2tot does rather scupper things. So, we have

zip :: Tuple (Tensor 'T1, ..., Tensor 'Tn) -> Tensor (Tuple 'T1, ..., 'Tn)
unzip :: Tensor (Tuple 'T1, ..., 'Tn) -> Tuple (Tensor 'T1, ..., Tensor 'Tn)

Or maybe not -- with overloading it can distinguish the two cases, and it removes the "is it zip or unzip" question you sometimes get, e.g. "how do I unzip in python?" Answer: "use zip"

tot2tot :: Tuple (Tensor 'T1, ..., Tensor 'Tn) -> Tensor (Tuple 'T1, ..., 'Tn)
tot2tot :: Tensor (Tuple 'T1, ..., 'Tn) -> Tuple (Tensor 'T1, ..., Tensor 'Tn)
awf commented 4 years ago

Adding to this: we have an ad-hoc syntax at the moment for "derived names", for example the grad or size of a function. This might occur in the definition of map:

map :: (s -> t, Vec s) -> Vec t
map f ss = build (length ss) (i -> (f (index i ss)))

rev$map :: (s -> t, Vec s) -> Vec dt -> ((), Vec ds)
-- f :: s->t
-- rev$f :: s -> dt -> ds
rev$map (f, ss, dts) = ((), build (length ss) (\ i . 
   let s = index i ss in 
   let dt = index i dts in
   rev$f s dt -- Need to refer to rev$f -- could just make it a syntactic oddness like get$1...
 )

Should we rename these syntaxes, e.g.

map<rev>
map{rev}
get{1}
get<1>

Let's see the above like that:

map :: (s -> t, Vec s) -> Vec t
map f ss = build (length ss) (i -> (f (index i ss)))

map{rev} :: (s -> t, Vec s) -> Vec dt -> ((), Vec ds)
-- f :: s->t
-- f{rev} :: s -> dt -> ds
map{rev} (f, ss, dts) = ((), build (length ss) (\ i . 
   let s = index i ss in 
   let dt = index i dts in
   f{rev} s dt 
 )
awf commented 3 years ago

And... gmm:exp$VecR

awf commented 3 years ago

But it's unclear to me what adding tensor would accomplish at the Knossos level, that vector doesn't do.

It's mainly to distinguish jagged tensors. So a Vec (Vec Float) might be jagged, a Tensor Float is rectangular. We notice that the latter loses dimensionality information, but that will be restored by shape work #411

dcrc2 commented 3 years ago

Questions about the Vec -> Tensor renaming, assuming that we want to go ahead with this:

awf commented 3 years ago
  • Should the number of dimensions be part of the Tensor's type? (e.g. Tensor 2 Float rather than just Tensor Float).

Yes, let's start with Tensor <N> Type, which maps closely to Vec (Vec T). The multi-dimensional build doesn't need the number:

(Tensor.init (tuple 3 5 7) (lam (ijk : Tuple Integer Integer Integer)
   (let ((i j k) ijk)  (Float.init (add i (add j k)))))

And the definition of Tensor.init is, in principle:

(edef Tensor.init (Tensor 2 't)   ((Tuple Integer Integer) (Lambda (Tuple Integer Integer) 't)))
(edef Tensor.init (Tensor 3 't)   ((Tuple Integer Integer Integer) (Lambda (Tuple Integer Integer Integer) 't)))
(edef Tensor.init (Tensor 4 't)   ((Tuple Integer Integer Integer Integer) (Lambda (Tuple Integer Integer Integer Integer) 't)))
awf commented 3 years ago
  • How are Tensor elements laid out? (Does each dimension have a corresponding stride?)

I would say compactly for the moment, deferring all layout to being libtorch compatible.
See e.g. the .NET libtorch wrapper at https://github.com/xamarin/TorchSharp