Open awf opened 4 years ago
Current definition of shape: `sizeof(Type)` in C++. The idea behind encoding `sizeof(Type)` in the shape is that we may be able to evaluate `bytes` given only the shape of an object, without even knowing its original type, i.e. `(bytes e) = (bytesFromShape (shape e))`.
But I don't think it's clear yet whether we'll be able to maintain this property. Maybe we'll end up needing different versions of `bytesFromShape` depending on the type of `e`.
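For concreteness, here is a toy Python model of that property (entirely my own sketch: the names `shape`, `bytes_of` and `bytes_from_shape` are invented, and the scalar size is assumed to be 8 bytes, as for a C++ `double`):

```python
def shape(x):
    # Under the current definition the shape of a scalar records its
    # sizeof, so the original type can be forgotten afterwards.
    if isinstance(x, list):
        return [shape(e) for e in x]
    return 8  # sizeof(double), assumed

def bytes_of(x):
    # Total bytes of an object, computed from the object itself.
    if isinstance(x, list):
        return sum(bytes_of(e) for e in x)
    return 8

def bytes_from_shape(s):
    # Total bytes computed from the shape alone, without the object.
    if isinstance(s, list):
        return sum(bytes_from_shape(e) for e in s)
    return s  # a scalar shape *is* its byte count

v = [[1.0, 2.0], [3.0]]  # a jagged Vec (Vec Float)
assert bytes_of(v) == bytes_from_shape(shape(v)) == 24
```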
One slight complication: `sizeof(tuple<Ts...>)` is not necessarily equal to `sum(sizeof(Ts)...)`, because of padding: indeed the precise value may depend on the C++ compiler/library implementation. If we want to avoid a dependency on knowledge of the C++ compiler, we will have to make quite conservative assumptions about padding, which may mean overallocating slightly.
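The same alignment effect can be observed from Python's `struct` module rather than C++ (results are platform-dependent; the comment values are what typical 64-bit ABIs give):

```python
import struct

# Native alignment ('@'): the int is padded to a 4-byte boundary, so a
# (char, int32) pair can occupy more than 1 + 4 bytes.
padded = struct.calcsize('@ci')  # typically 8: char + 3 padding bytes + int
packed = struct.calcsize('=ci')  # exactly 5: standard sizes, no padding
print(padded, packed)
```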
How would we define the shape of a linear map? Two difficulties:

- We could represent the shape of a linear map `X -> Y` as a 2-tuple, `(shape of X, shape of Y)`. But then it would definitely no longer be the case that `(bytes e)` depends only on `(shape e)`, as the shape of a LM would look like the shape of a tuple (which has a different size in bytes).
- To compute `bytes` of a LM, you would need to know the `CType`, not just the `Type`.

(For the moment I'm ducking this issue by not defining `shape$` for functions which involve linear maps.)
Yep, ducking is correct at the moment.

PR #411 contains the current specification of `shape$` functions.
Changed from the original plan: the shape of a Float/Integer/Bool/String is now `(tuple)`.
So, there are some important differences here compared to the DPS paper. The DPS paper makes two assumptions which the version here does not make:

- shapes do not depend on runtime values;
- in a `Vec T`, all the `T`s must have the same shape.

Reasons for not wanting to make these assumptions:

- We want to support functions like `filter` which use a condition based on runtime values.
- For a `Vec (Vec T)`, the fact that we have chosen not to use a Tensor implies that the object can be jagged.

But dropping these assumptions has some awkward consequences: the shape of a `Vec T` must itself be a `Vec` (at least when `T` is not a scalar type), and in general will involve a `build`. (Awkward not only because of the `build`, but also because if the shape depends on a runtime condition, we may need to evaluate the entire function up to the point where that condition is applied.)

If we're using this for DPS, we seem to have a problem if evaluating a shape may allocate memory. If the idea is to evaluate `shape$f` before we evaluate `f`, won't we also need to evaluate `shape$shape$f`, `shape$shape$shape$f`, and so on? With our current definition of shape, this sequence doesn't terminate! (Maybe instead we should define the shape of a `Vec T` to be its size when `T` is scalar: then the sequence would eventually terminate in a scalar.) A complication when thinking about this is that we don't actually intend to call `shape$f` as such: instead we're planning to inline it into `bytes$f` (to find out how much memory to allocate), which does return a scalar. But I don't think the problem goes away here: we're just pushing it down into the inlined expression. If there is a `build` involved somewhere, we're going to have to fall back to a non-DPS implementation at some point or we'll end up with an infinite recursion. (Edited: actually, no, perhaps not infinite. But I think it may involve an arbitrarily long recursion, with one level for each `build` in the expression that you are evaluating?)
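To make the termination question concrete, here is a toy Python model of the two candidate shape definitions (my own sketch; `shape_current` and `shape_scalar_size` are invented names, not ksc functions):

```python
def shape_current(x):
    # Current definition: the shape of a scalar is (), and the shape of
    # a Vec is a Vec of element shapes -- itself an allocated object.
    if isinstance(x, list):
        return [shape_current(e) for e in x]
    return ()

def shape_scalar_size(x):
    # Alternative: the shape of a Vec of scalars is just its size, so
    # the tower shape(shape(...)) eventually reaches a scalar.
    if isinstance(x, list):
        if x and not isinstance(x[0], list):
            return len(x)  # Vec Scalar -> Integer
        return [shape_scalar_size(e) for e in x]
    return ()

v = [[1.0, 2.0], [3.0]]       # a jagged Vec (Vec Float)
s1 = shape_current(v)         # [[(), ()], [()]] -- still a Vec
s2 = shape_scalar_size(v)     # [2, 1] -- a Vec of Integers
s3 = shape_scalar_size(s2)    # 2 -- a scalar: the tower terminates
print(s1, s2, s3)
```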
Use of `constVec` may help to some extent. In the usual case where the elements of a vector do all have the same shape (and in particular for `Vec Scalar`), the shape will be a `constVec`. We can potentially change Cgen so that `constVec` objects are treated specially and do not require memory from the allocator.
Even with special treatment for constVec, it seems there will always be examples where we don't want to use DPS. Ultimately, RLO can decide whether DPS is to be used. But for the moment, we'll presumably need some heuristic for when Cgen will use DPS. How about, DPS will be used iff evaluating the shape of the expression does not involve allocating memory? (This is something that Cgen can easily determine, and we can build in the assumption that a constVec does not involve allocation.)
We can certainly say we won't use DPS to call shape$ -- just use the allocator. We can release all the memory after we compute size/cost, even if we didn't inline the shape call.
So I think we can have a whole bunch of heuristics about when to use DPS -- maybe even did the shape function optimize to trivial? Etc etc.
Oh, and yes, jagged tensors are an important case (think of sentences going through BERT).
> We can certainly say we won't use DPS to call shape$ -- just use the allocator.

Oh, yes, I'd missed that! It's pointless using DPS to call `shape$`: we don't care where the result is stored as we're going to throw it away anyway!
I think the question I should have been asking is: if the implementation of `shape$f` involves calling a function `f2`, then should that call use DPS? I expect the answer should be: we ignore the fact that we're implementing a shape function here, and just decide whether to use DPS in the same way that we do for ordinary functions.
But if we always use DPS, including in the implementation of shape functions, then in theory we could end up with an exponential increase in runtime: in the worst case, `shape$f2` is approximately as expensive as `f2`, so a DPS call to `f` involves

- `shape$f`, which calls `shape$f2` and then `f2`; then
- `f`, which again calls `shape$f2` and then `f2`,

i.e. about four times as expensive as our current implementation. And further, if `f2` calls `f3` which calls `f4` and so on, in the worst case there will be `2^n` calls to `fn` or `shape$fn`.
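A tiny recursion makes the doubling-per-level explicit (a worst-case cost model I am assuming for illustration, not a measurement of the real compiler):

```python
# Assumed cost model: f_m's only cost is its call to f_{m+1};
# shape$f_m is as expensive as f_m and also calls f_{m+1}; and a DPS
# call to f_m runs shape$f_m followed by f_m.

def dps_calls(depth):
    """Calls to the innermost function when DPS-calling a chain of `depth` fs."""
    if depth == 1:
        return 1
    # shape$f and f each make one DPS call one level down.
    return 2 * dps_calls(depth - 1)

print([dps_calls(n) for n in range(1, 6)])  # doubles at every level
```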
We're hoping that `shape$fm` depends only on `shape$f{m+1}` and not on `f{m+1}` itself, but because we're allowing shapes to depend on runtime values, that can't be guaranteed.
I think this is why I'm a bit scared to do without any heuristic for when to use DPS. (The other part is: since we're generating `shape$f` by repeatedly inlining any calls to `shape$f2` and so on, there's a similar potential for exponential increase in code size.)
One option is to use DPS rarely, perhaps even under a directive like `$inline`?
So just to note: we currently have jagged matrices in GMM:

```scheme
(def mul$R$VecR (Vec Float) ((r : Float) (a : Vec Float))
  (build (size a) (lam (i : Integer) (mul r (index i a)))))

(def mul$R$VecVecR (Vec (Vec Float)) ((r : Float) (a : Vec (Vec Float)))
  (build (size a) (lam (i : Integer) (mul$R$VecR r (index i a)))))
```
So the correct shape functions for these are:

```scheme
(def shape$mul$R$VecR (Vec (Tuple)) ((r : Float) (a : Vec Float))
  (constVec (size a) (tuple)))

(def shape$mul$R$VecVecR (Vec (Vec (Tuple))) ((r : Float) (a : Vec (Vec Float)))
  (build (size a) (lam (i : Integer) (constVec (size (index i a)) (tuple)))))
```
The latter does indeed build a vector of sizes, and if it were used to e.g. prepare a 100x100 tensor of zeros, it would incur an additional 100x1 memory cost to do so. This will of course go away for non-jagged tensors after #274, but it's worth noticing that for large jagged tensors, it's often a fairly negligible cost.
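Those two shape functions can be sketched in Python (my own illustrative model, not ksc output; a `constVec` is modeled as a (tag, size, element-shape) triple). The jagged case has to materialize one descriptor per row, which is exactly the extra allocation in question:

```python
def shape_mul_R_VecR(r, a):
    # (constVec (size a) (tuple)): every element has the same shape (),
    # so no per-element storage is needed -- just a descriptor.
    return ('constVec', len(a), ())

def shape_mul_R_VecVecR(r, a):
    # (build ...): one constVec descriptor per row. This list is the
    # O(rows) allocation discussed above.
    return [('constVec', len(row), ()) for row in a]

jagged = [[1.0, 2.0, 3.0], [4.0]]
print(shape_mul_R_VecR(2.0, jagged[0]))  # ('constVec', 3, ())
print(shape_mul_R_VecVecR(2.0, jagged))  # one descriptor per row
```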
And the corresponding size functions will be

```scheme
(def size$mul$R$VecR Integer ((r : Float) (a : Vec Float))
  (mul (size a) (sizeof$Float)))

(def size$mul$R$VecVecR Integer ((r : Float) (a : Vec (Vec Float)))
  (sumbuild (size a) (lam (i : Integer) (mul (size (index i a)) (sizeof$Float)))))
```

which of course avoid the allocation.
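A Python rendering of those size functions (my own sketch; `sizeof$Float` assumed to be 8 bytes) shows that they reduce to scalar arithmetic over the input, with no intermediate vector:

```python
SIZEOF_FLOAT = 8  # assumed; the real value comes from the C++ backend

def size_mul_R_VecR(r, a):
    # (mul (size a) (sizeof$Float))
    return len(a) * SIZEOF_FLOAT

def size_mul_R_VecVecR(r, a):
    # (sumbuild (size a) (lam i (mul (size (index i a)) (sizeof$Float))))
    # sumbuild accumulates a scalar, so nothing is allocated.
    return sum(len(row) * SIZEOF_FLOAT for row in a)

jagged = [[1.0, 2.0, 3.0], [4.0]]
print(size_mul_R_VecR(0.5, jagged[0]))   # 24
print(size_mul_R_VecVecR(0.5, jagged))   # 32
```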
See link at #364

- [ ] Implement shape pass: all fs -> `shape$fs`
- [ ] Implement DPS...
Or, doing shapes. Principles:

```scheme
(def foos (Vec (Vec (Tuple String Float))) ()
  (build 7 (i -> (build i (j -> (tuple "fred" (to_float (+ i j))))))))

(def shape$foos (Vec (Vec (Tuple Integer))) ()
  (shape (build 7 (i -> (build i (j -> (tuple "fred" (to_float (+ i j)))))))))
```

```scheme
(rule ((a : Vec 't)) (shape a)
  (build (size a) (lam (i : Integer) (shape (index i a)))))

(rule ((body : 't)) (shape (build e (lam ((i : Integer)) body)))
  (build e (lam (i : Integer) (shape body))))
```

```scheme
(def bytes$foo T (s : S) (bytes (foo s))) ;; has no effect on heap, as returns scalar
```

```scheme
(bytes (build n (i -> b))) -> (sumbuild n (i -> (bytes b)))
(bytes (g xs)) -> (bytes$g xs)
(sumbuild n (i -> const)) -> (mul n const) ;; already in
```
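The effect of those rewrites can be checked on a small Python model (illustrative only; the scalar size is assumed to be 8 bytes). Each rewrite step should give the same byte count as the direct definition:

```python
SIZEOF_FLOAT = 8  # assumed scalar size

def bytes_of(x):
    # Direct definition: bytes of a Vec is the sum of bytes of elements.
    if isinstance(x, list):
        return sum(bytes_of(e) for e in x)
    return SIZEOF_FLOAT

n = 7
v = [float(i) for i in range(n)]  # (build n (i -> (to_float i)))

# Rule 1: (bytes (build n (i -> b))) -> (sumbuild n (i -> (bytes b)))
via_sumbuild = sum(SIZEOF_FLOAT for i in range(n))

# Rule 3: the body is constant in i, so the sumbuild collapses to
# (mul n const) -- a trivial shape, needing no allocation at all.
via_mul = n * SIZEOF_FLOAT

assert bytes_of(v) == via_sumbuild == via_mul == 56
```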