Open adamhaber opened 4 years ago
I did a bunch of testing around optimizing loops - for loops were usually worse than vmap IIRC, though that was 6 months ago and XLA is a moving target.
You can check out how for loops in the C++ backend do indentation: https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Statement_gen.ml#L52-L54. It's a pain to work out from first principles, so I'd recommend just copying that one at first and then modifying it for your use. The first basic issue is that you should start a box before "for" and then use something like pp_block (https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Statement_gen.ml#L7) to get an indented block of code.
> Another thing that's currently problematic in this implementation is the auto-casting of the loop variable to float - this is related to the last comment here. We could cast the whole tf.range to int but this seems weird.

Can we use the `dtype` argument to `tf.range`?
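For reference, `tf.range` does take a `dtype` argument; a minimal standalone sketch under TF2 eager execution (not the stanc3-generated code):

```python
import tensorflow as tf

# tf.range takes an explicit dtype, so the index tensor can stay int32
# instead of being auto-cast to float like other generated expressions
idx = tf.range(1, 10, dtype=tf.int32)
print(idx.dtype)    # tf.int32
print(idx.numpy())  # [1 2 3 4 5 6 7 8 9]
```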
> I did a bunch of testing around optimizing loops - for loops were usually worse than vmap IIRC, though that was 6 months ago and XLA is a moving target.

Probably a silly question, but isn't vmap specific to non-sequential computations? How can we vmap a for loop that accumulates something across iterations?

> You can check out how for loops in the C++ backend do indentation,
Thanks! I've tried:
```ocaml
| For {loopvar; lower; upper; body} ->
    let pp_block ppf body = pf ppf "@;<1 2>@[<v>%a@]@," pp_stmt body in
    let pp_for_loop ppf (loopvar, lower, upper, body) =
      pf ppf "@[<hov>for %s in tf__.range(%a, %a, dtype=tf__.int32):" loopvar
        pp_expr lower pp_expr upper ;
      pf ppf " %a@]" pp_block body
    in
    pp_for_loop ppf (loopvar, lower, upper, body)
```
When I tried to sample from this model, I got:

```
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
```

Trying to sample with `tf.function(autograph=False)` didn't help.
I was able to solve this using:
```ocaml
let pp_for_loop ppf (loopvar, lower, upper, body) =
  pf ppf "@[<hov>for %s in range(%a, %a):" loopvar pp_expr_nc lower pp_expr_nc upper ;
```

where `pp_expr_nc` is like `pp_expr` but doesn't cast to float; but this doesn't seem like a very good solution.
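For what it's worth, a minimal standalone sketch of why plain Python `range` sidesteps the error, assuming the loop bound is a Python int known at trace time (the function and values here are made up for illustration): with `autograph=False`, the loop runs while tracing and is unrolled into ordinary graph ops, so nothing ever iterates over a `tf.Tensor`.

```python
import tensorflow as tf

N = 3  # a plain Python int, known at trace time

@tf.function(autograph=False)
def unrolled_sum(x):
    total = tf.constant(0.0)
    # Python range: the loop runs while the function is being traced and is
    # unrolled into N ordinary graph ops, so no tf.Tensor is ever iterated over.
    for n in range(N):
        total += x[n]
    return total

print(unrolled_sum(tf.constant([1.0, 2.0, 3.0])))  # tf.Tensor(6.0, ...)
```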
XLA and TF will not autovectorize for loops for you. You might try tf.vectorized_map where it fits the bill, but detecting that statically could be challenging.
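For reference, a toy sketch of what `tf.vectorized_map` could look like on an independent-iterations body (the array and function here are made up, not tied to any generated code):

```python
import tensorflow as tf

x = tf.random.normal([1000, 8])

def per_row(row):
    # the work done for each row is independent of every other row
    return tf.reduce_sum(row ** 2)

# rather than a Python or graph-level loop over the 1000 rows,
# vectorize the body across the leading dimension
out = tf.vectorized_map(per_row, x)   # shape [1000]
```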
> Last thing - I'm not sure I understand this comment - which functions is this referring to?
If I understand correctly, this refers to the fact that `tf.while_loop` and `tf.cond` take Python callables as inputs (`cond` and `body` for `tf.while_loop`, `true_fn`/`false_fn` for `tf.cond`). Presumably, to get this to work we'll need to turn `While (cond, body)` into something like:
```
loop_var = ...
def cond(...): ...
def body(...): ...
tf__.while_loop(cond, body, loop_var)
```
For example, turning:
```stan
int n = 0;
while (n < N) {
  y[n] ~ normal(x[n] * beta + alpha, sigma);
  n = n + 1;
}
```
to
```python
n = 0
def body(n, target):
    target += tf__.reduce_sum(tfd__.Normal((x[n] * beta) + alpha, sigma).log_prob(y[n]))
    return (n + 1, target)
def cond(n, target):
    return n < N
_, target = tf__.while_loop(cond, body, (n, target))
```
Does this make sense? Not sure how this approach scales to anything more interesting than just incrementing `target`...
That looks correct to me. But it will not be performant. What would be ideal would be to detect loops that are simply reductions and explicitly translate as such. Similarly, if you can detect loops where each iteration is independent of others those could translate to a vectorized version. If you want, we could look over some models with loops to see if there are other common patterns worth detecting for better performance.
The tl;dr is that control flow is expensive, especially on an accelerator like a GPU, so we usually strive to avoid it whenever possible. This includes both `while_loop` and `cond`.
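To make the reduction idea concrete, a hedged sketch of the loop-free translation such a detector could emit for the example above, reusing the `x`, `y`, `alpha`, `beta`, `sigma` names from that example (the shapes and values are made up so the snippet runs):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# toy data and parameters, only here so the snippet is runnable
N = 100
x = tf.random.normal([N])
y = tf.random.normal([N])
alpha, beta, sigma = 0.5, 1.2, 1.0

# the while_loop above just accumulates independent log-density terms,
# so a reduction detector could emit a single vectorized expression instead
target = tf.reduce_sum(tfd.Normal(x * beta + alpha, sigma).log_prob(y))
```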
> If you want, we could look over some models with loops to see if there are other common patterns worth detecting for better performance.
Thanks, that'd be very helpful. ATM, my understanding is that something like independent iterations or simple assignments could be easily vmapped, while these aren't as simple:
I'm sure there are more/better examples.
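As one made-up illustration (not one of the specific models mentioned) of a loop that isn't as simple: a sequential recursion where each step depends on the previous one can't go through `tf.vectorized_map`, and would need something like `tf.scan` or `tf.while_loop` instead.

```python
import tensorflow as tf

rho = 0.9
eps = tf.random.normal([50])

# x[t] = rho * x[t-1] + eps[t]: each step needs the previous one, so the loop
# body can't be vectorized across iterations; tf.scan expresses it directly
x = tf.scan(lambda prev, e: rho * prev + e, eps, initializer=tf.constant(0.0))
```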
Hi, I'm trying to add for-loop functionality to the TFP backend. I started by adding a `For` case to `pp_stmt`. However, I can't get indentation to work properly; the model I'm testing is transpiled to Python in which the first line (`x = x + ...`) is not properly indented.

Another thing that's currently problematic in this implementation is the auto-casting of the loop variable to float - this is related to the last comment here. We could cast the whole `tf.range` to int, but this seems weird.

I hope this is a reasonable place to start. There's an important question of how/if XLA will optimize/vectorize these "naive" for loops - I want to get this to work and then ask for feedback from the TFP team.
Last thing - I'm not sure I understand this comment (https://github.com/stan-dev/stanc3/blob/dc923f65ac88da19cd26e42a2b8e74e339dd05d8/src/tfp_backend/Code_gen.ml#L84) - which functions is this referring to?