stan-dev / stan2tfp

Stan2TFP is a work-in-progress alternative backend for Stanc3 which targets TensorFlow Probability
BSD 3-Clause "New" or "Revised" License

Adding TFP for loops #9

Open adamhaber opened 4 years ago

adamhaber commented 4 years ago

Hi, I'm trying to add for-loop functionality to the TFP backend. I started by adding:

| For x -> pf ppf "for %s in tf__.range(%a, %a):@,@[<v2>%a@]" x.loopvar pp_expr x.lower pp_expr x.upper pp_stmt x.body

to pp_stmt. However, I can't get indentation to work properly. The following model:

real x=0;
real yy = 9;
for (n in 1:N) {
    x = x+1;
    yy = yy+x;
    y[n] ~ normal(mu+yy, sigma);
}

is transpiled to:

  def log_prob_one_chain(self, params):
    target = 0

    # Data
    N = self.N
    y = self.y

    # Transformed data

    # Parameters
    mu = tf__.cast(params[0], tf__.float64)
    sigma = tf__.cast(params[1], tf__.float64)

    # Target log probability computation
    x = tf__.cast(0, tf__.float64)
    yy = tf__.cast(9, tf__.float64)
    for n in tf__.range(tf__.cast(1, tf__.float64), N):
    x = x + tf__.cast(1, tf__.float64)
      yy = yy + x
      target += tf__.reduce_sum(tfd__.Normal(mu + yy, sigma).log_prob(y[n]))
    return target

in which the first line of the loop body (x = x + ...) is not properly indented. Another thing that's currently problematic in this implementation is the auto-casting of the loop variable to float; this is related to the last comment here. We could cast the whole tf.range to int, but that seems weird.

I hope this is a reasonable place to start. There's an important question of how/if XLA will optimize/vectorize these "naive" for loops - I want to get this to work and then ask for feedback from the TFP team.

Last thing - I'm not sure I understand this comment - which functions is this referring to?

seantalts commented 4 years ago

I did a bunch of testing around optimizing loops - for loops were usually worse than vmap IIRC, though that was 6 months ago and XLA is a moving target.

You can check out how for loops in the C++ backend do indentation: https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Statement_gen.ml#L52-L54. It's a pain to work out from first principles, so I'd recommend just copying that one at first and then modifying it for your use. The first basic issue is that you should start a box before "for" and then use something like pp_block, https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Statement_gen.ml#L7, to get an indented block of code.

seantalts commented 4 years ago

Another thing that's currently problematic in this implementation is the auto-casting of the loop variable to float - this is related to the last comment here. We could cast the whole tf.range to int but this seems weird.

Can we use the dtype argument to tf.range?
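For what it's worth, tf.range does take an explicit dtype, so the loop variable could stay an integer without casting the bounds to float. A minimal eager-mode sketch (not generated code):

import tensorflow as tf

# Eager-mode sketch: an explicit dtype keeps the loop variable an int32
# Tensor instead of a float. (Inside tf.function without AutoGraph,
# iterating a Tensor raises an error, as discussed below.)
total = tf.constant(0, dtype=tf.int32)
for n in tf.range(1, 5, dtype=tf.int32):
    total += n        # n is a scalar int32 Tensor
print(total)          # tf.Tensor(10, shape=(), dtype=int32)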

adamhaber commented 4 years ago

I did a bunch of testing around optimizing loops - for loops were usually worse than vmap IIRC, though that was 6 months ago and XLA is a moving target.

Probably a silly question, but isn't vmap specific to non-sequential computations? How can we vmap a for loop that accumulates something across iterations?

You can check out how for loops in the C++ backend do indentation,

Thanks! I've tried:

  | For {loopvar; lower; upper; body} ->
    let pp_block ppf body = pf ppf "@;<1 2>@[<v>%a@]@," pp_stmt body in
    let pp_for_loop ppf (loopvar, lower, upper, body) =
      pf ppf "@[<hov>for %s in tf__.range(%a, %a, dtype=tf__.int32):" loopvar pp_expr lower pp_expr upper ;
      pf ppf " %a@]" pp_block body
    in
    pp_for_loop ppf (loopvar, lower, upper, body)

When I tried to sample from this model, I got:

OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.

Trying to sample with tf.function(autograph=False) didn't help.

I was able to solve this using:

 let pp_for_loop ppf (loopvar, lower, upper, body) =   pf ppf "@[<hov>for %s in range(%a, %a):" loopvar pp_expr_nc lower pp_expr_nc upper;

where pp_expr_nc is like pp_expr but doesn't cast to float. This doesn't seem like a very good solution, though.
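For context, a minimal sketch of why the plain Python range variant traces, assuming the data size N is available as a Python int at trace time (names are illustrative):

import tensorflow as tf

# With a Python-int bound, the loop is unrolled while tracing, so nothing
# ever iterates over a Tensor and no AutoGraph conversion is needed.
N = 4
y = tf.constant([1.0, 2.0, 3.0, 4.0], dtype=tf.float64)

@tf.function(autograph=False)
def summed():
    target = tf.constant(0.0, dtype=tf.float64)
    for n in range(N):        # unrolled at trace time
        target += y[n]
    return target

print(summed())               # tf.Tensor(10.0, shape=(), dtype=float64)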

brianwa84 commented 4 years ago

XLA and TF will not autovectorize for loops for you. You might try tf.vectorized_map where it fits the bill, but detecting that statically could be challenging.
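A rough illustration of the tf.vectorized_map suggestion on an "independent iterations" pattern; x, y, alpha, beta, and sigma below are toy stand-ins, not codegen output:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Map the per-observation log-density over the leading axis, then reduce.
x = tf.random.normal([100], dtype=tf.float64)
y = tf.random.normal([100], dtype=tf.float64)
alpha = tf.constant(0.5, tf.float64)
beta = tf.constant(2.0, tf.float64)
sigma = tf.constant(1.0, tf.float64)

per_obs = tf.vectorized_map(
    lambda xy: tfd.Normal(xy[0] * beta + alpha, sigma).log_prob(xy[1]),
    (x, y))
target = tf.reduce_sum(per_obs)

(For a case this simple, plain broadcasting over x and y would work as well; vectorized_map is mainly interesting when the loop body is harder to vectorize by hand.)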


adamhaber commented 4 years ago

What about tf.while_loop-based approaches, like this? AFAIU, tf.function does that automatically whenever possible, so I guess it won't be that simple...

adamhaber commented 4 years ago

Last thing - I'm not sure I understand this comment - which functions is this referring to?

If I understand correctly, this refers to the fact that tf.while_loop and tf.cond take Python callables as inputs (cond and body for tf.while_loop, true_fn/false_fn for tf.cond). Presumably, to get this to work we'll need to turn While (cond, body) into something like

loop_var = 
def cond(...): ...
def body(...): ...
tf__.while_loop(cond, body, loop_var)

For example, turning:

int n = 0;
while (n<N) {
    y[n] ~ normal(x[n] * beta + alpha, sigma);
    n = n+1;
}

to

n = 0
def body(n, target): 
   target += tf__.reduce_sum(tfd__.Normal((x[n] * beta) + alpha, sigma).log_prob(y[n]))
   return (n+1, target)
def cond(n, target): return n<N
_, target = tf__.while_loop(cond, body, (n, target))

Does this make sense? Not sure how this approach scales to anything more interesting than just incrementing target...
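For reference, a runnable version of the sketch above with plain tf/tfd names might look like this (toy data; shapes and values are illustrative):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

N = 5
x = tf.random.normal([N], dtype=tf.float64)
y = tf.random.normal([N], dtype=tf.float64)
alpha = tf.constant(0.0, tf.float64)
beta = tf.constant(1.0, tf.float64)
sigma = tf.constant(1.0, tf.float64)

def cond(n, target):
    return n < N

def body(n, target):
    # accumulate the log-density of the n-th observation
    target += tf.reduce_sum(tfd.Normal(x[n] * beta + alpha, sigma).log_prob(y[n]))
    return n + 1, target

n0 = tf.constant(0)
target0 = tf.constant(0.0, tf.float64)
_, target = tf.while_loop(cond, body, (n0, target0))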

brianwa84 commented 4 years ago

That looks correct to me, but it will not be performant. What would be ideal would be to detect loops that are simply reductions and explicitly translate them as such. Similarly, if you can detect loops where each iteration is independent of the others, those could translate to a vectorized version. If you want, we could look over some models with loops to see if there are other common patterns worth detecting for better performance.

The tl;dr is that control flow is expensive, especially on an accelerator like a GPU, so we usually strive to avoid it whenever possible. This includes both while_loop and cond.
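As a concrete example of the "detect a reduction" idea: the while loop above has independent iterations that only accumulate into target, so it could be emitted as one broadcasted log_prob plus a reduce_sum instead of a graph loop (sketch with illustrative names):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Toy quantities mirroring the while_loop sketch above.
x = tf.random.normal([100], dtype=tf.float64)
y = tf.random.normal([100], dtype=tf.float64)
alpha = tf.constant(0.0, tf.float64)
beta = tf.constant(1.0, tf.float64)
sigma = tf.constant(1.0, tf.float64)

# One vectorized log_prob over all observations, one reduction:
# no while_loop, no cond.
target = tf.reduce_sum(tfd.Normal(x * beta + alpha, sigma).log_prob(y))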


adamhaber commented 4 years ago

If you want, we could look over some models with loops to see if there are other common patterns worth detecting for better performance.

Thanks, that'd be very helpful. ATM, my understanding is that something like independent iterations or simple assignments could be easily vmapped, while loops that carry state across iterations, like the accumulation example above, aren't as simple.

I'm sure there are more/better examples.