rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

Subclassing API #655

Closed: kevinykuo closed this issue 5 years ago

kevinykuo commented 5 years ago

Currently, keras_model_custom() doesn't seem to fully support the model subclassing API, in the sense that it only accommodates a forward pass function. Therefore, to implement something like https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/r2/tutorials/generative/cvae.ipynb#scrollTo=VGLbvBEmjK0a, we'd have to define multiple top-level functions, each with its own call to keras_model_custom() (e.g. https://blogs.rstudio.com/tensorflow/posts/2019-01-24-vq-vae/).

Could we consider exposing an API that would allow the user to encapsulate methods in the model object? E.g. something like

# signature of the new function (or we could use keras_model_custom and try to maintain compatibility)
keras_model_bespoke <- function(..., init = NULL, name = NULL) {...}

model <- keras_model_bespoke(
  init = function(self) {
    self$dense1 <- layer_dense(units = 32, activation = "relu")
    self$dense2 <- layer_dense(units = 10, activation = "softmax")
  },
  call = function(inputs, training = FALSE) {
    x <- self$dense1(inputs)
    self$dense2(x)
  },
  some_other_fun = ...
)

keras_model_bespoke() would return a model object, and defining a constructor would be similar to the current keras_model_custom() API, e.g.

custom_model_constructor <- function(...) {
  # ...
  keras_model_bespoke(...)
}
model <- custom_model_constructor()
skeydan commented 5 years ago

Hi Kevin,

We talked about that with @jjallaire when I wrote the post. The way I did it in R was basically a workaround, but in this specific case it wasn't a problem because (1) in Python, too, backprop flows independently through both models, and (2) having two models in R fit well with my way of framing models as "agents"/"actors".

In the meantime, I've found that, at least with TF 2, you can in fact add methods to the object. You can even use a custom model as below, calling from outside not the (implicit) call() but another method:

ac2_model <- function(name = NULL,
                      num_actions) {

  keras_model_custom(name = name, function(self) {

    # some fields (including self$dist, which is used below)

    self$action_value <- function(obs) {
      # executes call() under the hood
      c(logits, value) %<-% self$predict(obs)
      action <- self$dist$predict(logits)
      list(tf$squeeze(action, axis = -1L), tf$squeeze(value, axis = -1L))
    }

    function(x, mask = NULL) {
      # do some stuff
    }
  })
}

# given model <- ac2_model(num_actions = ...) and an observation obs;
# %<-% is zeallot's destructuring assignment
c(action, value) %<-% model$action_value(k_expand_dims(obs, axis = 1))

However, even though this works, there will be limits to what you can do compared to the Python approach, which relies directly on object inheritance. The question is: do we need to mimic the Python way?

One important point here: with TF 2, I think custom models will become less common (again!). Until then, whenever I use eager execution, which I want to do for many reasons, I have to combine custom models with GradientTape backprop (what the Python side calls the "imperative" way). With TF 2, however, we will be able to use eager execution with the usual Keras compile(), fit(), etc. (in tf.keras, of course). That means that in many cases where custom behavior isn't required, we will want to go back to the "symbolic", or declarative, way of doing things (see also https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021).

So if, for design reasons, it is best to leave the custom model as is, perhaps we could reduce complexity by accepting this difference from Python and going back to the declarative way more often (as soon as possible, i.e. with TF 2).

@jjallaire @javierluraschi what do you think?

kevinykuo commented 5 years ago

The approach in your snippet does work, but it feels (subjectively) somewhat unnatural, since we're using the side effect of attribute assignment to define methods. Also, if I understand correctly, in eager mode a custom model need not have a designated forward pass at all. That said, it currently might not complain if you don't provide the function at the end, so one option is to document this and say "you can do whatever you want to the object via `$<-` on self."

Having a proper subclassing API would clean up code for more advanced use cases that deal with sharing attributes or layers among methods. If we're worried about having too many functions for custom models, we can probably put it in keras_model_custom() and maintain backwards compatibility.

jjallaire commented 5 years ago

I think we could do what @kevinykuo is proposing but drop the init() function: the closure returns a list of functions including call, and "init" is simply whatever runs before that list is defined and returned. This would be compatible with the existing API, since we could just check whether a list or a function is returned.
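A minimal base-R sketch of that check, assuming a hypothetical helper `resolve_methods()` (not part of keras) and a plain environment standing in for the model's `self`. It only illustrates the dispatch idea: a single returned function keeps today's meaning (it becomes call), while a named list of functions is taken as the set of methods.

```r
# Hypothetical helper (NOT part of keras): normalize whatever the user's
# closure returns into a named list of methods.
resolve_methods <- function(ret) {
  if (is.function(ret)) {
    # existing API: a single function is treated as call()
    list(call = ret)
  } else if (is.list(ret) && all(vapply(ret, is.function, logical(1)))) {
    # proposed API: a list of functions, each of which must be named
    if (is.null(names(ret)) || any(names(ret) == "")) {
      stop("all returned methods must be named")
    }
    ret
  } else {
    stop("the closure must return a function or a named list of functions")
  }
}

# Stand-in for the model object: a plain environment acting as `self`.
self <- new.env()

# Current style: "init" code runs first, then one function is returned.
old_style <- function(self) {
  self$scale <- 2
  function(x) x * self$scale
}

# Proposed style: same "init" code, then a named list of methods.
new_style <- function(self) {
  self$scale <- 2
  list(
    call = function(x) x * self$scale,
    describe = function() paste("scale =", self$scale)
  )
}

m_old <- resolve_methods(old_style(self))
m_new <- resolve_methods(new_style(self))
m_old$call(3)     # 6
m_new$describe()  # "scale = 2"
```

Because a function is never a list in R (`is.list()` is FALSE for closures), the two cases cannot be confused, which is what keeps the existing single-function form backward compatible.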

kevinykuo commented 5 years ago

@jjallaire you mean something like this?

model <- keras_model_custom(
  {
    self$dense1 <- layer_dense(units = 32, activation = "relu")
    self$dense2 <- layer_dense(units = 10, activation = "softmax")
  },
  call = function(inputs, training = FALSE) {
    x <- self$dense1(inputs)
    self$dense2(x)
  },
  some_other_fun = ...
)
skeydan commented 5 years ago

Here is a simple test of whether we can have a custom model with two Keras sequential models as members AND access all weights as trainable variables (btw: the model has no call function!):

library(keras)
use_implementation("tensorflow")

library(tensorflow)
tfe_enable_eager_execution()

encoder_model <- function(name = NULL) {

  keras_model_custom(name = name, function(self) {

    self$m1 <- keras_model_sequential() %>%
      layer_dense(units = 2)
    self$m2 <- keras_model_sequential() %>%
      layer_dense(units = 2)

    self$do1 <- function(x) {
      self$m1(x)
    }

    self$do2 <- function(x) {
      self$m2(x)
    }
  })
}

e <- encoder_model()
c(e$m1, e$m2)

(res1 <- e$do1(k_constant(matrix(1:10, ncol = 2))))

e$trainable_weights
#> [[1]]
#> <tf.Variable 'sequential/dense/kernel:0' shape=(2, 2) dtype=float32, numpy=
#> array([[ 0.3070637 ,  0.03883982],
#>        [-1.0978718 , -0.33549595]], dtype=float32)>
#> 
#> [[2]]
#> <tf.Variable 'sequential/dense/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>

Yay! So (I think) we already have all the required functionality, and it's only a matter of adapting the API, if we want to. What exactly do we want to change, given that you can already do what the snippet above does?

jjallaire commented 5 years ago

It would be more like this:

model <- keras_model_custom(units1, units2) {

  self$dense1 <- layer_dense(units = units1, activation = "relu")
  self$dense2 <- layer_dense(units = units2, activation = "softmax")

  list(
    call = function(inputs, training = FALSE) {
      x <- self$dense1(inputs)
      self$dense2(x)
    },
    some_other_fun = ...
  )
}

You don't need a special "init" function; it's just whatever executes in the closure before the list of functions is returned.

So it's essentially the same as the current API, but you can optionally return a list of named functions rather than a single function that is mapped to call().

skeydan commented 5 years ago

Yeah, I was just thinking that, from the user's point of view, I can already call these functions, as I did above:

e$do1(k_constant(matrix(1:10, ncol = 2)))

... ?

jjallaire commented 5 years ago

I didn't get the syntax quite right (I was too quick). Here's the simple example from the docs, rewritten:

library(keras)

keras_model_simple_mlp <- function(num_classes, 
                                   use_bn = FALSE, use_dp = FALSE, 
                                   name = NULL) {

  # define and return a custom model
  keras_model_custom(name = name, function(self) {

    # create layers we'll need for the call (this code executes once)
    self$dense1 <- layer_dense(units = 32, activation = "relu")
    self$dense2 <- layer_dense(units = num_classes, activation = "softmax")
    if (use_dp)
      self$dp <- layer_dropout(rate = 0.5)
    if (use_bn)
      self$bn <- layer_batch_normalization(axis = -1)

    # methods
    list(
      call = function(inputs, mask = NULL) {
        x <- self$dense1(inputs)
        if (use_dp)
          x <- self$dp(x)
        if (use_bn)
          x <- self$bn(x)
        self$dense2(x)
      },
      some_other_func = function(...) {
        # do stuff
      }
    )  # closes the list of methods
  })
}
jjallaire commented 5 years ago

I think, though, that we don't need this unless there are specific Keras model superclass methods that we need to override. If we just want to add functions to the object, then assigning them as members of self should work fine.
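Why assigning functions as members of self is enough can be seen without Keras at all. Below is a minimal base-R sketch, assuming a plain environment as a stand-in for the Keras model object (`make_counter_model()`, `step` and `report` are hypothetical names for illustration). Because environments have reference semantics and each member function finds `self` lexically in its enclosing frame, several "methods" can share and update the same state.

```r
# Hypothetical constructor: a plain environment stands in for `self`.
make_counter_model <- function() {
  self <- new.env()

  # "attribute": state shared by all methods
  self$n_calls <- 0

  # "methods": functions stored as members; each one looks up `self`
  # in the enclosing frame, so they all see the same environment
  self$step <- function(x) {
    self$n_calls <- self$n_calls + 1  # modifies the shared environment
    x + 1
  }
  self$report <- function() self$n_calls

  self
}

m <- make_counter_model()
m$step(1)
m$step(10)
m$report()  # 2
```

The same mechanism is what makes `self$do1 <- function(x) self$m1(x)` work inside the closure passed to keras_model_custom(); what it does not give you is overriding inherited superclass methods.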

kevinykuo commented 5 years ago

OK, it seems everything we want to do right now is already supported, so let's put this on hold. I'd prefer a clearer delineation between defining attributes and methods (as in the Python API or R6), but that's largely cosmetic. One thing we may need to handle, as JJ mentioned, is overriding methods such as compute_output_shape() (https://www.tensorflow.org/guide/keras#model_subclassing), but we haven't hit that yet, so we can cross that bridge when we get there.