Type issues arising due to combining datasets with eager #12 (rstudio/tfdatasets)

skeydan commented 6 years ago

Hi JJ,

picking up on the snippet we discussed ... I've extracted a small example we could experiment with.

As I see it, the difficulty is that everything works the way you suggested when executed in the global environment, but not (or at least not in the same way) when called via datasets. When called from datasets, the tensors inside the load_image function are not eager: e.g., they don't have a $numpy() method and seem to come across as environments (see the experimental code lines below).
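To illustrate the coercion failure without TensorFlow, here is a minimal base R sketch. It assumes that a non-eager tensor reaches R as an environment-backed handle, which is what the "cannot coerce type 'environment'" message quoted in the code further down suggests. fake_tensor and its class name are made up for illustration.

```r
# Stand-in for a handle to a non-eager tensor: an environment with a class
# attribute. The class name is hypothetical; only the underlying type matters.
fake_tensor <- new.env()
class(fake_tensor) <- "fake.python.object"

# The underlying R type is "environment", not a numeric vector
tensor_type <- typeof(fake_tensor)
print(tensor_type)

# as.integer() has no method for this class and cannot coerce an environment
msg <- tryCatch(
  as.integer(fake_tensor),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So inside a mapped function, anything that expects a plain R value has to be replaced by a tensor operation (such as a cast), or the function has to be run in a way that hands it concrete values.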

I think this is related:

https://github.com/tensorflow/tensorflow/issues/14732

library(keras)
use_implementation("tensorflow")

library(tensorflow)
tfe_enable_eager_execution(device_policy = "silent")

library(tfdatasets)

system("wget https://blogs.rstudio.com/tensorflow/posts/2018-09-10-eager-style-transfer/images/style_epoch_1000.png")

# Without datasets --------------------------------------------------------

image <-  tf$read_file("style_epoch_1000.png")
image <- tf$image$decode_jpeg(image)
w <- k_shape(image)[2]
w
# as.integer() is not needed here
# w <- as.integer(w)

# here, as.integer() is needed
w2 <- as.integer(w / 2L)
# without it, slicing fails with:
# cannot compute StridedSlice as input #2(zero-based) was expected to be a int32 tensor but is a double tensor
# w2 <- w / 2L
w2
real_image <- image[ , 1L:w2, ]
input_image <- image[ , (w2 + 1L):w, ]

list(input_image, real_image)

# with datasets -----------------------------------------------------------

load_image <- function(image_file) {

  image <- tf$image$decode_jpeg(image_file)

  w <- k_shape(image)[2]
  # as.integer() can't be used on this; it fails with:
  # cannot coerce type 'environment' to vector of type 'integer'.
  # w <- as.integer(w)

  # after division the TF type is float64, so a cast to int32 is needed
  w2 <- (w / 2L) %>% k_cast(tf$int32)
  # w2 <- as.integer(w / 2L)
  # w2 <- w / 2L
  real_image <- image[ , 1L:w2, ]
  input_image <- image[ , (w2 + 1L):w, ]

  list(input_image, real_image)

}

train_dataset <-
  tf$data$Dataset$list_files("style_epoch_1000.png") %>%
  dataset_map(function(x) load_image(x)) %>%
  dataset_batch(1)

jjallaire commented 6 years ago

What about using tfe.py_func() as suggested in the TF issue? https://www.tensorflow.org/api_docs/python/tf/contrib/eager/py_func

skeydan commented 6 years ago

Thanks ... I need to go one step back (too few session restarts last night).

First, without datasets, this is the correct code. It produces no warnings, and both w and w2 are R integers instead of eager tensors. (Lesson for me: even if it's now possible to get things done with eager tensors, it's still sensible to convert to R types.)

# Without datasets --------------------------------------------------------

image <-  tf$read_file("style_epoch_1000.png")
image <- tf$image$decode_png(image)
w <- as.integer(k_shape(image)[2])
w

w2 <- as.integer(w / 2L)
w2
real_image <- image[ , 1L:w2, ]
input_image <- image[ , (w2 + 1L):w, ]

list(input_image, real_image)
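On the as.integer(w / 2L) step: once w is a plain R integer, the type change comes from R itself, since `/` always returns a double. A small base R sketch, using an assumed width of 512 purely for illustration, shows this and the integer-division alternative `%/%`:

```r
w <- 512L  # assumed image width, for illustration only

# `/` returns a double even when both operands are integers
div_type <- typeof(w / 2L)
print(div_type)

# Option 1: divide, then convert back (as in the code above)
w2_cast <- as.integer(w / 2L)

# Option 2: integer division keeps the integer type throughout
w2_intdiv <- w %/% 2L

print(c(w2_cast, w2_intdiv))
print(typeof(w2_intdiv))
```

Both give the same result for positive widths. This only applies to R integers; for tensors, the cast shown in the first post is still needed.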

skeydan commented 6 years ago

Now for py_func...

I've been using tf$py_func here

https://github.com/rstudio/keras/blob/4d2cbc497f06c1c286b7b4545cd6a15f5d67fe01/vignettes/examples/eager_image_captioning.R

already because in that case, I didn't see another way (whereas in the current case, it is about getting rid of the warning, or rather, the cause thereof).

Now, I am testing both tf$contrib$eager$py_func and tf$py_func in the code below.

For the true use case, I need to take a boolean argument and then branch on it, so the test code has been adapted for that.

The contrib version errors with

     [[{{node EagerPyFunc}} = EagerPyFunc[Tin=[DT_STRING, DT_BOOL], Tout=[DT_UINT8, DT_UINT8], token="pyfunc_2"](arg0, Const)]]
Error: UnknownError: RuntimeError: Evaluation error: argument is not interpretable as logical.

whereas with tf$py_func it works, and I get rid of the warning.
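One plausible reading of that error, not verified against the TensorFlow internals: with the contrib version the R function presumably receives tensor handles rather than converted R values, so if(is_train) is handed something that is not a logical. The base R sketch below mimics this with a made-up stand-in object (fake_flag and its class name are hypothetical).

```r
# Stand-in for a tensor handle: a classed environment holding one binding.
# Purely illustrative; this is not how a real tensor is constructed.
fake_flag <- new.env()
assign("pyobj", NULL, envir = fake_flag)
class(fake_flag) <- "fake.python.object"

# Branching on the handle fails, because it is not a logical value
res <- tryCatch(
  if (fake_flag) "train" else "test",
  error = function(e) e
)
print(conditionMessage(res))

# Branching on a plain R logical works as expected
is_train <- TRUE
branch <- if (is_train) "train" else "test"
print(branch)
```

If this reading is right, tf$py_func works here because its arguments arrive as converted values that R can branch on directly.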

The comparison made in

https://www.tensorflow.org/api_docs/python/tf/contrib/eager/py_func

tf.contrib.eager.py_func is similar in spirit to tf.py_func, but unlike the latter, the former lets you use TensorFlow operations in the wrapped Python function. In particular, while tf.py_func only runs on CPUs and wraps functions that take NumPy arrays as inputs and return NumPy arrays as outputs, tf.contrib.eager.py_func can be placed on GPUs and wraps functions that take Tensors as inputs, execute TensorFlow operations in their bodies, and return Tensors as outputs.

does not seem totally up to date and/or correct (e.g., I can use tf$py_func on GPU). I think I would stay with tf$py_func. Also, I guess all eager-related functionality will probably move out of contrib at some point...

Then in principle, I think this should be a candidate for a wrapper? If so, I'd say we might create it around the non-contrib function...

# with datasets -----------------------------------------------------------

load_image <- function(image_file, is_train) {

  image <- tf$read_file(image_file)
  image <- tf$image$decode_png(image)

  w <- as.integer(k_shape(image)[2])
  w2 <- as.integer(w / 2L)
  if(is_train) {
    real_image <- image[ , 1L:w2, ]
    input_image <- image[ , (w2 + 1L):w, ]
  } else {
    real_image <- image[ , 1L:w2, ]
    input_image <- image[ , (w2 + 1L):w, ]
  }

  list(input_image, real_image)

}

train_dataset <-
  tf$data$Dataset$list_files("style_epoch_1000.png") %>%
  dataset_map(function(image)
    tf$py_func(load_image, list(image, tf$constant(TRUE, dtype = tf$bool)), list(tf$uint8, tf$uint8))) %>%
    #tf$contrib$eager$py_func(load_image, list(image, tf$constant(TRUE, dtype = tf$bool)), list(tf$uint8, tf$uint8))) %>%
  dataset_batch(1)

iter <- make_iterator_one_shot(train_dataset)

until_out_of_range({
  batch <- iterator_get_next(iter)
  input_image <- batch[[1]]
  target <- batch[[2]]
  print(input_image$shape)
  print(target$shape)
})

jjallaire commented 6 years ago

Yeah, we should definitely have a wrapper for py_func. How about tf_r_func() ?

My one question about py_func, though, is whether it will ever run the function on a background thread (if so, we need to make some special provisions from R).
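A rough sketch of what such a wrapper might look like, assuming it simply forwards to tf$py_func with the same three arguments used in the code above. tf_r_func is only the name proposed here, not an existing function, and the py_func argument is a hypothetical hook so the sketch can be tried without TensorFlow. It does nothing about the background-thread question.

```r
# Hypothetical wrapper; not part of tensorflow or tfdatasets.
tf_r_func <- function(func, inp, Tout, py_func = NULL) {
  stopifnot(is.function(func))
  if (is.null(py_func)) {
    # Default: the non-contrib function used earlier in this thread.
    # Only evaluated when no stand-in is supplied.
    py_func <- tensorflow::tf$py_func
  }
  py_func(func, inp, Tout)
}

# Trying it with a stand-in that just calls the R function on its inputs
fake_py_func <- function(func, inp, Tout) do.call(func, inp)

out <- tf_r_func(
  function(x, flag) if (flag) x * 2 else x,
  inp = list(21, TRUE),
  Tout = NULL,
  py_func = fake_py_func
)
print(out)
```

Inside dataset_map, the call would then presumably read tf_r_func(load_image, list(image, tf$constant(TRUE, dtype = tf$bool)), list(tf$uint8, tf$uint8)).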

skeydan commented 6 years ago

From the docs page I think one can't know...

Don't know if it's easily deducible from

tensorflow/python/lib/core/py_func.cc

?

For the name, yeah, sounds good. I was also thinking we might rename tfe_enable_eager_execution to tf_enable_eager_execution some time (just came to mind because of the prefix).

skeydan commented 6 years ago

Not that I think it helps much, but this is what I get from top -H when running with tf$py_func on CPU:

Threads: 1026 total,   3 running, 940 sleeping,   0 stopped,   0 zombie
%Cpu(s): 55.8 us,  3.5 sy,  0.0 ni, 40.3 id,  0.0 wa,  0.3 hi,  0.1 si,  0.0 st
KiB Mem : 32720172 total,  8191680 free,  6888660 used, 17639832 buff/cache
KiB Swap: 16429052 total, 16429052 free,        0 used. 25296416 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                         
21365 key       20   0 7593152 3.172g 119756 S 53.4 10.2   0:17.67 rsession                                                                                                                                        
21367 key       20   0 7593152 3.172g 119756 S 53.1 10.2   0:17.63 rsession                                                                                                                                        
21368 key       20   0 7593152 3.172g 119756 S 53.1 10.2   0:17.60 rsession                                                                                                                                        
21370 key       20   0 7593152 3.172g 119756 S 53.1 10.2   0:17.55 rsession                                                                                                                                        
21364 key       20   0 7593152 3.171g 119756 S 52.8 10.2   0:17.60 rsession                                                                                                                                        
21369 key       20   0 7593152 3.172g 119756 S 52.8 10.2   0:17.47 rsession                                                                                                                                        
21371 key       20   0 7593152 3.172g 119756 S 52.5 10.2   0:17.59 rsession                                                                                                                                        
21366 key       20   0 7593152 3.172g 119756 S 52.1 10.2   0:17.57 rsession                                                                                                                                        
21208 key       20   0 7593152 3.171g 119756 R 46.6 10.2   0:21.61 rsession                                                                                                                                        
18857 key       20   0 2233584 236800 127040 S  3.3  0.7   0:24.06 Web Content                                                                                                                                     
21244 key       20   0 2842448 152252  65476 S  1.6  0.5   0:02.87 QtWebEngineProc                                                                                                                                 
21173 key       20   0  258300   5300   3580 R  1.3  0.0   0:00.75 top                                                                                                                                             
18874 key       20   0 2233584 236800 127040 S  1.0  0.7   0:03.30 Timer                                                                                                                                           
 2720 root      20   0  425416 187844  75364 S  0.3  0.6  29:35.90 Xorg                                                                                                                                            
 3998 key       20   0  841344  51204  28380 S  0.3  0.2   1:09.76 gnome-terminal-                                                                                                                                 
21175 key       20   0 3232392 230152 145212 S  0.3  0.7   0:00.62 rstudio                                                                                                                                         
31791 key       20   0 2549904 406768 132708 S  0.3  1.2  10:56.49 Web Content
