rstudio / tensorflow.rstudio.com

https://tensorflow.rstudio.com

Bug in the R-Keras implementation of the ReLU activation function in a simple network? Cannot be, can it? #64

Closed alucas69 closed 1 year ago

alucas69 commented 1 year ago

The following seems to be a bug in the behaviour of Keras-R. It is an embryonic version of the problem I am studying. Running the following code gives a curve that is both positive and negative. That should be impossible given the ReLU activation function and a zero initial bias: the output is then a constant multiple of ReLU(w·x) and cannot change sign.

library(tensorflow)
library(keras)

x = as_tensor(-5:5, dtype = tf$float32)
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = "relu") %>%
  layer_dense(1)
plot(x,predict(model, x), type = "l", col = "black")

If you want, add an argument weights = list(matrix(1, 1, 1), array(0, dim = 1)) to the dense layers to ensure an identity pass-through.
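
For concreteness, a minimal sketch of that pass-through variant (assuming a 1x1 kernel and a length-1 bias, since each dense layer here has a single input and a single unit; the weights argument sets the layer's initial kernel and bias):

#### identity pass-through (assumed shapes: 1x1 kernel, length-1 bias) ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = "relu",
              weights = list(matrix(1, 1, 1), array(0, dim = 1))) %>%
  layer_dense(1, weights = list(matrix(1, 1, 1), array(0, dim = 1)))
plot(x, predict(model, x), type = "l", col = "black")

With both layers fixed this way the model should compute relu(x) exactly, so the curve should sit at zero for every negative x.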

The problem is absent in the Python-Keras interface.

The problem is also absent if I use GELU or SELU or other activation functions rather than ReLU: in that case the above code produces the expected activation pattern. The problem thus seems particular to the (most important) ReLU activation specification.
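
For reference, ReLU applied directly to x outside any model (assuming tf$nn$relu is the same element-wise max(x, 0) that the layer activation should apply) gives the pattern a unit-weight, zero-bias network should reproduce:

#### reference: ReLU applied directly to x ####
print(as.array(tf$nn$relu(x)))
#> [1] 0 0 0 0 0 0 1 2 3 4 5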

I also know how to make the problem go away inside R-Keras by implementing work-arounds such as:

#### works as planned ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = "relu")
lines(x,predict(model, x)-0.2, type = "l", col = "red")

#### works as planned ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_activation_relu() %>%
  layer_dense(1)
lines(x,predict(model, x)+0.2, type = "l", col = "blue")

It is, however, very uncomfortable to set up a network without knowing for sure whether the network in the end is what you intend it to be.

Does anyone have a clue? Given the importance of the software and the many people working with it, it seems more likely that I am making the mistake rather than the software ...

t-kalinowski commented 1 year ago

Hi, thanks for posting. The behaviour you're observing, I think, stems from how layer_dense() is initialized, in particular whether it is initialized to a negative or positive value. For example:

library(tensorflow)
library(keras)

x <- as_tensor(-5:5, dtype = tf$float32, shape = c(-1, 1))

kernal_initializer <- function(shp, dtype) {
  tf$fill(shp, tf$cast(kernal_initial_value, dtype))
}

# initialize dense layer with 1
kernal_initial_value <- 1
model <- keras_model_sequential(input_shape = c(1)) %>%
  layer_dense(1, activation = "relu", use_bias = FALSE, 
              kernel_initializer = kernal_initializer)
plot(x, predict(model, x), type = "l", col = "black", asp = 1)
#> 1/1 - 0s - 34ms/epoch - 34ms/step


# initialize dense layer with -1
kernal_initial_value <- -1
model <- keras_model_sequential(input_shape = c(1)) %>%
  layer_dense(1, activation = "relu", use_bias = FALSE, 
              kernel_initializer = kernal_initializer)
plot(x, predict(model, x), type = "l", col = "black", asp = 1)
#> 1/1 - 0s - 17ms/epoch - 17ms/step


# initialize dense layer with 1, move relu out in a separate layer
model <- keras_model_sequential(input_shape = c(1)) %>%
  layer_dense(1, activation = NULL, use_bias = FALSE, kernel_initializer = tf$ones) %>% 
  layer_activation_relu()
plot(x, predict(model, x), type = "l", col = "black")
#> 1/1 - 0s - 19ms/epoch - 19ms/step


# initialize dense layer with -1, no relu
kernal_initial_value <- -1
model <- keras_model_sequential(input_shape = c(1)) %>%
  layer_dense(1, activation = NULL, use_bias = FALSE, 
              kernel_initializer = kernal_initializer)
plot(x, predict(model, x), type = "l", col = "black")
#> 1/1 - 0s - 18ms/epoch - 18ms/step


# just relu
model <- keras_model_sequential(input_shape = c(1)) %>%
  layer_activation_relu()
plot(x, predict(model, x), type = "l", col = "black")
#> 1/1 - 0s - 20ms/epoch - 20ms/step

Created on 2023-10-03 with reprex v2.0.2

alucas69 commented 1 year ago

Thank you for your answer, t-kalinowski. I am afraid it does not solve the issue.

The initialization of the weights / kernel is indeed important for the steepness of the ReLU, but it does not (I think) explain how my example yields a function that is both positive and negative. That should not happen with ReLU.

Indeed, we also found that if we initialize the kernel and set use_bias to FALSE, the ReLU shape is recovered. However, in the plain vanilla implementation of the example, the outcome (a linear curve that is both positive and negative) is completely incompatible with the ReLU operation.
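
A minimal sketch of that variant (the tf$ones initializer and use_bias = FALSE settings mirror your reprex; this is how we recovered the ReLU shape on our side):

library(tensorflow)
library(keras)

x <- as_tensor(-5:5, dtype = tf$float32)

#### kernel fixed to 1, bias disabled: expected ReLU shape ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = "relu", use_bias = FALSE,
              kernel_initializer = tf$ones) %>%
  layer_dense(1, use_bias = FALSE, kernel_initializer = tf$ones)
plot(x, predict(model, x), type = "l", col = "black")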

As mentioned, this defect does not emerge with any of the other activation functions but is particular to the ReLU, which makes it even more suspicious to me.

t-kalinowski commented 1 year ago

Can you please paste the output from plot, just to confirm we're both seeing the same thing?

This is what I see when I run your code (lightly modified to initialize weights + bias to 1):

library(tensorflow)
library(keras)

x <- as_tensor(-5:5, dtype = tf$float32)

layer_dense_1 <- function(...) 
  layer_dense(..., 
              kernel_initializer = tf$ones,
              bias_initializer = tf$ones)

model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense_1(1, activation = "relu") %>%
  layer_dense_1(1)
plot(x,predict(model, x), type = "l", col = "black",
     xlim = c(-6, 6), ylim = c(-6, 6))
#> 1/1 - 0s - 38ms/epoch - 38ms/step

model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense_1(1, activation = "relu")
lines(x,predict(model, x)-0.2, type = "l", col = "red")
#> 1/1 - 0s - 20ms/epoch - 20ms/step

model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_activation_relu() %>%
  layer_dense_1(1)
lines(x,predict(model, x)+0.2, type = "l", col = "blue")

Created on 2023-10-03 with reprex v2.0.2

alucas69 commented 1 year ago

Thank you again, t-kalinowski, for taking the time to work on this issue, and thank you for the good suggestion. Running the same code, I get a different output.

[screenshot of the plot output attached]

My configuration: MacBook Air (M1), macOS Sonoma 14.0, platform aarch64-apple-darwin20, R version 4.3.0 (2023-04-21). Keras and TensorFlow packages up to date as of 2023-10-04.

Maybe you can try the following adapted version of your code (the only change is from ReLU to GELU in the second model):

library(tensorflow)
library(keras)

x <- as_tensor(-5:5, dtype = tf$float32)

layer_dense_1 <- function(...) layer_dense(..., 
              kernel_initializer = tf$ones,
              bias_initializer = tf$ones)

model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>% layer_dense_1(1, activation = "relu") %>%
  layer_dense_1(1)
plot(x,predict(model, x), type = "l", col = "black")

model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>% layer_dense_1(1, activation = "gelu") %>%
  layer_dense_1(1)
lines(x,predict(model, x), type = "l", col = "red")

[screenshot of the resulting plot attached]

As you can see, for GELU I have no problems at all.

alucas69 commented 1 year ago

Updated to R version 4.3.1 (2023-06-16), nickname "Beagle Scouts", and updated all packages again (no effect on Keras and TensorFlow), but the issue persists.

alucas69 commented 1 year ago

My colleague on an Apple configuration does not get the same error, but gets the correct picture (yours). Let me first connect with him to make sure I have the same configuration as he does, and then get back to you, to avoid wasting your time.

alucas69 commented 1 year ago

Benchmarking my output against some more colleagues, I installed a new Python version and implemented the example in Python. The issue was not present in the Python implementation. Then I reinstalled R and RStudio (several times) and now, finally, the issue has (mysteriously) disappeared. I hope it stays away. Many thanks for your help, Tomasz!

t-kalinowski commented 1 year ago

That's great!

My best guess is that what you were seeing was due to an older version of tensorflow-metal, a package built by Apple specifically for M1 Macs to let TensorFlow use the GPU. The early versions of that package had some bugs. I'm glad that upgrading Python and getting on the latest version of the TF/Keras packages fixed it for you.
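
For anyone who hits something similar, a rough sketch of how one might check from R whether a GPU (e.g. via tensorflow-metal) is visible, and hide it to force CPU execution while isolating the issue; these are standard TensorFlow config calls, nothing specific to this thread:

library(tensorflow)
tf_version()                               # TensorFlow version in use
tf$config$list_physical_devices("GPU")     # non-empty if a GPU plugin such as tensorflow-metal is visible
# hide the GPU (run before any ops are executed) to check whether the issue is GPU-specific:
tf$config$set_visible_devices(list(), "GPU")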

alucas69 commented 1 year ago

Thank you for the attention devoted to my question and for helping out. Very much appreciated! I will proceed with fingers crossed, hoping not to encounter another deep-down hidden mysterious issue.