rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

Attention example is not working #983

Open · mg64ve opened this issue 4 years ago

mg64ve commented 4 years ago

Hello, the following example:

https://keras.rstudio.com/articles/examples/nmt_attention.html

is not working. I am getting the following error:

2020-02-13 16:50:14.747264: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
Error: NotFoundError: Could not find valid device for node.
Node:{{node SparseSoftmaxCrossEntropyWithLogits}}
All kernels registered for op SparseSoftmaxCrossEntropyWithLogits :
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT64]
  device='GPU'; T in [DT_FLOAT]; Tlabels in [DT_INT32]
  device='GPU'; T in [DT_FLOAT]; Tlabels in [DT_INT64]
  device='GPU'; T in [DT_HALF]; Tlabels in [DT_INT32]
  device='GPU'; T in [DT_HALF]; Tlabels in [DT_INT64]
 [Op:SparseSoftmaxCrossEntropyWithLogits]

Could you please help me? I have also opened the following issue:

https://github.com/rstudio/tensorflow-blog/issues/70

Please let me know. Thanks.

dfalbel commented 4 years ago

This looks like a TensorFlow installation issue, possibly something related to the GPU installation. Does any other example work? Can you paste the results of tensorflow::tf_config()?

I have just run the example locally and it worked fine.

mg64ve commented 4 years ago

Hi @dfalbel ,

the following is the output of the tensorflow::tf_config() command:

> library(tensorflow)
> tensorflow::tf_config()
2020-02-16 13:40:52.104436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
TensorFlow v2.0.0 (C:\Users\gazzi\ANACON~1\envs\tf-gpu\lib\site-packages\tensorflow\__init__.p)
Python v3.6 (C:/Users/gazzi/Anaconda3/envs/tf-gpu/python.exe)

It seems to be very short. Is this all you need? I am also attaching this:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tensorflow_2.0.0

loaded via a namespace (and not attached):
 [1] compiler_3.6.2       magrittr_1.5         Matrix_1.2-18        tools_3.6.2          whisker_0.4         
 [6] base64enc_0.1-3      rappdirs_0.3.1       Rcpp_1.0.3           reticulate_1.14-9001 grid_3.6.2          
[11] jsonlite_1.6.1       tfruns_1.4           lattice_0.20-38     

The following example works fine (the script is below, followed by the console output from running it):

library(keras)
library(stringi)

# Function Definitions ----------------------------------------------------

# Creates the char table and sorts them.
learn_encoding <- function(chars){
  sort(chars)
}

# Encode from a character sequence to a one hot integer representation.
# > encode("22+22", char_table)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
# 2    0    0    0    0    1    0    0    0    0     0     0     0
# 2    0    0    0    0    1    0    0    0    0     0     0     0
# +    0    1    0    0    0    0    0    0    0     0     0     0
# 2    0    0    0    0    1    0    0    0    0     0     0     0
# 2    0    0    0    0    1    0    0    0    0     0     0     0
encode <- function(char, char_table){
  strsplit(char, "") %>%
    unlist() %>%
    sapply(function(x){
      as.numeric(x == char_table)
    }) %>% 
    t()
}

# Decode the one hot representation/probabilities representation
# to their character output.
decode <- function(x, char_table){
  apply(x,1, function(y){
    char_table[which.max(y)]
  }) %>% paste0(collapse = "")
}

# Returns a list of questions and expected answers.
generate_data <- function(size, digits, invert = TRUE){

  max_num <- as.integer(paste0(rep(9, digits), collapse = ""))

  # generate integers for both sides of question
  x <- sample(1:max_num, size = size, replace = TRUE)
  y <- sample(1:max_num, size = size, replace = TRUE)

  # make left side always smaller than right side
  left_side <- ifelse(x <= y, x, y)
  right_side <- ifelse(x >= y, x, y)

  results <- left_side + right_side

  # pad with spaces on the right
  questions <- paste0(left_side, "+", right_side)
  questions <- stri_pad(questions, width = 2*digits+1, 
                        side = "right", pad = " ")
  if(invert){
    questions <- stri_reverse(questions)
  }
  # pad with spaces on the left
  results <- stri_pad(results, width = digits + 1, 
                      side = "left", pad = " ")

  list(
    questions = questions,
    results = results
  )
}

# Parameters --------------------------------------------------------------

# Parameters for the model and dataset
TRAINING_SIZE <- 50000
DIGITS <- 2

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS
MAXLEN <- DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding
charset <- c(0:9, "+", " ")
char_table <- learn_encoding(charset)

# Data Preparation --------------------------------------------------------

# Generate Data
examples <- generate_data(size = TRAINING_SIZE, digits = DIGITS)

# Vectorization
x <- array(0, dim = c(length(examples$questions), MAXLEN, length(char_table)))
y <- array(0, dim = c(length(examples$questions), DIGITS + 1, length(char_table)))

for(i in 1:TRAINING_SIZE){
  x[i,,] <- encode(examples$questions[i], char_table)
  y[i,,] <- encode(examples$results[i], char_table)
}

# Shuffle
indices <- sample(1:TRAINING_SIZE, size = TRAINING_SIZE)
x <- x[indices,,]
y <- y[indices,,]

# Explicitly set apart 10% for validation data that we never train over
split_at <- trunc(TRAINING_SIZE/10)
x_val <- x[1:split_at,,]
y_val <- y[1:split_at,,]
x_train <- x[(split_at + 1):TRAINING_SIZE,,]
y_train <- y[(split_at + 1):TRAINING_SIZE,,]

print('Training Data:')
print(dim(x_train))
print(dim(y_train))

print('Validation Data:')
print(dim(x_val))
print(dim(y_val))

# Training ----------------------------------------------------------------

HIDDEN_SIZE <- 128
BATCH_SIZE <- 128
LAYERS <- 1

# Initialize sequential model
model <- keras_model_sequential() 

model %>%
  # "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
  # Note: In a situation where your input sequences have a variable length,
  # use input_shape=(None, num_feature).
  layer_lstm(HIDDEN_SIZE, input_shape=c(MAXLEN, length(char_table))) %>%
  # As the decoder RNN's input, repeatedly provide with the last hidden state of
  # RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
  # length of output, e.g., when DIGITS=3, max output is 999+999=1998.
  layer_repeat_vector(DIGITS + 1)

# The decoder RNN could be multiple layers stacked or a single layer.
# By setting return_sequences to True, return not only the last output but
# all the outputs so far in the form of (num_samples, timesteps,
# output_dim). This is necessary as TimeDistributed in the below expects
# the first dimension to be the timesteps.
for(i in 1:LAYERS)
  model %>% layer_lstm(HIDDEN_SIZE, return_sequences = TRUE)

model %>% 
  # Apply a dense layer to the every temporal slice of an input. For each of step
  # of the output sequence, decide which character should be chosen.
  time_distributed(layer_dense(units = length(char_table))) %>%
  layer_activation("softmax")

# Compiling the model
model %>% compile(
  loss = "categorical_crossentropy", 
  optimizer = "adam", 
  metrics = "accuracy"
)

# Get the model summary
summary(model)

# Fitting loop
model %>% fit( 
  x = x_train, 
  y = y_train, 
  batch_size = BATCH_SIZE, 
  epochs = 70,
  validation_data = list(x_val, y_val)
)

# Predict for a new observation
new_obs <- encode("55+22", char_table) %>%
  array(dim = c(1,5,12))
result <- predict(model, new_obs)
result <- result[1,,]
decode(result, char_table)
> # Initialize sequential model
> model <- keras_model_sequential() 
2020-02-16 13:57:42.003975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
> 
> model %>%
+   # "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
+   # Note: In a situation where your input sequences have a variable length,
+   # use input_shape=(None, num_feature).
+   layer_lstm(HIDDEN_SIZE, input_shape=c(MAXLEN, length(char_table))) %>%
+   # As the decoder RNN's input, repeatedly provide with the last hidden state of
+   # RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
+   # length of output, e.g., when DIGITS=3, max output is 999+999=1998.
+   layer_repeat_vector(DIGITS + 1)
2020-02-16 13:57:49.326293: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-02-16 13:57:49.355333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2070 with Max-Q Design major: 7 minor: 5 memoryClockRate(GHz): 1.185
pciBusID: 0000:01:00.0
2020-02-16 13:57:49.355816: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-02-16 13:57:49.356779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-02-16 13:57:49.357368: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-02-16 13:57:49.360550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2070 with Max-Q Design major: 7 minor: 5 memoryClockRate(GHz): 1.185
pciBusID: 0000:01:00.0
2020-02-16 13:57:49.360919: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-02-16 13:57:49.361579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-02-16 13:57:49.969731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-16 13:57:49.970017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-02-16 13:57:49.970147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-02-16 13:57:49.971031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6306 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5)
> 
> # The decoder RNN could be multiple layers stacked or a single layer.
> # By setting return_sequences to True, return not only the last output but
> # all the outputs so far in the form of (num_samples, timesteps,
> # output_dim). This is necessary as TimeDistributed in the below expects
> # the first dimension to be the timesteps.
> for(i in 1:LAYERS)
+   model %>% layer_lstm(HIDDEN_SIZE, return_sequences = TRUE)
> 
> model %>% 
+   # Apply a dense layer to the every temporal slice of an input. For each of step
+   # of the output sequence, decide which character should be chosen.
+   time_distributed(layer_dense(units = length(char_table))) %>%
+   layer_activation("softmax")
> 
> # Compiling the model
> model %>% compile(
+   loss = "categorical_crossentropy", 
+   optimizer = "adam", 
+   metrics = "accuracy"
+ )
> 
> # Get the model summary
> summary(model)
Model: "sequential"
_______________________________________________________________________________________________________________
Layer (type)                                     Output Shape                                 Param #          
===============================================================================================================
lstm (LSTM)                                      (None, 128)                                  72192            
_______________________________________________________________________________________________________________
repeat_vector (RepeatVector)                     (None, 3, 128)                               0                
_______________________________________________________________________________________________________________
lstm_1 (LSTM)                                    (None, 3, 128)                               131584           
_______________________________________________________________________________________________________________
time_distributed (TimeDistributed)               (None, 3, 12)                                1548             
_______________________________________________________________________________________________________________
activation (Activation)                          (None, 3, 12)                                0                
===============================================================================================================
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_______________________________________________________________________________________________________________
> 
> # Fitting loop
> model %>% fit( 
+   x = x_train, 
+   y = y_train, 
+   batch_size = BATCH_SIZE, 
+   epochs = 1,
+   validation_data = list(x_val, y_val)
+ )
2020-02-16 13:57:52.919155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
Train on 45000 samples, validate on 5000 samples
45000/45000 [==============================] - 8s 170us/sample - loss: 1.6296 - accuracy: 0.3898 - val_loss: 1.3092 - val_accuracy: 0.5236
> 
> # Predict for a new observation
> new_obs <- encode("55+22", char_table) %>%
+   array(dim = c(1,5,12))
> result <- predict(model, new_obs)
> result <- result[1,,]
> decode(result, char_table)
[1] " 78"

It loads keras and tensorflow and trains normally.

mg64ve commented 4 years ago

Hi @dfalbel, I have set up a new conda environment with the CPU version of TensorFlow and I am getting a similar error:

Error: NotFoundError: Could not find valid device for node.
Node:{{node SparseSoftmaxCrossEntropyWithLogits}}
All kernels registered for op SparseSoftmaxCrossEntropyWithLogits :
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT64]
 [Op:SparseSoftmaxCrossEntropyWithLogits]
skeydan commented 4 years ago

This is very strange. I would suspect a problem with the Windows build of TF, but I can't find related issues in the TF repo... Did you install TensorFlow via conda, via pip, or via install_tensorflow()?
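
If a clean reinstall into a fresh environment is an option, the R helper should handle it; a rough sketch (the envname and version strings are just placeholders, adjust them to your setup):

library(tensorflow)
# reinstall TensorFlow 2.0.0 into a dedicated conda environment
install_tensorflow(method = "conda", envname = "tf-cpu", version = "2.0.0")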

You could try the following...

loss_object = tf$keras$losses$SparseCategoricalCrossentropy(
    from_logits=TRUE, reduction='NULL')

loss_function <- function(real, pred) {
  mask = tf$math$logical_not(tf$math$equal(real, 0))
  loss_ = loss_object(real, pred)

  mask = tf$cast(mask, dtype=loss_$dtype)
  loss_ = loss_ *  mask

   tf$reduce_mean(loss_)
}

(Evidently they have recently updated the tutorial: https://www.tensorflow.org/tutorials/text/nmt_with_attention)

Independently, in the original version, it would be interesting to see the datatypes of y[, t] and preds directly before this line:

loss <- loss + cx_loss(y[, t], preds)

Could you add something like

print(y[, t]$dtype)
print(preds$dtype)

and tell us what it prints?

Finally - even in the CPU case, where this doesn't look so likely ... - just to exclude a memory problem, could you reduce the dataset to, say, 100 items, restart the session, and run the model on that?

mg64ve commented 4 years ago

Hi @skeydan ,

I installed with install_tensorflow(). By the way, I am also noticing another error:

> tfe_enable_eager_execution(device_policy = "silent")
Error in py_get_attr_impl(x, name, silent) : 
  AttributeError: module 'tensorflow' has no attribute 'contrib'

But I have seen a GitHub issue about this for the Python version, and it should be OK. Are you also getting this? With the print lines I am getting:

<dtype: 'float32'>
<dtype: 'float32'>
Error: NotFoundError: Could not find valid device for node.
Node:{{node SparseSoftmaxCrossEntropyWithLogits}}
All kernels registered for op SparseSoftmaxCrossEntropyWithLogits :
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT64]
 [Op:SparseSoftmaxCrossEntropyWithLogits]

The following instruction does not work:

loss_object = tf$keras$losses$SparseCategoricalCrossentropy(from_logits=TRUE, reduction='NULL')

I am getting:

> loss_object = tf$keras$losses$SparseCategoricalCrossentropy(from_logits=TRUE, reduction='NULL')
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: Invalid Reduction Key NULL.

Detailed traceback: 
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\keras\losses.py", line 528, in __init__
    from_logits=from_logits)
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\keras\losses.py", line 204, in __init__
    super(LossFunctionWrapper, self).__init__(reduction=reduction, name=name)
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\keras\losses.py", line 92, in __init__
    losses_utils.ReductionV2.validate(reduction)
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\ops\losses\loss_reduction.py", line 68, in validate
    raise ValueError('Invalid Reduction Key %s.' % key)

I also forgot to mention that I need to use:

optimizer <- tf$compat$v1$train$AdamOptimizer()

instead of

optimizer <- tf$train$AdamOptimizer()

which is what you use in your article.

Last thing: I am getting the same issue even with 100 samples. So far I have only tested this with the CPU version.

Please let me know what you think about it. Thanks.

skeydan commented 4 years ago

Are you using the version on GitHub, which was updated for TF 2.0?

https://github.com/rstudio/keras/blob/master/vignettes/examples/nmt_attention.R

This does not have tfe_enable_eager_execution() and also has

optimizer <- tf$optimizers$Adam()

Could you first just test what happens when you use the code from there, please?
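
Also, about the Invalid Reduction Key error: the reduction argument expects one of the Keras reduction keys (the string "none", or the tf$keras$losses$Reduction constants), not the string "NULL". A minimal sketch of what my earlier snippet should have said, untested on your setup:

loss_object <- tf$keras$losses$SparseCategoricalCrossentropy(
  from_logits = TRUE,
  reduction = tf$keras$losses$Reduction$NONE  # equivalently, reduction = "none"
)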

Still, the datatype output makes me wonder... So if it still fails, could you try explicitly casting y[, t] to int32 before calling the (original) loss function?

tmp <- tf$cast(y[, t], tf$int32)
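
For concreteness, the cast would go directly into the loss accumulation line quoted above; a hypothetical one-liner, untested:

loss <- loss + cx_loss(tf$cast(y[, t], tf$int32), preds)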

mg64ve commented 4 years ago

@skeydan, if I use that code I am getting:

Error: NotFoundError: Could not find valid device for node.
Node:{{node SparseSoftmaxCrossEntropyWithLogits}}
All kernels registered for op SparseSoftmaxCrossEntropyWithLogits :
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT64]
 [Op:SparseSoftmaxCrossEntropyWithLogits]

Detailed traceback: 
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 3477, in sparse_softmax_cross_entropy_with_logits_v2
    labels=labels, logits=logits, name=name)
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 3397, in sparse_softmax_cross_entropy_with_logits
    precise_logits, labels, name=name)
  File "C:\Users\gazzi\Ana

If I add the print instructions to this script, I am getting:

tf.Tensor([7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7], shape=(32,), dtype=int32)
Error: NotFoundError: Could not find valid device for node.
Node:{{node SparseSoftmaxCrossEntropyWithLogits}}
All kernels registered for op SparseSoftmaxCrossEntropyWithLogits :
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]; Tlabels in [DT_INT64]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT32]
  device='CPU'; T in [DT_HALF]; Tlabels in [DT_INT64]
 [Op:SparseSoftmaxCrossEntropyWithLogits]

Detailed traceback: 
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 3477, in sparse_softmax_cross_entropy_with_logits_v2
    labels=labels, logits=logits, name=name)
  File "C:\Users\gazzi\Anaconda3\envs\tf-cpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 3397, in sparse_softmax_cross_entropy_with_logits
    precise_logits, labels, name=name)
  File "C:\Users\gazzi\Ana