Word Embedding Example - shape error

GarryGelade commented 5 years ago

I tried to reproduce the Keras word embedding example at https://blogs.rstudio.com/tensorflow/posts/2017-12-22-word-embeddings-with-keras/ using the updated skipgrams_generator function from #740.

Here is the relevant code:

library(keras)
library(purrr)
library(reticulate)

# tokenizer and reviews are created as in the blog post
tokenizer %>% fit_text_tokenizer(reviews)
skipgrams_generator <- function(text, tokenizer, window_size, negative_samples) {
  gen <- texts_to_sequences_generator(tokenizer, sample(text))
  function() {

    while(TRUE) {
      nxt <- generator_next(gen)
      if (length(nxt) > 1)
        break
    }

    skip <- nxt %>%
      skipgrams(
        vocabulary_size = tokenizer$num_words, 
        window_size = window_size, 
        negative_samples = 1  # hard-coded; the negative_samples argument is not used
      )
    x <- transpose(skip$couples) %>% map(. %>% unlist %>% as.matrix(ncol = 1))
    y <- skip$labels %>% as.matrix(ncol = 1)

    list(x, y)
  }
}

embedding_size <- 128 
skip_window <- 5  
num_sampled <- 1  

input_target <- layer_input(shape = 1)
input_context <- layer_input(shape = 1)

# define embedding matrix
embedding <- layer_embedding(
  input_dim = tokenizer$num_words + 1, 
  output_dim = embedding_size, 
  input_length = 1, 
  name = "embedding"
)

target_vector <- input_target %>% 
  embedding() %>% 
  layer_flatten()

context_vector <- input_context %>%
  embedding() %>%
  layer_flatten()

dot_product <- layer_dot(list(target_vector, context_vector), axes = 1)
output <- layer_dense(dot_product, units = 1, activation = "sigmoid")

model <- keras_model(list(input_target, input_context), output)
model %>% compile(loss = "binary_crossentropy", optimizer = "adam")
summary(model)

model %>%
  fit_generator(
    skipgrams_generator(reviews, tokenizer, skip_window, num_sampled),
    steps_per_epoch = 100000, epochs = 2
  )

I get this error:

Error in py_call_impl(callable, dots$args, dots$keywords) :
  AttributeError: 'list' object has no attribute 'shape'

Detailed traceback:
  File "C:\Users\garry\ANACON~1\envs\R-TENS~1\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1779, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\garry\ANACON~1\envs\R-TENS~1\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 194, in fit_generator
    batch_size = x[0].shape[0]

I am using Keras 2.2.4.1, tensorflow 1.14.0, reticulate 1.13 and R 3.5.0 on Windows 10. The model appears to have been built correctly:

Layer (type)             Output Shape      Param #    Connected to
input_1 (InputLayer)     (None, 1)         0
input_2 (InputLayer)     (None, 1)         0
embedding (Embedding)    (None, 1, 128)    2560128    input_1[0][0]
                                                      input_2[0][0]
flatten (Flatten)        (None, 128)       0          embedding[0][0]
flatten_1 (Flatten)      (None, 128)       0          embedding[1][0]
dot (Dot)                (None, 1)         0          flatten[0][0]
                                                      flatten_1[0][0]
dense (Dense)            (None, 1)         2          dot[0][0]

Total params: 2,560,130
Trainable params: 2,560,130
Non-trainable params: 0
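
As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is a small sketch in base R. The vocabulary size of 20000 is an assumption inferred from the embedding's parameter count, since the tokenizer definition is not shown in this post.

```r
# Assumed: tokenizer$num_words is 20000 (inferred from the summary, not shown in the post)
num_words <- 20000
embedding_size <- 128

# Embedding layer: input_dim = num_words + 1 rows, each with embedding_size weights
embedding_params <- (num_words + 1) * embedding_size

# Dense layer on the single-value dot product: 1 weight + 1 bias
dense_params <- 1 * 1 + 1

total_params <- embedding_params + dense_params
print(c(embedding = embedding_params, dense = dense_params, total = total_params))
```

Under that assumption the totals agree with the summary, which is consistent with the model itself being built correctly.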

Any help appreciated. Thanks.

GarryGelade commented 5 years ago

UPDATE:

I updated my TensorFlow installation to v1.14.0:

devtools::install_github("rstudio/tensorflow")
library(tensorflow)
install_tensorflow()

The shape error is gone, but I now get what looks like the same kind of error that Sophia0616 reported in #740:

Error in py_call_impl(callable, dots$args, dots$keywords) :
  ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [array([[[ 767], [ 5], [ 9], [ 104], [ 450], [ 42], [ 767], [ 6539], [ 1968], [ 899], [ 899], [ ...

Detailed traceback:
  File "C:\Users\garry\ANACON~1\envs\R-RETI~1\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\garry\ANACON~1\envs\R-RETI~1\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\Users\garry\ANACON~1\envs\R-RETI~1\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1153, in train_on_batch
    extract_tensors_from_dataset=True)
  File

GarryGelade commented 5 years ago

UPDATE 2:

I saved the initial values of x and y to global variables by executing

    x <<- transpose(skip$couples) %>% map(. %>% unlist %>% as.matrix(ncol = 1))
    y <<- skip$labels %>% as.matrix(ncol = 1)

inside the function returned by skipgrams_generator. I then fitted the model directly, without fit_generator or the generator function:

model %>% fit(x, y, epochs=1, batch_size = 32, validation_split = 0.2)

This runs fine, which suggests the generator's output is OK. Could fit_generator be mangling that output?
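
One way to narrow this down is to check the structure of a single batch on the R side before it reaches fit_generator. The helper below is a hypothetical sketch in base R (check_batch is not a keras function). It assumes the generator is meant to return list(x, y), where x is a list of two one-column matrices and y is a one-column matrix with the same number of rows. The example batch is made up.

```r
# Hypothetical helper (not part of keras): validate one batch of the form list(x, y)
check_batch <- function(batch, n_inputs = 2) {
  stopifnot(is.list(batch), length(batch) == 2)
  x <- batch[[1]]
  y <- batch[[2]]
  # x must be a list with one matrix per model input
  stopifnot(is.list(x), length(x) == n_inputs)
  stopifnot(is.matrix(y), ncol(y) == 1)
  for (xi in x) {
    stopifnot(is.matrix(xi), ncol(xi) == 1, nrow(xi) == nrow(y))
  }
  # Return the dimensions so they can be inspected
  list(x_dims = lapply(x, dim), y_dim = dim(y))
}

# Made-up batch with the layout the generator is meant to produce.
# as.matrix() ignores an `ncol` argument; a plain vector becomes an n x 1 matrix anyway.
example_batch <- list(
  list(as.matrix(c(767, 5, 9)), as.matrix(c(104, 450, 42))),
  as.matrix(c(1, 0, 1))
)
dims <- check_batch(example_batch)
str(dims)
```

With the real data, this could be called on one batch from the function returned by skipgrams_generator, using the same arguments as in the fit_generator call. If that check passes, the conversion of the R list to Python objects would be the next place to look.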
