rstudio / tfdatasets

R interface to TensorFlow Datasets API
https://tensorflow.rstudio.com/tools/tfdatasets/

Keras Error in py_call_impl(callable, dots$args, dots$keywords) : #73

Open ghost opened 4 years ago

ghost commented 4 years ago

I am doing deep learning with Keras in RStudio. I copied and pasted the code from this tutorial: https://tensorflow.rstudio.com/tutorials/beginners/basic-ml/tutorial_basic_regression/

# Packages attached in this session (see sessionInfo() below)
library(keras)
library(tfdatasets)
library(tensorflow)

boston_housing <- dataset_boston_housing()

c(train_data, train_labels) %<-% boston_housing$train
c(test_data, test_labels) %<-% boston_housing$test

paste0("Training entries: ", length(train_data), ", labels: ", length(train_labels))

train_data[1, ] # Display sample features, notice the different scales

library(dplyr)

column_names <- c('CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 
                  'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT')

train_df <- train_data %>% 
  as_tibble(.name_repair = "minimal") %>% 
  setNames(column_names) %>% 
  mutate(label = train_labels)

test_df <- test_data %>% 
  as_tibble(.name_repair = "minimal") %>% 
  setNames(column_names) %>% 
  mutate(label = test_labels)

train_labels[1:10] # Display first 10 entries

spec <- feature_spec(train_df, label ~ . ) %>% 
  step_numeric_column(all_numeric(), normalizer_fn = scaler_standard()) 

spec <- fit(spec)

layer <- layer_dense_features(
  feature_columns = dense_features(spec), 
  dtype = tf$float32
)

layer(train_df)

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: ('We expected a dictionary here. Instead we got: ',          CRIM    ZN  INDUS  CHAS     NOX  ...    TAX  PTRATIO       B  LSTAT  label
0     1.23247   0.0   8.14   0.0  0.5380  ...  307.0     21.0  396.90  18.72   15.2
1     0.02177  82.5   2.03   0.0  0.4150  ...  348.0     14.7  395.38   3.11   42.3

**sessionInfo()**

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Chile.1252 LC_CTYPE=Spanish_Chile.1252 LC_MONETARY=Spanish_Chile.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Chile.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_0.8.5 tfdatasets_2.0.0 keras_2.2.5.0 tensorflow_2.0.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 pillar_1.4.3 compiler_3.6.3 prettyunits_1.1.1 base64enc_0.1-3 tools_3.6.3
[7] progress_1.2.2 zeallot_0.1.0 digest_0.6.25 packrat_0.5.0 jsonlite_1.6.1 evaluate_0.14
[13] tibble_2.1.3 pkgconfig_2.0.3 rlang_0.4.5 cli_2.0.2 rstudioapi_0.11 yaml_2.2.1
[19] xfun_0.12 knitr_1.28 generics_0.0.2 vctrs_0.2.4 rappdirs_0.3.1 hms_0.5.3
[25] tidyselect_1.0.0 reticulate_1.14 glue_1.3.2 forge_0.2.0 R6_2.4.1 fansi_0.4.1
[31] rmarkdown_2.1 purrr_0.3.3 magrittr_1.5 whisker_0.4 tfestimators_1.9.1 tfruns_1.4
[37] htmltools_0.4.0 assertthat_0.2.1 crayon_1.3.4

wbstrlncln commented 3 years ago

I've had similar issues. I suspect the culprit is "normalizer_fn = scaler_standard()".

It returns NaN on the training set after running: train_dataset %>% reticulate::as_iterator() %>% reticulate::iter_next() %>% layer()
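
One way to check whether the scaler is responsible is to standardize the features by hand before they reach the model, which takes normalizer_fn out of the picture. The following is only a minimal base R sketch under stated assumptions: train_toy and test_toy are made-up stand-ins for train_df and test_df, and standardize_features() is a hypothetical helper, not part of tfdatasets or keras. The guard against a zero standard deviation is included because division by zero is a common source of NaN in standardization in general; whether that is what happens inside scaler_standard() is not established in this thread.

```r
# Made-up stand-ins for train_df / test_df (values are illustrative only).
train_toy <- data.frame(
  CRIM  = c(1.23, 0.02, 4.90, 0.04),
  TAX   = c(307, 348, 666, 375),
  label = c(15.2, 42.3, 50.0, 21.1)
)
test_toy <- data.frame(
  CRIM  = c(0.50, 3.00),
  TAX   = c(300, 400),
  label = c(20.0, 30.0)
)

feature_cols <- setdiff(names(train_toy), "label")

# Compute the statistics on the training set only.
train_mean <- vapply(train_toy[feature_cols], mean, numeric(1))
train_sd   <- vapply(train_toy[feature_cols], sd, numeric(1))

# A constant column has sd == 0 and would turn into NaN after division,
# so leave such columns centered but unscaled.
train_sd[train_sd == 0] <- 1

# Hypothetical helper: apply the training statistics to any data frame.
standardize_features <- function(df) {
  scaled <- scale(as.matrix(df[feature_cols]),
                  center = train_mean, scale = train_sd)
  # Drop the attributes that scale() attaches and keep a plain matrix.
  attr(scaled, "scaled:center") <- NULL
  attr(scaled, "scaled:scale") <- NULL
  scaled
}

train_x <- standardize_features(train_toy)
test_x  <- standardize_features(test_toy)

# Check whether any NA/NaN values are left in either matrix.
anyNA(train_x)
anyNA(test_x)
```

If the hand-scaled matrices contain no NaN while the output of layer() does, that would point at the scaler step rather than at the data.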

nurb-ude commented 2 years ago

Even after removing "normalizer_fn = scaler_standard()" I still get the error. However, when I build and train the network as follows:

input <- layer_input_from_dataset(train_df %>% select(-label))

output <- input %>% 
  layer_dense_features(dense_features(spec)) %>% 
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1) 

model <- keras_model(input, output)

model %>% 
  compile(
    loss = "mse",
    optimizer = optimizer_rmsprop(),
    metrics = list("mean_absolute_error")
  )

# Display training progress by printing a single dot for each completed epoch.
print_dot_callback <- callback_lambda(
  on_epoch_end = function(epoch, logs) {
    if (epoch %% 80 == 0) cat("\n")
    cat(".")
  }
)    

history <- model %>% fit(
  x = train_df %>% select(-label),
  y = train_df$label,
  epochs = 500,
  validation_split = 0.2,
  verbose = 0,
  view_metrics = TRUE,
  callbacks = list(
    print_dot_callback,
    callback_early_stopping(monitor = "val_loss", patience = 5, restore_best_weights = TRUE)
  )
)

everything works fine, and I think even the scaling works: if I delete the scaler from spec, the network has a much harder time converging.

So I think the exception is caused only by calling layer(train_df) directly, and nothing else is affected by the error above.

However, it has been more than 2 years since this problem was reported. Is there a known explanation for why calling layer(train_df) throws an error while everything else seems to work just fine?

t-kalinowski commented 2 years ago

Feature columns have been deprecated upstream, so this is unlikely to be fixed in the R interface (though I would merge a simple PR).

Any kind of stateful feature preprocessing is best done with Keras preprocessing layers these days: https://keras.rstudio.com/articles/new-guides/preprocessing_layers.html
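
As a rough sketch of what that could look like for purely numeric columns such as the ones in this issue, the snippet below uses layer_normalization() with adapt() from the keras R package in place of feature_spec() and scaler_standard(). It assumes a keras and TensorFlow installation recent enough to ship preprocessing layers (newer than the versions in the sessionInfo above). x_train and y_train are made-up toy data, not the Boston housing set.

```r
library(keras)

# Made-up numeric features and labels (illustrative values only).
x_train <- matrix(
  c(1.23, 0.02, 4.90, 0.04,
    307,  348,  666,  375),
  ncol = 2
)
y_train <- c(15.2, 42.3, 50.0, 21.1)

# The normalization layer learns per-column mean and variance in adapt(),
# so the scaling state lives inside the model.
normalizer <- layer_normalization(axis = -1L)
normalizer %>% adapt(x_train)

input <- layer_input(shape = ncol(x_train))

output <- input %>%
  normalizer() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1)

model <- keras_model(input, output)

model %>% compile(
  loss = "mse",
  optimizer = optimizer_rmsprop(),
  metrics = list("mean_absolute_error")
)

# The model takes a plain numeric matrix; no feature columns are involved.
pred <- predict(model, x_train, verbose = 0)
```

Because the model consumes a matrix rather than a data frame, the data-frame-to-dictionary conversion that triggers the error in this issue is not involved.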