I am creating a model that outputs a probability distribution, based on the example in this article: https://blogs.rstudio.com/tensorflow/posts/2019-06-05-uncertainty-estimates-tfprobability/

I set up a basic test using tfdatasets and I can get the model to train, but I don't know how to get it to take the feature_spec and build the right input for scoring.

I can manually define the shape of the tensor I need to score, and it outputs the prediction I need. However, this defeats one of the purposes of tfdatasets, because I would need to get the mean and sd of the dataset and manually apply the scaling. This is compounded when I have embedding layers.

Is there a way to take a dataset and pass it to model() while applying the known feature_spec?
It looks like I can do something like this:
scoring <- tensors_dataset(data_test)
scoring <- dataset_use_spec(scoring, ft_spec_1)
iter <- make_iterator_one_shot(scoring)
score_this <- iterator_get_next(iter)
scored_test_data <- model(score_this[[1]])
but the preprocessing is not applied to the dataset.
Unrelated to my example above:
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)
# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_vocabulary_list(thal) %>%
  step_embedding_column(thal, dimension = 3)
spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
iter <- make_iterator_one_shot(final_dataset)
score_this <- iterator_get_next(iter)
score_this
This is another example where dataset_use_spec does not appear to be applying any transformation to the data.
Can you try changing your last line to:
score_test_data <- model(as.list(data_test %>% select(a)))
That says it can't have rank 0:
score_test_data <- model(as.list(data_test %>% select(a)))
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Feature (key: a) cannot have rank 0. Give: 36
@atroiano As a workaround you can do this: model(keras_array(data_test)). I'll figure out how to make the keras_array call the default in Keras.
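For reference, keras_array() coerces R data into the array format Keras expects; given a data frame it produces a named list with one array per column, matching the model's named inputs. A minimal sketch (the data_test here is a hypothetical stand-in for the poster's data):

library(keras)
# hypothetical stand-in for the data_test used above
data_test <- data.frame(a = rnorm(5))
# keras_array() converts each column to an array, so the model
# receives named arrays instead of rank-0 scalars
str(keras_array(data_test))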
@dfalbel That works well for getting data scored without scaling it (I don't see where the scaling would happen in this instance).
It appears the following code will work for scoring as well.
scoring <- tensors_dataset(data_test)
scoring <- dataset_use_spec(scoring, ft_spec_1)
iter <- make_iterator_one_shot(scoring)
score_this <- iterator_get_next(iter)
scored_test_data <- model(score_this[[1]])
When this runs, I assume dataset_use_spec will apply the feature_spec transformations to my dataset, based on the dataset the spec was fit on, but that does not appear to happen. The spec includes:
step_numeric_column(one_of(c('a')), normalizer_fn = scaler_standard())
score_this[[1]] is the right object to pass, but the data in there is not scaled, as I outlined above.
I have a more complex example that uses embedding layers, and I am running into the same issue: the columns are not being converted based on the vocabulary.
This should actually scale the inputs, since layer_dense_features adds the transformations to the graph. E.g.:
library(tfdatasets)
library(keras)
df <- data.frame(
  x = 1:10,
  y = 1:10
)
spec <- feature_spec(df, y ~ x) %>%
  step_numeric_column(x, normalizer_fn = scaler_standard())
spec <- fit(spec)
inputs <- layer_input_from_dataset(df)
output <- layer_dense_features(inputs, feature_columns = spec$dense_features())
model <- keras_model(inputs, output)
model(keras_array(df))
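One quick way to see that the scaling really is baked into the graph is to compare the model output with a hand-rolled standardization. A sketch, assuming scaler_standard() standardizes with the mean and standard deviation fitted on df (the two printed vectors should agree up to the sample vs. population convention for the standard deviation):

library(tfdatasets)
library(keras)
df <- data.frame(x = 1:10, y = 1:10)
spec <- feature_spec(df, y ~ x) %>%
  step_numeric_column(x, normalizer_fn = scaler_standard()) %>%
  fit()
inputs <- layer_input_from_dataset(df)
output <- layer_dense_features(inputs, feature_columns = spec$dense_features())
model <- keras_model(inputs, output)
# the fitted normalizer runs inside the graph, so this output ...
as.array(model(keras_array(df)))
# ... should line up with standardizing x by hand
(df$x - mean(df$x)) / sd(df$x)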
How do you get this example to work with embedding columns?
library(tfdatasets)
library(keras)
df <- data.frame(
  x = 1:10,
  z = c(rep('b', 5), rep('a', 5)),
  y = 1:10
)
k_clear_session()
spec <- feature_spec(df, y ~ x + z) %>%
  step_numeric_column(x, normalizer_fn = scaler_standard()) %>%
  step_categorical_column_with_vocabulary_list(z) %>%
  step_embedding_column(z) %>%
  fit()
inputs <- layer_input_from_dataset(df)
outputs <- inputs %>%
  layer_dense_features(spec$dense_features()) %>%
  layer_dense(units = 1)
model <- keras_model(inputs, outputs)
model(keras_array(df))
does not work; it says:
Error in py_get_attr_impl(x, name, silent) :
AttributeError: 'list' object has no attribute 'dtype'
I think the problem here is just related to how we deal with factors (a factor column reaches TF as integer codes, while the fitted categorical column expects strings). Works as expected if you set:
df <- data.frame(
  x = 1:10,
  z = c(rep('b', 5), rep('a', 5)),
  y = 1:10,
  stringsAsFactors = FALSE
)
I get the error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Column dtype and SparseTensors dtype must be compatible. key: z, column dtype: <dtype: 'string'>, tensor dtype: <dtype: 'int32'>
library(tfdatasets)
library(keras)
df <- data.frame(
  x = 1:10,
  z = c(rep('b', 5), rep('a', 5)),
  y = 1:10,
  stringsAsFactors = FALSE
)
k_clear_session()
spec <- feature_spec(df, y ~ x + z) %>%
  step_numeric_column(x, normalizer_fn = scaler_standard()) %>%
  step_categorical_column_with_vocabulary_list(z) %>%
  step_embedding_column(z) %>%
  fit()
inputs <- layer_input_from_dataset(df)
outputs <- inputs %>%
  layer_dense_features(spec$dense_features()) %>%
  layer_dense(units = 1)
model <- keras_model(inputs, outputs)
model <- keras_model(inputs, outputs)
model(keras_array(df))
The above code just works for me. What's your TF version?
2.0.0-beta1
Ok, this seems like a bug! A workaround is to call:
inputs <- reticulate::dict(layer_input_from_dataset(df))
Will push a fix to master ASAP
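Pulling it together, the embedding example from above with the workaround dropped in would look like this (a sketch: only the inputs line changes, assuming TF 2.0.0-beta1, where the bug appears):

library(tfdatasets)
library(keras)
df <- data.frame(
  x = 1:10,
  z = c(rep('b', 5), rep('a', 5)),
  y = 1:10,
  stringsAsFactors = FALSE
)
spec <- feature_spec(df, y ~ x + z) %>%
  step_numeric_column(x, normalizer_fn = scaler_standard()) %>%
  step_categorical_column_with_vocabulary_list(z) %>%
  step_embedding_column(z) %>%
  fit()
# workaround: hand Keras a Python dict instead of a named R list
inputs <- reticulate::dict(layer_input_from_dataset(df))
outputs <- inputs %>%
  layer_dense_features(spec$dense_features()) %>%
  layer_dense(units = 1)
model <- keras_model(inputs, outputs)
model(keras_array(df))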
The workaround outlined above fixed the example, as well as a more complicated model I have locally.
I really appreciate your help!
Should be fixed in master
Cheers. Thanks again for the quick support!