dfalbel opened this issue 6 years ago
@dfalbel Have you tried pattern matching for feature columns?
Yes, but I didn't figure out how to use it with categorical variables, since we need to specify a different vocabulary for each variable.
@dfalbel I am trying to understand what you cannot achieve using pattern matching. Could you elaborate a bit? As for your other questions, there might be something wrong in the Python API. Could you try running their official wide and deep example in Python to see if the results are similarly problematic?
@terrytangyuan suppose I have this df:
```r
library(tfestimators)
library(tibble)  # for data_frame()

df <- data_frame(
  x1_cat = sample(letters, 100, replace = TRUE),
  x2_cat = sample(LETTERS, 100, replace = TRUE),
  x3_cat = sample(c(letters, LETTERS), 100, replace = TRUE),
  x4_num = runif(100),
  x5_num = rnorm(100)
)
```
I can use pattern matching to create feature columns for the numeric vars, e.g.:
```r
cols <- with_columns(df, {
  feature_columns(
    column_numeric(ends_with("num"))
  )
})
```
But for the categorical columns I can't, because I need to pass a different vocabulary for each column. For example:
```r
cols <- with_columns(df, {
  feature_columns(
    column_numeric(ends_with("num")),
    column_categorical_with_vocabulary_list(ends_with("cat"))
  )
})
```

```
Error in py_resolve_dots(list(...)) :
  argument "vocabulary_list" is missing, with no default
```
I'll run the wide & deep example in Python as soon as possible, too.
Oh, I see. There isn't an established best practice for that yet, so feel free to submit a PR showing an example, e.g. using map2() and unique().
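Something along these lines, perhaps (an untested sketch using purrr's map() and map2() on the toy df above; each column's vocabulary is just its unique observed values):

```r
library(purrr)
library(tfestimators)

# names of the categorical columns in the toy df above
cat_vars <- grep("_cat$", names(df), value = TRUE)

# one vocabulary per column: the unique values observed in the data
vocabs <- map(cat_vars, ~ unique(df[[.x]]))

# pair each column name with its vocabulary
cat_cols <- map2(
  cat_vars, vocabs,
  ~ column_categorical_with_vocabulary_list(.x, vocabulary_list = .y)
)
```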
I am going to submit a CRAN update to tfestimators soon (next few days) and just wanted to check in here to make sure there aren't any changes/fixes needed as a result of this thread before I do that.
Did any updated examples get put into a vignette somewhere? This is very similar to a problem I have, where I'm trying to convert a factor variable to one-hot encoded values, but dnn_classifier accepts neither a straight indicator column nor a categorical_with_vocabulary_list column.
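For concreteness, these are the two variants I mean (a rough sketch with made-up column names):

```r
library(tfestimators)

edu <- column_categorical_with_vocabulary_list(
  "education",
  vocabulary_list = c("HS-grad", "Bachelors", "Masters")
)

# variant 1: pass the categorical column directly to dnn_classifier()
model1 <- dnn_classifier(
  feature_columns = feature_columns(edu),
  hidden_units = c(32, 16)
)

# variant 2: wrap it in an indicator (one-hot) column first
model2 <- dnn_classifier(
  feature_columns = feature_columns(column_indicator(edu)),
  hidden_units = c(32, 16)
)
```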
I'm working on an end-to-end example for tfestimators, but I'm still having trouble understanding best practices.
Consider the Wide & Deep example available here.
We start by defining all the categorical variables and their possible values, e.g.:
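Roughly like this (a sketch; the variable names and vocabularies follow the census data used in that example, with the vocabularies trimmed for brevity):

```r
library(tfestimators)

education <- column_categorical_with_vocabulary_list(
  "education",
  vocabulary_list = c("Bachelors", "HS-grad", "Some-college", "Masters", "Doctorate")
)

marital_status <- column_categorical_with_vocabulary_list(
  "marital_status",
  vocabulary_list = c("Married-civ-spouse", "Never-married", "Divorced", "Widowed")
)

# ... and so on for every categorical variable in the dataset
```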
This is OK for small datasets, but for datasets with more variables it would be a lot of work. Of course, we can write some code to automate it, for example:
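For instance, something like the following (an untested sketch; make_categorical_columns() is a hypothetical helper, and train_df stands for the training data frame):

```r
# hypothetical helper: one categorical column per character/factor variable,
# using the values observed in `data` as the vocabulary
make_categorical_columns <- function(data) {
  cat_vars <- names(data)[vapply(data, function(x) is.character(x) || is.factor(x), logical(1))]
  lapply(cat_vars, function(nm) {
    column_categorical_with_vocabulary_list(
      nm,
      vocabulary_list = unique(as.character(data[[nm]]))
    )
  })
}

categorical_cols <- make_categorical_columns(train_df)
```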
I think this will be very common, so we could add a function that does exactly this. We can do the same for the other (numeric) variables:
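And a similar (equally hypothetical) helper for the numeric variables:

```r
# hypothetical helper: one numeric feature column per numeric variable
make_numeric_columns <- function(data) {
  num_vars <- names(data)[vapply(data, is.numeric, logical(1))]
  lapply(num_vars, column_numeric)
}

numeric_cols <- make_numeric_columns(train_df)
```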
Now, suppose I want to train a logistic regression model. I would run:
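Roughly this (a sketch of the call rather than my exact code; the data frame and column names are placeholders based on the census example):

```r
# linear_classifier() with two classes is logistic regression
model <- linear_classifier(
  feature_columns = c(categorical_cols, numeric_cols)
)

model %>% train(
  input_fn(
    train_df,
    features = c("education", "marital_status", "age", "hours_per_week"),
    response = "income_bracket"
  )
)
```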
This will return:
```
[/] Training -- loss: 24.30, step: 2544
```
Here, I don't understand the loss: is this the final loss, or the loss of the last processed batch?
Then I use the evaluate() function to get the model's results on the test data.
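The call looks roughly like this (same placeholder names as in the training sketch above):

```r
# returns a data frame of evaluation metrics on the held-out data
model %>% evaluate(
  input_fn(
    test_df,
    features = c("education", "marital_status", "age", "hours_per_week"),
    response = "income_bracket"
  )
)
```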
A few things I noticed:
1) I didn't find out why TensorFlow outputs those warnings.
2) The printed loss doesn't seem to be the final loss, since running evaluate() again returns different results.
3) The results table is below. It seems that TensorFlow is misinterpreting the labels, since accuracy_baseline is 1 and label/mean is 0. I tried passing the labels as booleans, but got the same result.
That's it! Let me know if I'm doing something wrong!!