rstudio / tfestimators

R interface to TensorFlow Estimators
https://tensorflow.rstudio.com/tfestimators

Rewrite examples in R #46

Open terrytangyuan opened 6 years ago

terrytangyuan commented 6 years ago

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/learn

terrytangyuan commented 6 years ago

@kevinushey @dfalbel @jjallaire Help on this is highly appreciated but make sure you comment here if you are working on certain examples to avoid duplicate efforts!

kevinushey commented 6 years ago

Rather than just a direct port of the TensorFlow examples to tfestimators, what I would really love to have is side-by-side comparisons of:

  1. The R way of doing something, versus
  2. The tfestimators way of doing something.

This should be possible for at least some of the canned estimators, e.g. linear regression, random forest models, and so on.

R also provides the nnet package by default, so it would be interesting to see how a model fit with nnet() translates to tfestimators (if possible). Example copied from ?nnet:

# Fit single-hidden-layer neural network, possibly with skip-layer connections.
library(nnet)

# use half the iris data
ir <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])
targets <- class.ind(c(rep("s", 50), rep("c", 50), rep("v", 50)))
samp <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))

model <- nnet(ir[samp, ],
              targets[samp, ],
              size = 2,
              rang = 0.1,
              decay = 5e-4,
              maxit = 200)

Note that I'm not advocating for us following the nnet() interface or anything here; I just think we should try to build bridges between the R way of expressing a model and the tfestimators way of expressing a model.
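A minimal sketch of what such a side-by-side comparison could look like for linear regression. It assumes the tfestimators functions `feature_columns()`, `column_numeric()`, `linear_regressor()`, `input_fn()`, and `train()`; the `mtcars` dataset, the chosen predictors, and the helper name `fit_tfestimators` are illustrative choices, not something taken from this thread. The tfestimators half is wrapped in a function so that the block can be sourced without a TensorFlow installation.

```r
# The R way: specify, fit, and inspect the model in one call
# using the formula interface.
fit_lm <- lm(mpg ~ disp + cyl, data = mtcars)
r_coefs <- coef(fit_lm)

# The tfestimators way (hypothetical helper; nothing here runs until
# the function is called, which requires tfestimators and TensorFlow).
fit_tfestimators <- function(data = mtcars, epochs = 10) {
  # Feature columns declare how each input variable is interpreted.
  cols <- tfestimators::feature_columns(
    tfestimators::column_numeric("disp"),
    tfestimators::column_numeric("cyl")
  )

  # Constructing the estimator does not train it.
  model <- tfestimators::linear_regressor(feature_columns = cols)

  # The input function describes how data is fed to the model.
  train_input <- tfestimators::input_fn(
    data,
    features = c("disp", "cyl"),
    response = "mpg",
    batch_size = 32,
    num_epochs = epochs
  )

  # Training is a separate, explicit step.
  tfestimators::train(model, train_input)
  model
}
```

With tfestimators and TensorFlow installed, `model <- fit_tfestimators()` would run the second half. The contrast worth noting is that `lm()` folds specification and fitting into one call, while the estimator separates feature declaration, model construction, data feeding, and training.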

jjallaire commented 6 years ago

I think it could be interesting to show comparisons between R and estimators, but the nnet one maybe not because it was developed so long ago and reflects none of the recent advances in the field (i.e. it's less than a toy example compared to estimators neural nets, whereas the other modeling functions are not).

In reply to Kevin Ushey's comment above (Tue, Jul 11, 2017 at 6:34 PM).

terrytangyuan commented 6 years ago

I personally feel there's not much value in comparing with the traditional approach. Traditional approaches are very limited: they basically initialize and train the model in one line, and then predict in another. However, you can probably show the gains from using feature columns, or from constructing the features yourself before feeding them to the model. Though the benefits in terms of how the API reduces the amount of code needed might be too obvious.

kevinushey commented 6 years ago

My point is that this gives R users a 'jumping off' point, where they can see how models they might already know how to express in R become expressed in the TensorFlow ecosystem. That is, such examples will be helpful in educating users as to how the stuff they already know fits into the TensorFlow mold.


I think it's important to keep in mind that, broadly speaking, there are two camps of potential users of the tfestimators package:

  1. Data scientists / analysts from the machine learning world, who are more likely familiar with the tools already available in Python (scikit-learn, torch, and so on). These users have more experience with building models that receive lots of data (e.g. streaming data, and so on). The emphasis is on predictive power, not necessarily on developing a deep understanding of the dataset(s) used to fit the model (or even the model itself -- i.e. in the machine learning world it's okay if the model is a 'black box' as long as it does the right thing).

  2. R users / analysts from the statistics world, who are already familiar with the modeling approaches provided by e.g. the base R stats package, and other contributed CRAN packages (which largely follow the 'formula interface' modeling approach). Emphasis here is usually on understanding a small number of data sets very well, developing a model that correctly models the process from which that data was generated, and using that to further understand the data + make inferences about the relationships between variables in that data set.

I think it's relatively unlikely that we'll be able to convince users in group 1) to try out this package, unless we can present a great value add in terms of how one can use the powers of the R ecosystem alongside TensorFlow. (Think e.g. visualization with ggplot2, interactive web applications with shiny, and so on)

For group 2), we need to convince these users that TensorFlow can actually solve problems they care about. This is harder than it looks because the ultimate goal in machine learning (highly predictive models) is not the same as that of the traditional R user / statistician (developing a deep understanding of data + the relationship between variables in a dataset). Because of this, I think traditional R users are going to approach TensorFlow with a lot of skepticism and we need to be cognizant of that.

jjallaire commented 6 years ago

As I argued elsewhere, some of this skepticism I think we need to just accept. This is especially the case when the R user has no intrinsic motivation or desire to use TensorFlow in the first place (that might be an un-winnable battle). I think R users will be motivated to use TensorFlow for a few possible reasons:

1) They want to use state of the art neural networks to improve the effectiveness of their models.

2) They want to use a framework that provides a uniform interface to a wide variety of model types (note though that they could also just use caret for this today).

3) They want to use a framework that will scale well to extremely large datasets.

4) They want to build models that can be deployed anywhere (cloud, server, mobile, etc.) with no R runtime.

One or more of those factors will likely be in play, which will provide some momentum in the learning process.

I think for starters we can just make tfestimators as easy as possible to work with from R and present it as it is. That will be enough to get us out of the gate.

I agree though that we will also want tutorials along the lines of what Kevin is suggesting ("TensorFlow for R Users") where we compare and contrast traditional R methods with TF methods, actually explaining the differences in approach (and their costs/benefits) as we go.

kevinushey commented 6 years ago

Are we planning to include these all as part of the vignettes/examples folder? (I'm guessing yes just because that will play nicely with e.g. pkgdown?)

jjallaire commented 6 years ago

Yes, all the examples should be in vignettes/examples

In reply to Kevin Ushey's comment above (Tue, Jul 25, 2017 at 1:36 PM).

terrytangyuan commented 6 years ago

Check out this folder for officially supported/maintained/tested examples using estimators: https://github.com/tensorflow/models/tree/master/official.

eddelbuettel commented 6 years ago

Any progress? I started noodling with a first example from .py to .R but didn't get too far yet.

terrytangyuan commented 6 years ago

@eddelbuettel No progress on this at least from my end. It would be very nice to rewrite those TF Estimators official tutorials in R.