rstudio / cloudml

R interface to Google Cloud Machine Learning Engine
https://tensorflow.rstudio.com/tools/cloudml/

where should auxiliary configuration be specified? #17

Closed kevinushey closed 6 years ago

kevinushey commented 7 years ago

There are a few configuration items that need a home:

I'm sure other things will pop up as we consider new deployment scenarios.

@jjallaire, any thoughts on where these should live?

terrytangyuan commented 7 years ago

Maybe also expose the paths to those config files so users can specify those themselves for situations where these yaml files are generated dynamically?

jjallaire commented 7 years ago

I wonder if we shouldn't move to having a cloudml.yml file that includes:

It would of course be awesome if we could get the flags in there as well. Maybe something like this:

gcloud:
  account: foo
  project: bar
include: "*.R"
exclude: "*.mp3"
packrat: true
flags:
  learning_rate: 0.001
  train_data: gs://example.com/data.csv

I think that's the simplest and most coherent expression for the user. One question is how we marry this with the idea that the user only needs to do this in their R script:

FLAGS <- flags(
    flag_numeric("learning_rate", 0.005),
    flag_string("train_data", "data.csv")
)

Perhaps we could enhance the flags() function to not only look for this:

flags.yml
  <config-name>:
    flag: value

But also this:

<config-name>.yml
  flags:
    flag: value

That way it would work fine for locally oriented configurations, which would just use flags.yml, but could also adapt to the end-to-end cloudml scenario without a proliferation of config files.
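Concretely, using a configuration named cloudml and the flags from the earlier example, the two layouts the enhanced flags() function would look for might be (an illustrative sketch; the file contents are hypothetical):

```yaml
# Layout 1: flags.yml keyed by configuration name
cloudml:
  learning_rate: 0.001
  train_data: "gs://example.com/data.csv"
```

```yaml
# Layout 2: <config-name>.yml (here, cloudml.yml) with a flags section
flags:
  learning_rate: 0.001
  train_data: "gs://example.com/data.csv"
```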

I could be oversimplifying, but I'm definitely trying to go for as simple an expression as possible for the user.

In any case, I think we should discuss this in real time before making any decisions.

kevinushey commented 7 years ago

Just looping back on this, I think we've decided on the following structure:

| File | Purpose |
|------|---------|
| flags.yml | Application-specific configuration flags. |
| cloudml.yml | Configuration for e.g. gcloud (project, account) and cloudml-specific deployment settings (packrat). |

This implies that flags.yml is agnostic to the deployment mechanism used; a separate configuration file is used to manage deployment-related configuration.
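Under that structure, the two files for the earlier example might look like the following (an illustrative sketch; field names follow the proposals earlier in this thread):

```yaml
# flags.yml -- application-specific flags, agnostic to deployment
learning_rate: 0.001
train_data: "gs://example.com/data.csv"
```

```yaml
# cloudml.yml -- deployment-related configuration
gcloud:
  account: foo
  project: bar
packrat: true
```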

jjallaire commented 7 years ago

I think this is a good place to have landed.


terrytangyuan commented 7 years ago

I am trying to understand this better and put the pieces together. Is there a good reference for me to catch up with Cloud ML and our approach?

kevinushey commented 7 years ago

This is probably the best reference for Cloud ML itself:

https://cloud.google.com/ml-engine/docs/concepts/technical-overview

The short version of what we're trying to accomplish is batch-wise training + prediction of TensorFlow applications using Google's ML Engine. We're trying to accomplish this in the following way:

  1. Use the gcloud command line tool to submit training + prediction 'jobs',
  2. Define some 'best practices' and document how an application can be configured for deployment,
  3. Use tfruns::training_run() on the Google Cloud side to handle runs, and synchronize run outputs for easy viewing and introspection using other tooling in the tfruns package.

In the end, we're hoping that submitting a job to ML Engine is as simple as something like:

job <- cloudml_train(...)      # submits job to ML Engine
collected <- job_collect(job)  # waits for job to complete

And hopefully, once we've figured out the right workflow, we'll augment it with existing RStudio tools (e.g. run a job in a Terminal pane so you can see streamed log output as it comes in).

Because ML Engine expects a Python package as a trainable artefact, we're basically hooking into that framework by disguising an R application as a Python package, uploading that Python package, and then choosing an R endpoint as the mechanism through which the model is trained and used for prediction.
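As a rough sketch, the uploaded artefact might be laid out something like this (a hypothetical structure for illustration only; the actual file names and layout used by the package may differ):

```text
trainer/                  # Python package wrapping the R application
  setup.py                # declares the package so ML Engine will accept it
  trainer/
    __init__.py
    task.py               # Python entry point that hands off to R
  train.R                 # the actual R training script
  flags.yml               # application flags
  cloudml.yml             # deployment configuration
```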

terrytangyuan commented 7 years ago

Thanks Kevin! I'll read through it soon.


jjallaire commented 6 years ago

Duplicate of #56