rstudio / cloudml

R interface to Google Cloud Machine Learning Engine
https://tensorflow.rstudio.com/tools/cloudml/

consider setting repo to use linux binaries from PPM #217

Open slopp opened 3 years ago

slopp commented 3 years ago

Note that the very first time you submit a job to CloudML, the various packages required to run your script will be compiled from source. This will make the execution time of the job considerably longer than you might expect. It’s only the first job that incurs this overhead though (since the package installations are cached), and subsequent jobs will run more quickly.

We could significantly reduce both the first job's run time and the number of compilation errors by using the Public Package Manager (PPM) to provide binary packages, potentially as a default that users can opt out of.
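For illustration, here is a minimal sketch of what pointing R at PPM's Linux binaries could look like. It is not what cloudml does today. The distribution codename (`focal`) is an assumption and would need to match the OS of the image that runs the job, and the `cloudml.use_ppm_binaries` option is a hypothetical opt-out switch invented for this example. The repository URL pattern and the user agent setting follow the setup instructions PPM publishes for Linux binaries.

```r
# Sketch only: the codename is an assumption and must match the job image's OS.
distro <- "focal"

# Hypothetical opt-out switch; this is not an existing cloudml option.
use_ppm <- getOption("cloudml.use_ppm_binaries", default = TRUE)

if (isTRUE(use_ppm)) {
  # PPM reads the R version from the user agent to decide whether it can
  # serve a precompiled binary instead of a source package.
  options(HTTPUserAgent = sprintf(
    "R/%s R (%s)",
    getRversion(),
    paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])
  ))

  # Point the CRAN repo at the distribution-specific binary endpoint.
  options(repos = c(
    CRAN = sprintf("https://packagemanager.posit.co/cran/__linux__/%s/latest", distro)
  ))
}
```

With these options set before any `install.packages()` call, packages that have a matching binary would be downloaded prebuilt, and the rest would still fall back to source. Setting `options(cloudml.use_ppm_binaries = FALSE)` beforehand would keep the current behaviour.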

javierluraschi commented 3 years ago

This sounds pretty great, honestly! Our hesitation here is that we need to reconsider how one trains torch jobs in the cloud. If the answer is cloudml, which I think it might be, then we should totally do this work.

javierluraschi commented 3 years ago

I'd add that currently, cloudml does not have a dependency on Python/reticulate, so this could be a great way to train models. However, it is also worth considering whether we could come up with a multi-cloud approach that supports more than just Google Cloud, maybe even RStudio Connect or the Job Launcher?