rstudio / cloudml

R interface to Google Cloud Machine Learning Engine
https://tensorflow.rstudio.com/tools/cloudml/

Using private libraries in cloudml_train #171

Open zamorarr opened 6 years ago

zamorarr commented 6 years ago

Hi - thanks for this great package! Is there any way to use private libraries in a training script sent to cloudml_train? For example, in my train.R file:

library(keras)
library(myownlib)

model <- keras_model(....

I'm hoping I would be able to point the training function to an archive stored on Google Cloud Storage. For example:

cloudml_train("train-batters.r", mylibs = "gs://mybucket/rpkgs")

With the idea that packrat would know to look there for libraries it couldn't find on CRAN.

Is this feasible? Happy to help out if I can.

javierluraschi commented 6 years ago

@kevinushey might have some thoughts. In the meantime, since cloudml copies all the contents of your local directory, you could build your private library and save the .tar.gz file in the same folder as your train.R file. cloudml will upload this package for you, and you could then try installing it from source by adding something like this to the header of train.R:

install.packages("myownlib.tar.gz", repos = NULL, type="source")

# .... your existing code ....
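
A slightly more defensive variant of that header, as a minimal sketch. `install_local_if_missing()` is a hypothetical helper (not part of cloudml), and the sketch assumes the tarball sits next to train.R so that it is uploaded with the rest of the directory. It skips the install when the package is already available and fails with a clear message when the tarball is missing:

```r
# Hypothetical helper, not part of cloudml: install a private package from a
# source tarball in the working directory unless it is already installed.
install_local_if_missing <- function(pkg, tarball) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  if (!file.exists(tarball)) {
    stop("Cannot find ", tarball, " in ", getwd(), call. = FALSE)
  }
  install.packages(tarball, repos = NULL, type = "source")
  invisible(TRUE)
}

# Example use at the top of train.R (names taken from the comment above).
if (file.exists("myownlib.tar.gz")) {
  install_local_if_missing("myownlib", "myownlib.tar.gz")
  library(myownlib)
}
```

Note that with `repos = NULL`, `install.packages()` does not resolve dependencies, so anything the private package imports has to be installed by other means first.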

zamorarr commented 6 years ago

Oh, that's a good idea. It means the library is not installed and cached beforehand, but as long as I'm only using a few lightweight libraries it shouldn't affect runtime too much. I'll give this a try. Thanks!


fmannhardt commented 5 years ago

Any news on this issue? I have tried to add the package and install it as suggested, but I get the error message:

Unable to retrieve package records for the following packages:
myprivatepackage

I solved the issue by activating packrat for the project myself and following this post: https://stackoverflow.com/questions/31314229/packrat-with-local-binary-repository

fmannhardt commented 5 years ago

It turned out that using packrat for dependency management creates more problems than it solves. With packrat active, all the packages are uploaded to Google Cloud on every run, which slows everything down quite a bit.

I solved the issue by making it possible to add the private package to an IGNORE list that had been empty so far. See the pull request for details.

Z-ingdotnet commented 4 years ago

@fmannhardt Sorry, I'm running into a similar issue, but with common packages instead. For example, CloudML failed to obtain the packages lime, funModeling, latticeExtra and their dependencies, returning errors like:

2020-02-09T09:30:49.407953977Z master-replica-0 Installing latticeExtra (0.6-29) ... I master-replica-0
2020-02-09T09:30:49.408627985Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.408884048Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409107923Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409305094Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409548043Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409764050Z master-replica-0 FAILED I master-replica-0
2020-02-09T09:30:49.409967898Z master-replica-0 Error in getSourceForPkgRecord(pkgRecord, srcDir(project), availablePackagesSource(repos = repos), : I master-replica-0 undefined
2020-02-09T09:30:49.410171031Z master-replica-0 Failed to retrieve package sources for latticeExtra 0.6-29 from CRAN (internet connectivity issue?) I master-replica-0

I would have thought that cloudml only uploads the contents of the directory where the training script is located and then installs the required R packages in the cloud. Could you please tell me whether this is how the CloudML package works, or whether it actually uses packrat and uploads libraries from my local machine as well? Also, how can I resolve errors like the one above, where common packages are not accessible to CloudML in the cloud? These are not private packages and are readily available from CRAN, so I'm stuck as to why the CloudML package gives such an error.

Many thanks
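
One thing that might be worth checking for the 404 errors above, offered only as a diagnostic sketch: whether the package versions installed locally match what the repository currently serves. Nothing in this thread confirms that a version mismatch is the cause. `compare_versions()` is a hypothetical helper (not part of cloudml or packrat), and the repository URL in the commented usage is an assumption.

```r
# Hypothetical helper, not part of cloudml or packrat: compare the locally
# installed version of each package with the version listed by a repository.
# `installed` and `available` are matrices with package names as row names
# and a "Version" column, as returned by installed.packages() and
# available.packages().
compare_versions <- function(pkgs, installed, available) {
  local_version <- unname(installed[match(pkgs, rownames(installed)), "Version"])
  repo_version <- unname(available[match(pkgs, rownames(available)), "Version"])
  data.frame(
    package = pkgs,
    local = local_version,
    repo = repo_version,
    differs = local_version != repo_version,  # NA when either side is missing
    stringsAsFactors = FALSE
  )
}

# Example use (requires network access; the repository URL is an assumption):
# pkgs <- c("lime", "funModeling", "latticeExtra")
# compare_versions(
#   pkgs,
#   installed.packages(),
#   available.packages(repos = "https://cloud.r-project.org")
# )
```

A row with `differs` equal to TRUE only shows that the two versions are not the same. It does not by itself explain why the download failed.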