h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Saving model to network path not working #12507

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We have an H2O cluster running on a Linux machine (started from the command line), and we connect to it from our local Windows machine, which is on the same network. When we call saveModel, we get the following error:

ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://10.0.0.4:54321/99/Models.bin/)

# Code to load the model from the local machine (IDE: RStudio)

ModelName <- "GLM_model_R_1522217891094_1279"
modelpath <- file.path("file://dsvm-dev/Models", ModelName)
Model.h2o <- h2o.loadModel(modelpath)

# Code to save the model

h2o.saveModel(object = best_model, path = "file://dsvm-dev/Models2/", force = TRUE)

Please suggest any alternative way to save the model to the local machine (Windows) instead of on the server.

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: Hi [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd], can you please share logs and attach them to this jira?

It does seem to work on Mac & Linux. This issue might be specific to Windows.

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: @Michal Kurka Our cluster (H2O version 3.14.0.7) runs on an Azure Ubuntu 16.04 machine and was started from the Java command line.

Step 1: Connect to it from an Azure Windows VM using h2o.connect(ip=private_ip, port=54321).
Step 2: Try to load a model from the Windows VM into the cluster. This gives an error. Likewise, when we try to save a model, the path is not found.

Issue 1: The path is in UNC format and the folder is shared with everyone. Saving the model to the VM or loading it from the VM into the cluster should work, but it does not.

Issue 2: We tried saving the model on the Linux machine where the cluster is running. It does not write to the specified folder. It writes to the folder only when we change the folder's permissions for others to "rwx", because we are not passing user details along with the file path or when connecting to the cluster.

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: One question: why is the H2O cluster shutting down automatically? You can find the logs in the attached h2ologs.txt.

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd], thank you for the logs. Based on them it looks like it is related to this issue: PUBDEV-5686

[~accountid:5b153fb1b0d76456f36daced], assigning both to you, it should have the same fix. Thanks

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: attached logs show 2 different use-cases:

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd] from my comment above, can you please confirm?

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: Most likely we were trying the second scenario, loading or saving the model on a network path, and that is where we get the error. So is it necessary to mount the Windows folder on the Linux machine? If so, I will try mounting it and test.

Also, why is the cluster shutting down after the error occurs?

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd] correct, if your models were previously saved on a Windows machine, then you have to mount the network folder on the machines running the H2O nodes so that H2O can access the files. H2O doesn't read local files (local from the client's perspective), so all files need to be accessible from the cluster.

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/save-and-load-model.html

Regarding the shutdown, it occurred more than 10s after the error. Looking at the logged requests, it looks like it was triggered manually. Also, if the H2O cluster was started locally from R, it will shut down automatically as soon as you quit your R session. Given the logs, this was a local cluster:

06-22 11:52:11.794 127.0.0.1:54321 56649 #51863-27 INFO: POST /3/Shutdown, parms: {}
06-22 11:52:13.845 127.0.0.1:54321 56649 #d-178540 INFO: Orderly shutdown: Shutting down now.

So it is very likely that the R session was simply terminated by the user. If you happen to see the cluster shutting down automatically again, without closing the R session that started it, please let me know.

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: The logs above are for the local system. I have also attached logs for the remote cluster, in which you can see the remote shutdown. We checked the logs but couldn't figure out the reason, as I am not calling shutdown on the cluster. [^h2oRemoteCluster2.txt]

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd] I can't find anything suspicious in your logs. When running the client on my machine with the cluster on a VM, I think the cluster shut down once when I closed the client. That is normal when the cluster is started from the client, but it should not happen when the cluster is started independently. I can't reproduce it anymore. I believe it depends on how the VM shares its IP with the host.

Otherwise, did you try mounting the models drive on the Linux VM, and did you get it working this way?

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: I did manage to mount the Windows folder, but saving gives me the error below:

Error calling GET /99/Models.bin/GLM_model_R_1522217891094_1279?dir=%2Fmedia%2FData%2FModel1_GLM&force=true

ERROR MESSAGE: FS IO Failure: accessed path : file:/media/Data/Model1_GLM caused by: /media/Data/Model1_GLM (Permission denied)

I am trying to work out how to give other users access to the mounted folder.
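A minimal sketch for confirming the "Permission denied" above from the Linux side, before involving H2O at all. It only probes whether the current OS user can create a file in a directory. `check_writable` is a hypothetical helper name, not an H2O command, and the result is only meaningful when the script runs as the same user that runs the H2O process.

```shell
#!/bin/sh
# Hypothetical helper (not part of H2O): report whether the current user
# can create a file in the given directory.
check_writable() {
    dir="$1"
    probe="$dir/.write_probe_$$"
    # The subshell keeps a failed redirection from aborting the calling script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "NOT writable: $dir"
    return 1
}

# Example: a directory we just created is expected to be writable.
tmp_dir=$(mktemp -d)
check_writable "$tmp_dir"
rmdir "$tmp_dir"
```

For the case in this thread, the call would be `check_writable /media/Data/Model1_GLM`, using the path from the error message.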

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: Hi [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd], can you please confirm whether you were able to upload your model after giving write permissions to the mounted drive? As you're using a VM, write permissions are probably set through the drive management of your VM software, or something similar.

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: Moving to next fix release, at this point it doesn't seem like there is a bug in H2O (related Windows bug PUBDEV-5686 will be fixed in 3.20.0.3 - this fix release).

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: The path format issue is fixed by the linked ticket. The rest of the issue is mainly related to write permissions on the VM.

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: Is there any way to save the model with 0777 permissions? I ran the cluster as the admin user, and it saves the model only with rw-r----- permissions.

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: Hi [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd] I'm afraid it's not possible programmatically right now. If you're really interested in the feature, then we can create a new task for this.

For now, the only workaround I can recommend (at your own risk) is to set the umask on the shell just before starting the H2O process. Basically, if you want all newly created files to be readable+writable by everyone:

umask 011
start h2o

note that all processes started with this shell will inherit the umask, so you better not use this shell for anything else outside h2o afterwards.
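One way to limit the inheritance concern is to set the umask inside a subshell, so only the process started there sees it. The sketch below is an illustration, not an H2O feature: `run_with_umask` is a hypothetical helper name, and `touch` stands in for the real H2O start command.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a given umask inside a subshell,
# so the calling shell keeps its original umask.
run_with_umask() {
    mask="$1"
    shift
    ( umask "$mask" && "$@" )
}

# Demonstration: `touch` stands in for the H2O start command.
demo_dir=$(mktemp -d)
run_with_umask 011 touch "$demo_dir/model.bin"

# New regular files are requested with mode 0666, and umask 011 removes none
# of those bits, so on a typical Linux filesystem this shows -rw-rw-rw-.
ls -l "$demo_dir/model.bin"
```

With a standalone cluster, the stand-in could be replaced by the actual start command, for example `run_with_umask 011 java -jar h2o.jar` (the jar location is an assumption).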

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: Yes, please create a new task for it. It will definitely help us. My Windows VM is authenticated through Azure Active Directory, and mounting the shared folder on Linux asks for a username and password. I can save files only with root access, not as other users. Things get complicated when saving or loading a model.

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: Hi, it is now working with the mounted drive. The only catch is that you have to run the H2O cluster as the root user.

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: Hey, is there a forum where I can post my questions?

We face a lot of issues while working with the H2O cluster. For example, how can we add specific user logins for using the cluster?

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: Hi Sunil,

the entry point for questions is Stackoverflow, we're monitoring this tag: https://stackoverflow.com/questions/tagged/h2o.

Regarding authentication, several integrations are provided by H2O, cf. http://docs.h2o.ai/h2o/latest-stable/h2o-docs/security.html for the various ways offered to set this up.

After discussing your previous issues internally regarding the write access, we agreed that the umask was the best solution we can offer right now for this use-case. So, I'll be closing this ticket now if you don't mind.

Thanks, Seb.

exalate-issue-sync[bot] commented 1 year ago

Sunil Ajagekar commented: The challenge here is that changing the umask will not be accepted by any admin of the Linux machines.

I have found two workarounds:

Solution 1: add the users to the same group as the user running the H2O cluster.
Solution 2: mount the Windows folder and run the H2O cluster as the root user (sudo).
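For Solution 1, a small sketch of how group membership could be verified after an administrator has added a user to the group. The helper name `user_in_group`, the group name `h2o`, and the user name `alice` are all hypothetical. The approach assumes the saved model files stay group-readable, as with the rw-r----- mode mentioned earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named user belongs to the named group.
user_in_group() {
    user="$1"
    group="$2"
    # id -Gn prints every group name the user belongs to, separated by spaces.
    for g in $(id -Gn "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Adding a user to a group needs root and is shown only as a comment
# (user and group names are placeholders):
#   sudo usermod -aG h2o alice

# Example: the current user is normally a member of its own primary group.
if user_in_group "$(id -un)" "$(id -gn)"; then
    echo "current user is in its primary group"
fi
```

A newly added group membership usually takes effect only after the user logs in again.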

It would be great if you could provide a download option for saving a model and a browse option for loading one.

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: [~accountid:557058:12993ac7-785e-49c5-ad40-6da91f3437dd], we have options to download POJO and MOJO. We don't provide a way to download the binary model. It would make sense to add that capability, please feel free to make a new JIRA for this feature request.

I will close this jira as the original issue is resolved.

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-5648
Assignee: Sebastien Poirier
Reporter: Sunil Ajagekar
State: Resolved
Fix Version: 3.20.0.4
Attachments: Available (Count: 4)
Development PRs: N/A

Attachments From Jira

Attachment Name: h2oLoadModel.txt Attached By: Sunil Ajagekar File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5648/h2oLoadModel.txt

Attachment Name: h2ologs.txt Attached By: Sunil Ajagekar File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5648/h2ologs.txt

Attachment Name: h2oRemoteCluster.txt Attached By: Sunil Ajagekar File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5648/h2oRemoteCluster.txt

Attachment Name: h2oRemoteCluster2.txt Attached By: Sunil Ajagekar File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5648/h2oRemoteCluster2.txt