Lauren DiPerna commented: The issue is posted on Stack Overflow [here|https://stackoverflow.com/questions/55182284/save-a-h2o-ai-model-to-s3-bucket-in-python]
Pavel Pscheidl commented: Currently, this is not supported.
PersistS3 Class, line 263.
{code:java}
// Store Value v to disk.
@Override public void store(Value v) {
  if( !v._key.home() ) return;
  throw H2O.unimpl(); // VA only
}
{code}
Pavel Pscheidl commented: S3A supports it.
PersistHDFS class:
{code:java}
@Override public void store(Value v) {
  // Should be used only if ice goes to HDFS
  assert this == H2O.getPM().getIce();
  assert !v.isPersisted();
  byte[] m = v.memOrLoad();
  assert (m == null || m.length == v._max); // Assert not saving partial files
  store(new Path(_iceRoot, getIceName(v)), m);
}
{code}
Michal Kurka commented: reclassified to an improvement, minor priority - preferred way is to use S3A/S3N (on EMR)
Prabhu Subramanian commented: Hi All,
Is this also applicable for the below export?
{code:python}h2o.export_file(data_frame ,path='s3a://…..'){code}
Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] yes, the same applies to all export functions - you need to use “s3a” for your exports
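To illustrate the advice above, here is a minimal sketch that switches an {{s3://}} or {{s3n://}} path to {{s3a://}} before it is handed to an export call. The helper name {{to_s3a_uri}} is hypothetical and not part of H2O; it only rewrites the URI scheme and assumes the cluster already has the S3A connector and working credentials.

```python
# Hypothetical helper, not part of the h2o package.
S3_SCHEMES = {"s3", "s3n", "s3a"}


def to_s3a_uri(path):
    """Return `path` with its scheme switched to s3a; raise if it is not an S3 URI."""
    scheme, sep, rest = path.partition("://")
    if not sep or scheme.lower() not in S3_SCHEMES:
        raise ValueError("not an S3 URI: %r" % (path,))
    return "s3a://" + rest


# Usage against a running cluster (not executed here):
#   h2o.export_file(data_frame, path=to_s3a_uri("s3://bucket_name/path/dataset.csv"))
```

Only the scheme changes; the bucket and key are passed through untouched.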
Prabhu Subramanian commented: Hi Michal,
This may be only loosely related to this ticket, but I would really appreciate your help understanding the error I get from the export call below.
{code:python}h2o.export_file(data_frame ,path='s3a://bucket_name/path/dataset.csv'){code}
Error below:
{code:python}H2OServerError: HTTP 500 Server Error:
Server error water.api.HDFSIOException:
Error: HDFS IO Failure:
accessed URI : s3://com.squarkai.seer.develop.project-8/test/Churn_Train.csv
configuration: Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, /Users/prabhusubramanian/Desktop/F Folder/RA Squark/h2o-3.32.0.2/core-site.xml
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?>InvalidAccessKeyId
{code}
Michal Kurka commented: this looks like you provided invalid AWS access key id, can you make sure it is correct?
Prabhu Subramanian commented: Hi Michal,
The credentials provided through the XML file do work for {{h2o.import_file('s3://…')}}, but not for the export statements, even with {{s3a}} or {{s3n}}. I tried every combination without success. I am sure the credentials are right, because the import statements work with them while the export statements do not.
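One possible explanation, which nothing in this thread confirms, is that the XML file sets credentials for one Hadoop S3 connector but not for the others, since each scheme reads its own property names. The sketch below parses the text of a {{core-site.xml}} and reports which schemes have both credential properties set. The function name {{configured_s3_schemes}} is hypothetical; the property names are the standard Hadoop ones.

```python
import xml.etree.ElementTree as ET

# Standard Hadoop property names for each S3 connector's static credentials.
CREDENTIAL_KEYS = {
    "s3": ("fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey"),
    "s3n": ("fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey"),
    "s3a": ("fs.s3a.access.key", "fs.s3a.secret.key"),
}


def configured_s3_schemes(core_site_xml):
    """Map each S3 scheme to True if both of its credential properties are non-empty."""
    root = ET.fromstring(core_site_xml)
    # Collect every <property><name>..</name><value>..</value></property> pair.
    props = {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in root.iter("property")
    }
    return {
        scheme: all(props.get(key) for key in keys)
        for scheme, keys in CREDENTIAL_KEYS.items()
    }
```

If only the {{s3}} entry comes back as true, an {{s3a://}} export would have no keys to use, which could produce an access error even though imports succeed.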
Kunal Mishra commented: I’ll throw a +1 in for implementing saving to S3 natively! As it is, I’ll probably save locally and use the R package {{aws.s3}} to work around the limitation, for anyone else looking for alternative solutions.
Michal Kurka commented: [~accountid:5cc0b0886fbf5a10040d2945] thanks for the input, I think it would be a great change to add
Kunal Mishra commented: Yup. Leaving an implementation here for anybody who comes through looking for the same thing!
{code:r}save_h2o_model_to_s3 <- function(h2o_model, s3_path, save_type = 'model', local_save_dir = tempdir(), keep_local = FALSE, show_progress = TRUE, force = TRUE) {
  #' @param h2o_model: a reference to the H2O model that needs to be saved
  #' @param s3_path: a string containing the name the object should have in S3 (i.e., its "object key" or its intended S3 URI), as supplied to aws.s3::put_object()
  #' @param save_type: a string, indicating which h2o.save function to use, between 'model', 'mojo', and 'model_details'
  #' @param local_save_dir: An absolute path to the directory in which h2o_model will be saved
  #' @param keep_local: Whether or not the local version of the saved h2o_model should be kept after being pushed to S3
  #' @param show_progress: A logical indicating whether to show a progress bar for uploads; passed through to aws.s3::put_object().
  #' @param force: A logical, indicating whether to overwrite files that already exist.
  #' @return The h2o_model, invisibly

  # Save the model locally with the h2o function matching save_type
  if (save_type == 'model') {
    local_save_path <- h2o::h2o.saveModel(object = h2o_model, path = local_save_dir, force = force)
  } else if (save_type == 'mojo') {
    local_save_path <- h2o::h2o.save_mojo(object = h2o_model, path = local_save_dir, force = force)
  } else if (save_type == 'model_details') {
    local_save_path <- h2o::h2o.saveModelDetails(object = h2o_model, path = local_save_dir, force = force)
  } else {
    assertthat::assert_that(FALSE, msg = 'Unsupported save_type passed to save_h2o_model_to_s3(). Supported types are limited to "model", "model_details", and "mojo"')
  }

  # Push the local file to S3
  aws.s3::put_object(
    file = local_save_path,
    object = s3_path,
    multipart = TRUE,
    show_progress = show_progress
  )

  # Clean up the local copy unless the caller asked to keep it
  if (!keep_local) {
    suppressWarnings(file.remove(local_save_path))
  }

  return(invisible(h2o_model))
}{code}
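Since the original question was asked in Python, the same save-locally-then-upload workaround can be sketched there as well. This is a minimal sketch, not an H2O feature: the function name {{save_model_to_s3}} and its {{save_fn}} / {{upload_fn}} parameters are hypothetical, it assumes the H2O node and the Python client share a filesystem (for example a local cluster), and it assumes boto3 can find AWS credentials on its own.

```python
import os
import tempfile


def save_model_to_s3(model, bucket, key, save_fn=None, upload_fn=None,
                     local_dir=None, keep_local=False):
    """Save `model` to a local file, upload it to s3://bucket/key, return the local path."""
    if save_fn is None:
        import h2o  # imported lazily so the helper can be exercised without a cluster
        save_fn = lambda m, d: h2o.save_model(model=m, path=d, force=True)
    if upload_fn is None:
        import boto3  # assumption: credentials are resolvable by boto3
        upload_fn = boto3.client("s3").upload_file

    local_dir = local_dir or tempfile.mkdtemp()
    local_path = save_fn(model, local_dir)  # h2o.save_model returns the saved file's path
    upload_fn(local_path, bucket, key)      # (Filename, Bucket, Key)

    # Remove the local copy unless the caller wants to keep it.
    if not keep_local and os.path.exists(local_path):
        os.remove(local_path)
    return local_path
```

With the defaults, a call such as {{save_model_to_s3(best_gbm1, "bucketname", "folder1/folder2/model")}} would write the model to a temporary directory and then upload it; the bucket and key here are placeholders.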
Prabhu Subramanian commented: Should we expect this fix in the upcoming version? Has it been fixed, or was it ignored?
Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] resolved as “fixed”, meaning the code change was implemented and the target release will have this feature working
The fix version was set to 3.34.0.1, which is H2O’s next major release; you can expect it in 1-2 months.
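For anyone who wants to guard their code on this, a small check against the fix version recorded on the ticket could look like the sketch below. The function names are hypothetical, and the parser assumes a purely numeric dotted version string such as the ones quoted in this thread.

```python
FIX_VERSION = (3, 34, 0, 1)  # fix version recorded on this ticket


def parse_version(version):
    """Turn a dotted version such as '3.32.0.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_s3_save_fix(version):
    """True if `version` is at or after the ticket's fix version."""
    return parse_version(version) >= FIX_VERSION


# Usage with an installed client (not executed here):
#   import h2o
#   has_s3_save_fix(h2o.__version__)
```

On older versions, a caller could fall back to saving locally and uploading the file separately.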
Michal Kurka commented: [~accountid:5b9be0a796cb052b5f65d3a5] you are welcome to try this feature in our nightly builds
Please keep in mind that I resolved the ticket only today, so the current nightly will not have it yet. It should appear there after a day or two.
Prabhu Subramanian commented: Thank you very much, Michal! Looking forward to it. Appreciate your updates.
JIRA Issue Migration Info
Jira Issue: PUBDEV-6364
Assignee: Michal Kurka
Reporter: Reyhaneh Esmaielbeiki
State: Resolved
Fix Version: 3.34.0.1
Attachments: N/A
Development PRs: Available
Linked PRs from JIRA
I have been using the command below to save my H2O model to an S3 bucket in Python 3 (I am using Amazon EMR):
h2o.save_model(model=best_gbm1, path='s3://bucketname/folder1/folder2', force=False)
but I get the following error:
H2OServerError: HTTP 500 Server Error: Server error java.lang.RuntimeException: Error: Not implemented Request: None
Is it possible to save an H2O model directly to an S3 bucket?