h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

s3/s3n won't work with secret key containing slash? #10307

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

My accessKey is "ABCDE" (for example) and my secretKey is "abc/def/ghj" (again, for example, but note the two forward slashes).

From Flow:

{noformat}
exportFrame "ENB2012_data.hex_sid_a78f_1", "s3n://ABCDE:abc/def/ghj@mybucket/tmptest1.csv", overwrite: false
{noformat}

I get "ERROR MESSAGE: Invalid hostname in URI ...". I get exactly the same error if I change "s3n://" to "s3://".
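The "Invalid hostname" message is consistent with how generic URI parsers treat an unescaped slash: the authority part ends at the first "/" after "//". The sketch below uses Python's standard-library parser purely as an illustration. It is not the Java code path that H2O or Hadoop runs, so treat it as an assumption about the likely cause rather than a diagnosis.

```python
from urllib.parse import urlsplit

# Example URI from the report; the secret key contains raw slashes.
raw_uri = "s3n://ABCDE:abc/def/ghj@mybucket/tmptest1.csv"

parts = urlsplit(raw_uri)

# The authority stops at the first "/" after "//", so the secret key is cut
# in two and the bucket name ends up inside the path.
print(parts.netloc)  # ABCDE:abc
print(parts.path)    # /def/ghj@mybucket/tmptest1.csv
```

With the key split this way, the parser never sees "mybucket" as the host at all.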

This Stack Overflow answer (http://stackoverflow.com/a/14701375/841830) says recreating the key is the only fix, but some of the other answers give a workaround using quotes. I wonder if H2O can also find some workaround.

I gave it a go, in Flow:

{noformat}
exportFrame "ENB2012_data.hex_sid_a78f_1", "s3n://\"ABCDE\":\"abc/def/ghj\"@mnist-generated/tmptest1.csv", overwrite: false
{noformat}

But that gives exactly the same error. I also tried "\" before the "/", and two backslashes; same error, both times.

I also tried the hex code for forward slash:

{noformat}
exportFrame "ENB2012_data.hex_sid_a78f_1", "s3n://ABCDE:abc%2Fdef%2Fghj@mybucket/tmptest1.csv", overwrite: false
{noformat}
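For anyone reproducing this, the percent-encoded form can be generated rather than typed by hand. This is a minimal Python sketch using only the standard library; the key and bucket are the example values from this report. It shows that the encoded URI parses cleanly, but it says nothing about whether the code receiving the URI decodes "%2F" back to "/" before signing the request.

```python
from urllib.parse import quote, unquote, urlsplit

secret_key = "abc/def/ghj"  # example value from the report

# safe="" is required: by default quote() leaves "/" unescaped.
encoded = quote(secret_key, safe="")
uri = f"s3n://ABCDE:{encoded}@mybucket/tmptest1.csv"

parts = urlsplit(uri)
print(encoded)                  # abc%2Fdef%2Fghj
print(parts.hostname)           # mybucket
print(unquote(parts.password))  # abc/def/ghj
```

Note that `parts.password` is still percent-encoded after parsing; the caller has to decode it explicitly to recover the original secret.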

This gives a different error:

{noformat}
ERROR MESSAGE: HDFS IO Failure:
accessed URI : s3n://ABCDE:abc%2Fdef%2Fghj@mybucket/tmptest1.csv
configuration: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/tmptest1.csv' - ResponseCode=403, ResponseMessage=Forbidden
{noformat}

And if I try with s3, instead of s3n, a different error again:

{noformat}
ERROR MESSAGE: HDFS IO Failure:
accessed URI s3://ABCDE:abc%2Fdef%2Fghj@mybucket/tmptest1.csv
configuration: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/%2Ftmptest1.csv' XML Error Message:
<?xml version="1.0" encoding="UTF-8"?>SignatureDoesNotMatchThe request signature we calculated does not match the signature you provided. Check your key and signing method.ABCDEGETTue, 13 Sep 2016 11:53:23 GMT/mybucket/%2Ftmptest1.csv............
{noformat}

I've done the same tests from R, and get exactly the same errors.

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: You can pass the secret key using one of these methods: http://docs.aws.amazon.com/java-sdk/latest/developer-guide/credentials.html

In addition to that you can also specify a configuration option -aws_credentials and provide the location of a properties file with the credentials (set properties "accessKey" and "secretKey"). [~accountid:557058:2ceb7f2b-e7ca-465c-8e82-c046991100be] can you please confirm this actually works? Value ARGS.aws_credentials is never read anywhere (not even in H2OArgCredentialsProvider where I would naturally expect it).
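As a rough illustration of the properties-file idea, the sketch below renders a file with the two property names mentioned above ("accessKey" and "secretKey"). The helper name is hypothetical, and whether H2O actually reads such a file via -aws_credentials is exactly the open question in this comment. The relevant point is only that "/" is not a special character in the Java .properties format, so the secret can be written unescaped.

```python
def render_credentials_properties(access_key: str, secret_key: str) -> str:
    """Return the text of a Java-style .properties file holding the two keys.

    Hypothetical helper for illustration; the property names come from the
    comment above. "/" needs no escaping in .properties files (a backslash
    would, but it is not handled here).
    """
    return f"accessKey={access_key}\nsecretKey={secret_key}\n"


# Example values from the original report.
text = render_credentials_properties("ABCDE", "abc/def/ghj")
print(text, end="")
```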

exalate-issue-sync[bot] commented 1 year ago

Darren Cook commented: @Michal Kurka Isn't it -hdfs_config rather than -aws_credentials? See http://docs.h2o.ai/h2o/latest-stable/h2o-docs/aws.html#standalone-instance

I have a question on that, perhaps also for @Tomas Nykodym: if I make an empty core-site.xml, and start my cluster with:

java -jar h2o.jar -hdfs_config core-site.xml

And then add/edit the credentials in core-site.xml, does it work?

More generally, is core-site.xml read from disk each time an S3 connection is made, or is it read only once when H2O starts up and then cached in memory?
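For reference, the kind of core-site.xml in question could be generated as in the sketch below. It assumes the standard Hadoop property names for the s3n connector (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey); the helper name is hypothetical. Because the secret goes into an XML element instead of a URI, the slashes need no encoding. The sketch does not answer the question of when H2O rereads the file.

```python
import xml.etree.ElementTree as ET


def build_core_site(access_key: str, secret_key: str, scheme: str = "s3n") -> str:
    """Build a minimal core-site.xml body holding S3 credentials.

    Hypothetical helper; the property names assume the Hadoop s3n/s3
    connector naming, fs.<scheme>.awsAccessKeyId and
    fs.<scheme>.awsSecretAccessKey.
    """
    configuration = ET.Element("configuration")
    for name, value in (
        (f"fs.{scheme}.awsAccessKeyId", access_key),
        (f"fs.{scheme}.awsSecretAccessKey", secret_key),
    ):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


# Example values from the original report; "/" is stored verbatim.
xml_text = build_core_site("ABCDE", "abc/def/ghj")
print(xml_text)
```

The resulting text would be saved as core-site.xml and passed with -hdfs_config as in the command above.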

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-3397
Assignee: Prithvi Prabhu
Reporter: Darren Cook
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A