nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

Error: java.io.IOException: Mkdirs failed to create file:/rhive/lib/2.0-0.0 #62

Closed: Ironholds closed this issue 9 years ago

Ironholds commented 10 years ago

What it says on the tin.

rhive.init(hiveHome = "usr/lib/hive/", hadoopHome = "/usr/lib/hadoop/")
rhive.connect(host="analytics1027.eqiad.wmnet", port=10000, hiveServer2=TRUE, defaultFS=NULL, updateJar=FALSE, user=NULL, password=NULL)

14/07/23 17:17:46 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
14/07/23 17:17:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error: java.io.IOException: Mkdirs failed to create file:/rhive/lib/2.0-0.0

ssshow16 commented 10 years ago

Hi Oliver.

Please check the locations of your Hadoop installation and Hadoop configuration.

RHive failed to load the Hadoop configuration. Because of that, it fell back to the local file system and tried to create the rhive directory there, as the path shows: file:/rhive/lib/2.0-0.0

RHive has to create its directory on HDFS.

After checking, please try again.

Thanks. If there is any problem again, feel free to contact us.

Ironholds commented 9 years ago

By that do you mean the hadoopConf parameter?

ssshow16 commented 9 years ago

Yes!

You need to pass the hadoopConf parameter unless you have set the HADOOP_CONF_DIR environment variable.
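
For example, either of the following should work (the paths here are illustrative; use the ones for your installation):

# option 1: pass the configuration directory explicitly
library(RHive)
rhive.init(hiveHome = "/usr/lib/hive/", hadoopHome = "/usr/lib/hadoop/",
           hadoopConf = "/usr/lib/hadoop/etc/hadoop")

# option 2: set HADOOP_CONF_DIR before initializing
Sys.setenv(HADOOP_CONF_DIR = "/usr/lib/hadoop/etc/hadoop")
library(RHive)
rhive.init(hiveHome = "/usr/lib/hive/", hadoopHome = "/usr/lib/hadoop/")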

Ironholds commented 9 years ago

Yep; that error has now gone away. Yay! Now stuck on:

?rhive.init
rhive.init(hiveHome = "usr/lib/hive/", hadoopHome = "/usr/lib/hadoop/", hadoopLib = "/usr/lib/hadoop/lib/", hadoopConf = "/usr/lib/hadoop/etc/hadoop")
rhive.connect(host="analytics1027.eqiad.wmnet", port=10000, hiveServer2=TRUE, defaultFS=NULL, updateJar=FALSE, user=NULL, password=NULL)

Error: org.apache.hadoop.security.AccessControlException: Permission denied: user=ironholds, access=WRITE, inode="/":hdfs:hadoop:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5472)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5446)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2274)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesy

ssshow16 commented 9 years ago

The default HDFS directory for RHive is "/rhive". If it does not exist, RHive will create it and continue to use it. However, if the rhive directory does not have 755 permissions, an AccessControlException can occur, as in your case. If you don't want to use the default directory, you can change it by setting an R environment variable, like the following. You must change it before loading the RHive library or executing rhive.init():

Sys.setenv(RHIVE_FS_HOME="/rhive")
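
For example, to use a per-user directory instead (the username and paths here are illustrative), set the variable in a fresh R session before anything else:

# must run before library(RHive) / rhive.init()
Sys.setenv(RHIVE_FS_HOME = "/user/ironholds/rhive")
library(RHive)
rhive.init(hiveHome = "/usr/lib/hive/", hadoopHome = "/usr/lib/hadoop/",
           hadoopConf = "/usr/lib/hadoop/etc/hadoop")
rhive.connect(host = "analytics1027.eqiad.wmnet", port = 10000, hiveServer2 = TRUE)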

Ironholds commented 9 years ago

Gotcha. Would it not make sense to let a user explicitly specify the directory to treat as a home directory, to allow for per-user configurations?

Ironholds commented 9 years ago

Having tried this I'm getting the same error, with /home/[username]/rhive; is it creating these on the individual worker nodes or simply on the client machine?

ssshow16 commented 9 years ago

You need to create the directory for RHive on HDFS.
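
For example (path illustrative), from R or with the equivalent hadoop fs commands directly in a shell:

# create the directory on HDFS itself, not on the local file system,
# and give it the 755 permissions mentioned earlier
system("hadoop fs -mkdir -p /user/ironholds/rhive")
system("hadoop fs -chmod 755 /user/ironholds/rhive")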

Ironholds commented 9 years ago

Well, that explains the problem, then :/. Hmn. I'll look at it (although I can't actually find RHIVE_FS_HOME being used anywhere in this package)

ssshow16 commented 9 years ago

RHive uses RHIVE_FS_HOME for several purposes. For example, when rhive.connect() is called, RHive uploads the jar files for R UDFs/UDAFs to RHIVE_FS_HOME/lib. Later, when an R UDF is called, each worker node downloads these jars from HDFS and uses them.

The value of RHIVE_FS_HOME is stored under a different name, FS_HOME, in the RHive package namespace.
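
If you want to verify what RHive resolved at runtime, a quick sketch (the internal name is taken from the sentence above):

# list internal objects in the RHive namespace whose names mention FS_HOME,
# then read the value if the binding exists
ls(envir = asNamespace("RHive"), pattern = "FS_HOME")
if (exists("FS_HOME", envir = asNamespace("RHive"))) {
  get("FS_HOME", envir = asNamespace("RHive"))
}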

Ironholds commented 9 years ago

where? https://github.com/nexr/RHive/search?q=FS_HOME&ref=cmdform

ssshow16 commented 9 years ago

Line 65 of rhive.R.

Ironholds commented 9 years ago

Huh; weird. GitHub's search fails!

Anyway, the initial issue is resolved, so I'll close this down. I've uploaded a related patch to try to add some clarity to the man pages surrounding rhive.init() and rhive.connect().

ouzor commented 9 years ago

Hi, I still have the original problem that Ironholds had; setting hadoopConf did not help:

library("RHive")
rhive.init(hiveHome = "usr/local/hive/", hadoopHome = "/usr/local/hadoop/", hadoopLib = "/usr/local/hadoop/lib/", hadoopConf = "/usr/local/hadoop/etc/hadoop/")
rhive.connect()

2014-08-12 05:51:03,682 INFO Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - fs.default.name is deprecated. Instead, use fs.defaultFS
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-08-12 05:51:05,249 WARN util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error: java.io.IOException: Mkdirs failed to create file:/rhive/lib/2.0-0.0

prabhunkl commented 9 years ago

This sounds like the same problem to me.

You need to set the RHIVE_FS_HOME variable as below.

Sys.setenv(RHIVE_FS_HOME = "/user/home/rhive")

Make sure you have write privileges on that HDFS folder.
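
For example, a quick probe (paths from above; the probe file name is made up):

# list the directory to inspect its permissions, then try creating and
# removing an empty file to confirm write access
system("hadoop fs -ls /user/home")
system("hadoop fs -touchz /user/home/rhive/.rhive_write_test")
system("hadoop fs -rm /user/home/rhive/.rhive_write_test")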

ouzor commented 9 years ago

I tried

Sys.setenv("RHIVE_FS_HOME"="/user/ubuntu/rhive/")

But still get

rhive.connect()
Error: java.io.IOException: Mkdirs failed to create file:/rhive/lib/2.0-0.0

Whatever I set RHIVE_FS_HOME to, I get the same error. I also set the permissions on that folder to 777.

Ironholds commented 9 years ago

Are you setting it before, rather than after, you load RHive and instantiate the connection?

And iirc the directory is created on the nodes, not just the client, so simply chmodding it locally may not do anything (emphasis on the iirc, though).

ouzor commented 9 years ago

Before, of course :) So, once again, first on command line:

hadoop fs -ls
drwxrwxrwx   - ubuntu hadoop          0 2014-08-13 12:10 rhive

So there's a folder in HDFS at /user/ubuntu/rhive with full permissions. Then in R:

 library("RHive")
 rhive.init(hiveHome = "usr/local/hive/", hadoopHome = "/usr/local/hadoop/", hadoopLib = "/usr/local/hadoop/lib/", hadoopConf = "/usr/local/hadoop/etc/hadoop/")
 Sys.setenv("HADOOP_CONF_DIR"="/usr/local/hadoop/etc/hadoop")
 Sys.setenv("RHIVE_FS_HOME"="/user/ubuntu/rhive")
 rhive.connect()

That gives a bunch of messages (already listed in my first message), but the last line is

Error: java.io.IOException: Mkdirs failed to create file:/rhive/lib/2.0-0.0

Ironholds: Can you clarify your last sentence? Is setting the permissions on HDFS not enough?

Ironholds commented 9 years ago

Ah, I see the confusion; when I said you need to do it before you load RHive, I meant set the environment variables. My bad; I should've been clearer!

ouzor commented 9 years ago

Ah, thanks! Now I made progress, as the error message now points to the RHIVE_FS_HOME directory:

Error: java.io.IOException: Mkdirs failed to create file:/user/ubuntu/rhive/lib/2.0-0.0

I don't get this, as folders ubuntu, rhive, and lib have 777 permissions.

ouzor commented 9 years ago

Umm, it seems RHIVE_FS_HOME does NOT point to HDFS after all, but to the local machine I am running R on. Setting it to /user/ubuntu/ got me forward. Based on the discussion above I had assumed we were talking about HDFS... But anyway, thanks for helping!
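
For reference, roughly the sequence that got me past the Mkdirs error (paths as in my earlier messages; treat them as examples):

# set before loading RHive; this resolved to a local path in my case
Sys.setenv("RHIVE_FS_HOME" = "/user/ubuntu")
library("RHive")
rhive.init(hiveHome = "usr/local/hive/", hadoopHome = "/usr/local/hadoop/",
           hadoopLib = "/usr/local/hadoop/lib/", hadoopConf = "/usr/local/hadoop/etc/hadoop/")
rhive.connect()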