nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.

Problems with rhive.query("select * from abc limit 30000000"), rhive.size.table, and rhive.load.table2 #75

Closed suolemen closed 9 years ago

suolemen commented 9 years ago

First problem: the table has 30,000,000 rows. When a table is this large, which function should I use to fetch the data set? Neither rhive.query nor rhive.big.query works.

Second problem: rhive.query("select * from kc_tel") returns:

  phoneno
1 13531542675
2 13531542297
3 13531541982
4 13531541667

But rhive.size.table("kc_tel") returns NULL. Why is the result NULL?

Third problem: rhive.load.table2("kc_tel") fails with this error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Can anyone help me? Thank you very much!

ssshow16 commented 9 years ago

Please let me know your environment values. You can check them with the RHive function rhive.env().



suolemen commented 9 years ago

rhive.env()
hadoop home: /usr/lib/hadoop
fs: hdfs://
hive home: /usr/lib/hive
user name: root
user home: /root
temp dir: /tmp/root

ssshow16 commented 9 years ago

The bug is fixed, and a new version, "nexr-rhive-2.0.4", has been released.

Please try again!

suolemen commented 9 years ago

I installed RHive with install.packages("RHive"). How do I upgrade to "nexr-rhive-2.0.4"?

Do I need to reinstall the RHive package? My package source is RHive_2.0-0.2.tar.gz.

ssshow16 commented 9 years ago

It takes a long time to register an R package on CRAN, and I haven't registered the new version yet. So you have to download the new version from GitHub(

After that, build and install RHive. There is an RHive install guide on the GitHub page.


suolemen commented 9 years ago

I have installed nexr-rhive-2.0.4, and rhive.load.table2("kc_tel") now works.

But rhive.size.table("kc_tel") still returns NULL.

tableName <- "kc_tel"
metaInfo <- .rhive.desc.table(tableName, detail=TRUE)
location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2],",")[[1]][1]

location does not get the right value!

ssshow16 commented 9 years ago

Which Hive version are you using?

Please let me know some information from debug like the following:

debug(.rhive.size.table)
rhive.size.table("kc_tel")
debugging in: .rhive.size.table(tableName = tableName)
debug: {
    if (missing(tableName)) {
        stop("missing tableName")
    }
    tableName <- tolower(tableName)
    metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
    location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
    dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
    return(dataInfo$length)
}
Browse[2]>
debug: if (missing(tableName)) {
    stop("missing tableName")
}
Browse[2]>
debug: tableName <- tolower(tableName)
Browse[2]>
debug: metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
Browse[2]>
debug: location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
Browse[2]>
debug: dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
Browse[2]> location

Please check whether location is correct.

suolemen commented 9 years ago

debug(.rhive.size.table)
rhive.size.table("kc_tel")
debugging in: .rhive.size.table(tableName = tableName)
debug: {
    if (missing(tableName)) {
        stop("missing tableName")
    }
    tableName <- tolower(tableName)
    metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
    location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
    dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
    return(dataInfo$length)
}
Browse[2]>
debug: if (missing(tableName)) {
    stop("missing tableName")
}
Browse[2]>
debug: tableName <- tolower(tableName)
Browse[2]>
debug: metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
Browse[2]>
debug: location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
Browse[2]>
debug: dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
Browse[2]> location
[1] NA
Browse[2]>
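
The NA above can be reproduced without a Hive connection. The following is a minimal sketch, assuming the detailed metadata arrives as a single string. extract_location is a hypothetical helper that wraps the same strsplit expression used in .rhive.size.table, and the hdfs://example/... path is a made-up placeholder, not the poster's actual location.

```r
# Hypothetical helper: the same parsing expression as in .rhive.size.table,
# wrapped in a function so it can be tried on plain strings.
extract_location <- function(meta) {
  strsplit(strsplit(as.character(meta), "location:")[[1]][2], ",")[[1]][1]
}

# Placeholder metadata string (illustrative only, not real Hive output).
detail <- paste0(
  "Table(tableName:kc_tel, dbName:default, ",
  "location:hdfs://example/warehouse/kc_tel, ",
  "inputFormat:org.apache.hadoop.mapred.TextInputFormat)"
)

# With a "location:" field present, the text up to the next comma is returned.
good <- extract_location(detail)

# With an empty string, the first split has no second element, so the
# expression falls through to NA.
bad <- extract_location("")

print(good)
print(bad)
```

If the metadata string handed to the expression is empty, as the debug session suggests, the location is NA before .rhive.hdfs.du is ever called.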

suolemen commented 9 years ago

Hive version: hive-service-0.12.0-cdh5.0.0.jar (Release Notes - Hive - Version 0.12.0)

ssshow16 commented 9 years ago

What is the result for "rhive.desc.table("kc_tel",detail=TRUE)"?

suolemen commented 9 years ago

rhive.desc.table("kc_tel",detail=TRUE)
  X..
1

rhive.desc.table("kc_tel",detail=FALSE)
  col_name data_type comment
1  phoneno    string    None

suolemen commented 9 years ago

Browse[2]> tableInfo <- .rhive.query(paste("DESCRIBE EXTENDED",tableName))
Browse[2]> res <- tableInfo[[2]][length(rownames(tableInfo))]
Browse[2]> res
[1]
3 Levels: ... Table(tableName:kc_tel, dbName:default, owner:hive, createTime:1409708239, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:phoneno, type:string, comment:null)], location:hdfs://, inputFormat:org.apache.hadoop.mapred.TextInputFormat,, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{colelction.delim=|, serialization.format=,, line.delim=

Browse[2]> str(res)
 Factor w/ 3 levels "","string ",..: 1
Browse[2]> res <- lapply(res, function(v) { gsub("(^ +)|( +$)", "", v) })
Browse[2]> res
[[1]]
[1] ""

Browse[2]>
  X..
1

ssshow16 commented 9 years ago

I guess that your table's line.delim is '\n'. RHive currently has a bug that affects this case. I will fix it as soon as possible.

Until then, create the table again without setting line.delim and try again.
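
One plausible reading of the debug output earlier in the thread is that the newline in line.delim splits the detailed table information across rows, so the last row no longer holds the Table(...) text. A minimal sketch of that failure mode and one possible workaround follows. table_info is simulated data, the location is a placeholder, and this is not necessarily how RHive fixed the bug.

```r
# Simulated DESCRIBE EXTENDED result (hypothetical; not fetched from Hive).
# Assumption: the embedded newline pushes the tail of the Table(...) text
# into an extra row whose second column is empty.
table_info <- data.frame(
  col_name = c("phoneno", "", "Detailed Table Information", "})"),
  data_type = c(
    "string",
    "",
    "Table(tableName:kc_tel, location:hdfs://example/warehouse/kc_tel, parameters:{line.delim=",
    ""
  ),
  stringsAsFactors = FALSE
)

# Taking the last row, as in the debug session, yields an empty string.
last_row <- table_info[[2]][nrow(table_info)]

# Possible workaround: look for the row that starts with "Table(" instead
# of assuming it is the last one.
detail_idx <- grep("^Table\\(", table_info[[2]])
detail <- table_info[[2]][detail_idx[1]]

# Capture the text between "location:" and the next comma.
location <- sub(".*location:([^,]*),.*", "\\1", detail)

print(last_row)
print(location)
```

Searching by content rather than by position keeps the lookup working whether or not the output has been split into extra rows.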

suolemen commented 9 years ago

OK, thank you.

I changed

create table kc_tel (phoneno string) row format delimited fields terminated by ',' collection items terminated by '|' lines terminated by '\n' stored as textfile;

to

create table kc_tel (phoneno string)

Now rhive.size.table("kc_tel") returns the right result. Thank you very much!