nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

Problems with rhive.query("select * from abc limit 30000000"), rhive.size.table, and rhive.load.table2 #75

Closed suolemen closed 9 years ago

suolemen commented 9 years ago

First problem: the table has 30,000,000 records. When a table is this big, which function should I use to get the data set? Neither rhive.query nor rhive.big.query works.

Second problem: rhive.query("select * from kc_tel") returns:

      phoneno
    1 13531542675
    2 13531542297
    3 13531541982
    4 13531541667

But rhive.size.table("kc_tel") returns NULL. Why is the result NULL?

Third problem: rhive.load.table2("kc_tel") fails with this error:

    java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Can anyone help me? Thank you very much!

ssshow16 commented 9 years ago

Please let me know your environment values.

You can check them with the RHive function rhive.env().

Thanks

suolemen commented 9 years ago

rhive.env()

    hadoop home: /usr/lib/hadoop
    fs: hdfs://hdktmaster.infobird.com:8020
    hive home: /usr/lib/hive
    user name: root
    user home: /root
    temp dir: /tmp/root

ssshow16 commented 9 years ago

The bug is fixed, and a new version, "nexr-rhive-2.0.4", has been released.

Please try again!

suolemen commented 9 years ago

I installed RHive with install.packages("RHive"). How do I get "nexr-rhive-2.0.4"?

Do I need to reinstall the "RHive" package?

http://cran.r-project.org/ lists the package source as RHive_2.0-0.2.tar.gz.

ssshow16 commented 9 years ago

It takes a long time to register an R package on CRAN, and I haven't registered the new version yet. So you have to download the new version from GitHub (https://github.com/nexr/RHive).

After that, build and install RHive. There is an RHive install guide on the GitHub page.

suolemen commented 9 years ago

I have installed nexr-rhive-2.0.4, and rhive.load.table2("kc_tel") now works!

But rhive.size.table("kc_tel") still returns NULL.

    tableName <- "kc_tel"
    metaInfo <- .rhive.desc.table(tableName, detail=TRUE)
    location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2],",")[[1]][1]

location does not get the right value!

ssshow16 commented 9 years ago

Which Hive version do you use?

Please send me some debug information, like the following:

    debug(.rhive.size.table)
    rhive.size.table("kc_tel")
    debugging in: .rhive.size.table(tableName = tableName)
    debug: {
        if (missing(tableName)) {
            stop("missing tableName")
        }
        tableName <- tolower(tableName)
        metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
        location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
        dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
        return(dataInfo$length)
    }
    Browse[2]>
    debug: if (missing(tableName)) {
        stop("missing tableName")
    }
    Browse[2]>
    debug: tableName <- tolower(tableName)
    Browse[2]>
    debug: metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
    Browse[2]>
    debug: location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
    Browse[2]>
    debug: dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
    Browse[2]> location

Please check whether location is correct.

suolemen commented 9 years ago

    debug(.rhive.size.table)
    rhive.size.table("kc_tel")
    debugging in: .rhive.size.table(tableName = tableName)
    debug: {
        if (missing(tableName)) {
            stop("missing tableName")
        }
        tableName <- tolower(tableName)
        metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
        location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
        dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
        return(dataInfo$length)
    }
    Browse[2]>
    debug: if (missing(tableName)) {
        stop("missing tableName")
    }
    Browse[2]>
    debug: tableName <- tolower(tableName)
    Browse[2]>
    debug: metaInfo <- .rhive.desc.table(tableName, detail = TRUE)
    Browse[2]>
    debug: location <- strsplit(strsplit(as.character(metaInfo[[1]]), "location:")[[1]][2], ",")[[1]][1]
    Browse[2]>
    debug: dataInfo <- .rhive.hdfs.du(location, summary = TRUE)
    Browse[2]> location
    [1] NA
    Browse[2]>
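The NA above is what the nested strsplit expression gives whenever the description text contains no "location:" marker. Here is a minimal sketch in plain R that can be tried without a Hive connection. The helper name extract_location and the shortened description string are assumptions made up for illustration, not RHive code or real output.

```r
# Same nested strsplit expression as in .rhive.size.table, wrapped in a
# hypothetical helper so it can be run on its own.
extract_location <- function(desc) {
  strsplit(strsplit(as.character(desc), "location:")[[1]][2], ",")[[1]][1]
}

# Shortened, made-up description text (assumption, not real DESCRIBE output).
with_marker <- paste0(
  "Table(tableName:kc_tel, ",
  "location:hdfs://hdktmaster.infobird.com:8020/user/hive/warehouse/kc_tel, ",
  "compressed:false)"
)

# With the marker present: the text between "location:" and the next comma.
print(extract_location(with_marker))

# With an empty description there is nothing after the marker to pick up,
# so the second element of the split is missing and the result is NA.
print(extract_location(""))
```

So an NA location points at the description text itself being empty or incomplete, rather than at the HDFS call that follows it.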

suolemen commented 9 years ago

Hive version: hive-service-0.12.0-cdh5.0.0.jar (Release Notes - Hive - Version 0.12.0)

ssshow16 commented 9 years ago

What is the result of rhive.desc.table("kc_tel", detail=TRUE)?

suolemen commented 9 years ago

    rhive.desc.table("kc_tel",detail=TRUE)
      X..
    1

    rhive.desc.table("kc_tel",detail=FALSE)
      col_name data_type comment
    1  phoneno    string    None

suolemen commented 9 years ago

    Browse[2]> tableInfo <- .rhive.query(paste("DESCRIBE EXTENDED",tableName))
    Browse[2]> res <- tableInfo[[2]][length(rownames(tableInfo))]
    Browse[2]> res
    [1]
    3 Levels: ... Table(tableName:kc_tel, dbName:default, owner:hive, createTime:1409708239, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:phoneno, type:string, comment:null)], location:hdfs://hdktmaster.infobird.com:8020/user/hive/warehouse/kc_tel, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{colelction.delim=|, serialization.format=,, line.delim=

    Browse[2]> str(res)
     Factor w/ 3 levels "","string ",..: 1
    Browse[2]> res <- lapply(res, function(v) { gsub("(^ +)|( +$)", "", v) })
    Browse[2]> res
    [[1]]
    [1] ""

    Browse[2]> as.data.frame(res)
      X..
    1

ssshow16 commented 9 years ago

I guess that your table's line.delim is '\n'. RHive currently has a bug in this case. I will fix it as soon as possible.

Until then, create the table again without setting line.delim and try again.
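To illustrate the failure mode, here is a sketch in plain R. It assumes, based on the debug output in this thread, that the newline stored in line.delim breaks the detailed description row so that the last row of the result is empty. The tableInfo data frame is a made-up, abbreviated stand-in for the DESCRIBE EXTENDED result, and find_location is a hypothetical helper. Neither is RHive code, and this is not necessarily how the fix will be implemented.

```r
# Made-up, abbreviated stand-in for the DESCRIBE EXTENDED result
# (assumption): the description row is cut at "line.delim=" and is followed
# by an empty last row.
tableInfo <- data.frame(
  col_name = c("phoneno", "Detailed Table Information", ""),
  data_type = c(
    "string ",
    paste0(
      "Table(tableName:kc_tel, ",
      "location:hdfs://hdktmaster.infobird.com:8020/user/hive/warehouse/kc_tel, ",
      "inputFormat:org.apache.hadoop.mapred.TextInputFormat, ",
      "parameters:{line.delim="
    ),
    ""
  ),
  stringsAsFactors = FALSE
)

# Reading only the last row, as in the debug session, yields an empty string.
last_row <- tableInfo[[2]][nrow(tableInfo)]
print(last_row)

# Hypothetical alternative: search every cell for the "location:" marker
# instead of relying on the position of the row.
find_location <- function(info) {
  cells <- unlist(lapply(info, as.character), use.names = FALSE)
  hit <- grep("location:", cells, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) {
    return(NA_character_)
  }
  after_marker <- strsplit(hit[1], "location:", fixed = TRUE)[[1]][2]
  strsplit(after_marker, ",", fixed = TRUE)[[1]][1]
}

print(find_location(tableInfo))
```

Searching all cells means the lookup no longer depends on which row the description ends up in, but it still assumes the location itself contains no comma.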

suolemen commented 9 years ago

OK, thank you.

I changed

    create table kc_tel (phoneno string) row format delimited fields terminated by ',' collection items terminated by '|' lines terminated by '\n' stored as textfile;

to

    create table kc_tel (phoneno string)

and now the result of rhive.size.table("kc_tel") is right! Thank you very much!