nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

RHive cannot display timestamps and Chinese characters correctly #84

Closed winghc closed 9 years ago

winghc commented 9 years ago

Build environment: Hive 0.13.1, Hadoop 2.5.2

HIVE RESULT

hive>select id,tstype,req_replaytime,tscontent_ from dw_ods.sample_cc_ts where partitionstart='2015-01-31' limit 3

0 1070653 04 2015-02-02 23:36:00.0 表示其12份有参加我的活动,
1 1070637 09 NULL 针对我在线客服服务问题
2 1050530 01 2015-01-13 19:44:00.0 客原单位名称

RHIVE RESULT:

rhive.query("select id,tstype,req_replaytime,tscontent_ from dw_ods.sample_cc_ts where partitionstart='2015-01-31' limit 3");

       id tstype req_replaytime tscontent_
1 1070653      4             NA         NA
2 1070637      9             NA         NA
3 1050530      1             NA         NA
Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
3: NAs introduced by coercion
4: NAs introduced by coercion
5: NAs introduced by coercion

Where is the problem, or can you guide me on how to fix it? Thanks.

winghc commented 9 years ago

The problem is in the rhive.query function in rhive.R, which only supports the Hive data type "string". Since Hive 0.12 there are other types, such as varchar and so on. I will fix it and commit later.
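The NA values and the "NAs introduced by coercion" warnings can be reproduced in plain R. This is only a minimal sketch, assuming that a column whose type is not recognized as text ends up being converted to numeric; it does not call RHive, and the sample values are copied from the Hive output above.

```r
# Sample values copied from the req_replaytime column in the Hive output above.
raw <- c("2015-02-02 23:36:00.0", NA, "2015-01-13 19:44:00.0")

# Assumption: an unrecognized column type is treated as numeric.
# Non-numeric strings then become NA, and R would normally emit the
# "NAs introduced by coercion" warning (suppressed here to keep output clean).
coerced <- suppressWarnings(as.numeric(raw))

# Keeping the values as character preserves them unchanged.
kept <- as.character(raw)

print(coerced)
print(kept)
```

The same reasoning would apply to the tscontent_ column: Chinese text is not numeric, so every value is lost if the column is read as a number.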

winghc commented 9 years ago

Below is the patch that fixes it.

git diff HEAD^ RHive/R/rhive.R

diff --git a/RHive/R/rhive.R b/RHive/R/rhive.R
index e54f7bf..15e5fe1 100644
--- a/RHive/R/rhive.R
+++ b/RHive/R/rhive.R
@@ -352,7 +352,7 @@
     for (i in seq.int(length(colTypes))) {
       colType <- colTypes[i]
       colName <- colNames[i]
-      if (colType == "string") {
+      if (colType %in% c("string", "varchar", "timestamp", "date", "char")) {
         lst[[i]] <- character()
       } else if (length(grep("^array", colType)) > 0) {
         lst[[i]] <- character()
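
The type check in the patch can also be sketched as a small standalone function. This is an illustration, not RHive code: `is_text_type` is a hypothetical helper, and the column types in the usage lines are made up for the example. It additionally assumes that a type might arrive with parameters, for example "varchar(200)", in which case an exact match against "varchar" would fail; whether RHive actually receives types in that form is not confirmed in this thread.

```r
# Hypothetical helper, not part of RHive: decide whether a Hive column type
# should be read into an R character vector.
is_text_type <- function(colType) {
  # Drop any "(...)" parameters and normalize case,
  # e.g. "VARCHAR(20)" becomes "varchar".
  base <- tolower(sub("\\(.*$", "", colType))
  base %in% c("string", "varchar", "timestamp", "date", "char")
}

# Hypothetical column types, chosen only for illustration.
colTypes <- c("int", "char(2)", "timestamp", "varchar(200)")
result <- vapply(colTypes, is_text_type, logical(1))
print(result)
```

Stripping the parameters before comparing keeps the list of text-like types short. Array types are left alone here because the original code already handles them in a separate branch.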