Closed rollingdeep closed 1 year ago
I had a similar issue using the Hive Warehouse Connector (also version 1.0.0.3.1.0.0-78).
For me, the problem was solved by using DB and table names without underscore characters: I changed my DB from test_db to testdb and my table from test_table to testtable.
Maybe you are right. For now I am using a Hive external table and inserting data into it from HDFS:
import org.apache.spark.sql.{DataFrame, SaveMode}
import com.hortonworks.hwc.HiveWarehouseSession

// Write the DataFrame as ORC files to an explicit HDFS location, then point
// the external Hive table (or one of its partitions) at that location.
def save2hive(hive: HiveWarehouseSession, df: DataFrame, database: String,
              tableName: String, pt: Map[String, String]): Unit = {
  var path = s"/warehouse/xxx/$database.db/$tableName"
  var ptStr = ""
  if (pt.nonEmpty) {
    var ptInfo = List[String]()
    for ((k, v) <- pt) { // TODO: an insertion-ordered Map is needed here
      path = path + s"/$k=$v"
      ptInfo = ptInfo :+ s"$k='$v'"
    }
    ptStr = ptInfo.mkString(",")
  }
  df.write.mode(SaveMode.Overwrite).format("orc").save(path)
  if (pt.nonEmpty) {
    // Re-register the partition so it points at the freshly written files.
    hive.executeUpdate(s"ALTER TABLE $database.$tableName DROP IF EXISTS PARTITION ($ptStr)")
    hive.executeUpdate(s"ALTER TABLE $database.$tableName ADD IF NOT EXISTS PARTITION ($ptStr) LOCATION '$path'")
  } else {
    hive.executeUpdate(s"ALTER TABLE $database.$tableName SET LOCATION '$path'")
  }
  println("success!")
}
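A note on the TODO above: Scala's default `Map` does not guarantee iteration order, so with two or more partition columns the `/$k=$v` path segments and the `PARTITION (...)` spec may come out in different orders across runs. A minimal sketch of the same path/spec construction using an insertion-ordered `ListMap` (the helper name `partitionLayout` and the placeholder base path are mine, not part of HWC):

```scala
import scala.collection.immutable.ListMap

// Build the HDFS path and the Hive PARTITION spec from an insertion-ordered
// partition map, so multi-column partitions are laid out deterministically.
def partitionLayout(base: String, pt: ListMap[String, String]): (String, String) = {
  val path = pt.foldLeft(base) { case (p, (k, v)) => p + s"/$k=$v" }
  val spec = pt.map { case (k, v) => s"$k='$v'" }.mkString(",")
  (path, spec)
}

val (path, spec) = partitionLayout(
  "/warehouse/xxx/testdb.db/testtable",
  ListMap("dt" -> "2019-01-01", "hour" -> "12"))
println(path) // /warehouse/xxx/testdb.db/testtable/dt=2019-01-01/hour=12
println(spec) // dt='2019-01-01',hour='12'
```

The same `(path, spec)` pair can then be passed to the `ALTER TABLE ... ADD PARTITION ... LOCATION ...` statements in save2hive above.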
I'm having the same issue, and renaming the table doesn't fix it either. What I've found is that it only occurs if you use partitionBy at write time, like:
df.write.partitionBy("part").mode(SaveMode.Overwrite).format(com.hortonworks.hwc.HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR).option("table", "`default`.`testout`").save()
On another note, if you remove the partitionBy call, partitioning works as expected (since the partition info is already stored in the Hive table), but if you use overwrite mode (rather than, for example, append), HWC will drop and recreate your table and will not reapply the partitioning info.
This problem was reported about a year ago, but the maintainer never replied a word! You can try the external-table approach I described above; if your table has only one partition, the save2hive function should cover your needs.
Indeed, it seems there are many issues with this project. Our client's use case requires managed ACID tables, so we need to stick with those.
Finally, here is my solution, and it seems to work; I hope it can be helpful for some of you (it works around some of the bugs I found by avoiding partitionBy at write time):
My Goal
I want to use an external table and append data without dropping the table, because our HDP 3 setup requires us to specify the table's location.
Question
I ran DESCRIBE on the table, and it shows the table structure correctly; the table exists in Hive. But DefaultJDBCWrapper tried to create a new table because of "tableExists:false". I'm using hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar, which was installed with Ambari.
The log looks like this:
I think this is a bug.
My source code: