mcvejic opened this issue 6 years ago
@mcvejic Checking this issue now but can't reproduce. Could you share the DDL and INSERT script?
hive> show create table t9595;
OK
CREATE TABLE `t9595`(
  `c1` int)
PARTITIONED BY (
  `p1` varchar(8),
  `p2` varchar(8))
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
presto:default> insert into hive.default.t9595 select 1, 1, 1;
INSERT: 1 row
presto:default>
presto:default> insert into hive.default.t9595 select 1, 1, 2;
INSERT: 1 row
presto:default> select * from hive.default.t9595;
c1 | p1 | p2
----+----+----
1 | 1 | 1
1 | 1 | 2
(2 rows)
presto:default> show partitions in hive.default.t9595;
p1 | p2
----+----
1 | 1
1 | 2
Any news on this? I'm having the same error every now and then.
@ordonezf, please see @ebyhr's comment above. Could you try to simplify your case and narrow down the repro steps for this issue?
Sure @findepi, sorry for the delay!
This is what I do:
- Drop tables A and B, if they exist, and create them again in Hive
- Insert A into table B using Presto

This process runs every day, and every couple of weeks the insert into table B fails. To fix it I have to enter the Hive CLI and drop the tables manually.
Here is the error I get:
{'message': 'Unable to rename from s3://path.net/tmp/presto-presto/8917428b-42c2-4042-b9dc-08dd8b9a81bc/ymd=2018-04-08 to s3://path.net/emr/test/B/ymd=2018-04-08: target directory already exists',
 'errorCode': 16777231,
 'errorName': 'HIVE_PATH_ALREADY_EXISTS',
 'errorType': 'EXTERNAL',
 'failureInfo': {'type': 'com.facebook.presto.spi.PrestoException',
  'message': 'Unable to rename from s3://path.net/tmp/presto-presto/8917428b-42c2-4042-b9dc-08dd8b9a81bc/ymd=2018-04-08 to s3://path.net/emr/test/B/ymd=2018-04-08: target directory already exists',
  'suppressed': [],
  'stack': [
    'com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.renameDirectory(SemiTransactionalHiveMetastore.java:1702)',
    'com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.access$2700(SemiTransactionalHiveMetastore.java:83)',
    'com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore$Committer.prepareAddPartition(SemiTransactionalHiveMetastore.java:1104)',
    'com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore$Committer.access$700(SemiTransactionalHiveMetastore.java:919)',
    'com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.commitShared(SemiTransactionalHiveMetastore.java:847)',
    'com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:769)',
    'com.facebook.presto.hive.HiveMetadata.commit(HiveMetadata.java:1657)',
    'com.facebook.presto.hive.HiveConnector.commit(HiveConnector.java:177)',
    'com.facebook.presto.transaction.TransactionManager$TransactionMetadata$ConnectorTransactionMetadata.commit(TransactionManager.java:577)',
    'java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)',
    'com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)',
    'com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)',
    'com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)',
    'io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)',
    'java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)',
    'java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)',
    'java.lang.Thread.run(Thread.java:748)']}}
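The stack trace points at SemiTransactionalHiveMetastore.renameDirectory in the commit path, and the message itself shows the shape of the operation: data is written to a staging directory and then renamed into the final partition location. Below is a minimal sketch of that pattern, assuming a guard that refuses to rename onto an existing target; the class and method names are illustrative, not the actual Presto source.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;

// Illustrative sketch, not the actual Presto source: a staged partition
// directory is moved into its final location during commit. Because the
// rename refuses to proceed when the target already exists, any leftover
// directory (from an earlier failed run, a dropped external table, or a
// concurrent writer) aborts the INSERT with HIVE_PATH_ALREADY_EXISTS.
final class PartitionCommitSketch
{
    static void renameDirectory(FileSystem fs, Path staging, Path target)
            throws IOException
    {
        if (fs.exists(target)) {
            // The branch behind "target directory already exists"
            throw new IOException("Unable to rename from " + staging + " to " + target
                    + ": target directory already exists");
        }
        if (!fs.rename(staging, target)) {
            throw new IOException("Failed to rename " + staging + " to " + target);
        }
    }
}

Note that the existence check and the rename are two separate filesystem calls, which matters for the race condition discussed further down in this thread.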
And here are the creates and inserts:
CREATE TABLE `B`(
  `id` bigint,
  `type` string,
  `created_at` timestamp,
  `status` string)
PARTITIONED BY (
  `ymd` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://path.net/emr/test/B'
TBLPROPERTIES (
  'transient_lastDdlTime'='1534133136')
This is a simplified version of the insert script:
insert into hive.test.B
with A_limited as (
    select
        *
    from hive.test.A
    where ymd >= '2018-01-01'
)
select
    *
from A_limited
@ebyhr Here are the exact steps to reproduce the issue:
hive> show create table t9595;
OK
CREATE TABLE `t9595`(
  `c1` int)
PARTITIONED BY (
  `p1` varchar(8),
  `p2` varchar(8))
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
TBLPROPERTIES ('transactional'='false')
presto:default> insert into hive.default.t9595 select 1, 1, 1;
INSERT: 1 row
presto:default>
presto:default> insert into hive.default.t9595 select 1, 1, 2;
INSERT: 1 row
presto:default> select * from hive.default.t9595;
c1 | p1 | p2
----+----+----
1 | 1 | 1
1 | 1 | 2
(2 rows)
presto:default> show partitions in hive.default.t9595;
p1 | p2
----+----
1 | 1
1 | 2
Up to this point it works fine. Now follow the steps below:
hive> drop table t9595;
hive> CREATE EXTERNAL TABLE `t9595`(
  `c1` int)
PARTITIONED BY (
  `p1` varchar(8),
  `p2` varchar(8))
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
TBLPROPERTIES ('transactional'='false');
presto:siq_dev> insert into t9595 select 1, '1', '1';
INSERT: 1 row
Query 20200413_091825_00078_7q573, FAILED, 3 nodes
Splits: 69 total, 69 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20200413_091825_00078_7q573 failed: Unable to rename from hdfs://siqhdp01/tmp/presto-root/e81b61f2-e69a-42e7-ad1b-47781b378554/p1=1/p2=1 to hdfs://siqhdp01/warehouse/tablespace/external/hive/siq_dev.db/t9595/p1=1/p2=1: target directory already exists
That is, the old (external) table was dropped, but the folders for the table and its partitions still exist in HDFS. When we recreate the table and try to insert, this error occurs.
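A workaround consistent with that diagnosis is to remove the stale table directory from HDFS before recreating the external table; running hdfs dfs -rm -r on the path from a shell does the same thing. Below is a hedged sketch using the Hadoop FileSystem API, with the path copied from the error above; adjust it for your cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Workaround sketch: delete the leftover directory of a dropped external
// table so that a later INSERT can rename its staged partitions into place.
// The path is copied from the error message above; adjust for your cluster.
public class RemoveStaleTableDir
{
    public static void main(String[] args) throws Exception
    {
        Path tableDir = new Path(
                "hdfs://siqhdp01/warehouse/tablespace/external/hive/siq_dev.db/t9595");
        FileSystem fs = tableDir.getFileSystem(new Configuration());
        if (fs.exists(tableDir)) {
            fs.delete(tableDir, true); // true = recursive, removes partition dirs too
        }
    }
}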
I am also seeing this issue, as described by @mirajgodha.
I'm also running into this. This seems to explain the problem as a race condition: https://translate.google.com/translate?hl=en&sl=zh-CN&u=https://www.dazhuanlan.com/2020/02/03/5e3759b8799d3/&prev=search&pto=aue
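If that analysis is right, the intermittent failures are a time-of-check/time-of-use race: two queries committing the same partition can both observe that the target directory is absent, and whichever renames second finds it already created. The following self-contained demonstration of that interleaving uses an in-memory stand-in for the filesystem; all names here are hypothetical.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Contrived demonstration of the time-of-check/time-of-use race suggested
// by the linked analysis: both writers pass the existence check, then both
// attempt the "rename", and the slower one fails. An in-memory set stands
// in for the filesystem so this runs anywhere.
public class RenameRaceDemo
{
    private static final Set<String> directories = ConcurrentHashMap.newKeySet();

    static void commitPartition(String target) throws InterruptedException
    {
        if (directories.contains(target)) {   // time of check
            throw new IllegalStateException("target directory already exists: " + target);
        }
        Thread.sleep(10);                     // widen the race window
        if (!directories.add(target)) {       // time of use: the "rename"
            throw new IllegalStateException("target directory already exists: " + target);
        }
    }

    public static void main(String[] args) throws Exception
    {
        Runnable writer = () -> {
            try {
                commitPartition("ymd=2018-04-08");
                System.out.println(Thread.currentThread().getName() + ": commit succeeded");
            }
            catch (Exception e) {
                System.out.println(Thread.currentThread().getName() + ": " + e.getMessage());
            }
        };
        Thread first = new Thread(writer, "query-1");
        Thread second = new Thread(writer, "query-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}

Most runs print one success and one "target directory already exists", which would match an insert job that only fails occasionally, when two writes to the same partition happen to overlap.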
When trying to insert into a partitioned table, the following error occurs from time to time, making inserts unreliable.