memiiso / debezium-server-iceberg

Replicates any database (CDC events) to Apache Iceberg (To Cloud Storage)
Apache License 2.0

How to configure the use of HiveCatalog? #197

Open saimigo opened 1 year ago

saimigo commented 1 year ago

1. How do I configure use of the HiveCatalog, and what additional dependencies need to be added?
2. How do I set the table property "engine.hive.enabled=true" when creating tables?

error:

{"timestamp":"2023-05-30T03:14:17.889Z","sequence":100,"loggerClassName":"org.jboss.logging.Logger","loggerName":"io.quarkus.runtime.Application","level":"ERROR","message":"Failed to start application (with profile prod)","threadName":"main","threadId":1,"mdc":{},"ndc":"","hostName":"a85fccd0c6e5","processName":"io.debezium.server.Main","processId":1274,"exception":{"refId":1,"exceptionType":"java.lang.NoSuchMethodException","message":"Cannot find constructor for interface org.apache.iceberg.catalog.Catalog\n\tMissing org.apache.iceberg.hive.HiveCatalog [java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/UnknownDBException]","frames":[{"class":"org.apache.iceberg.common.DynConstructors$Builder","method":"buildChecked","line":227},{"class":"org.apache.iceberg.CatalogUtil","method":"loadCatalog","line":180},{"class":"org.apache.iceberg.CatalogUtil","method":"buildIcebergCatalog","line":234},{"class":"io.debezium.server.iceberg.IcebergChangeConsumer","method":"connect","line":128},{"class":"io.debezium.server.iceberg.IcebergChangeConsumer_Bean","method":"create"},{"class":"io.debezium.server.iceberg.IcebergChangeConsumer_Bean","method":"create"},{"class":"io.debezium.server.DebeziumServer","method":"start","line":119},{"class":"io.debezium.server.DebeziumServer_Bean","method":"create"},{"class":"io.debezium.server.DebeziumServer_Bean","method":"create"},{"class":"io.quarkus.arc.impl.AbstractSharedContext","method":"createInstanceHandle","line":111},{"class":"io.quarkus.arc.impl.AbstractSharedContext$1","method":"get","line":35},{"class":"io.quarkus.arc.impl.AbstractSharedContext$1","method":"get","line":32},{"class":"io.quarkus.arc.impl.LazyValue","method":"get","line":26},{"class":"io.quarkus.arc.impl.ComputingCache","method":"computeIfAbsent","line":69},{"class":"io.quarkus.arc.impl.AbstractSharedContext","method":"get","line":32},{"class":"io.quarkus.arc.impl.ClientProxies","method":"getApplicationScopedDelegate","line":18},{"class":"io.debezium.server.DebeziumServer_ClientProxy","method":"arc$delegate"},{"class":"io.debezium.server.DebeziumServer_ClientProxy","method":"arc_contextualInstance"},{"class":"io.debezium.server.DebeziumServer_Observer_Synthetic_d70cd75bf32ab6598217b9a64a8473d65e248c05","method":"notify"},{"class":"io.quarkus.arc.impl.EventImpl$Notifier","method":"notifyObservers","line":323},{"class":"io.quarkus.arc.impl.EventImpl$Notifier","method":"notify","line":305},{"class":"io.quarkus.arc.impl.EventImpl","method":"fire","line":73},{"class":"io.quarkus.arc.runtime.ArcRecorder","method":"fireLifecycleEvent","line":128},{"class":"io.quarkus.arc.runtime.ArcRecorder","method":"handleLifecycleEvents","line":97},{"class":"io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294","method":"deploy_0"},{"class":"io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294","method":"deploy"},{"class":"io.quarkus.runner.ApplicationImpl","method":"doStart"},{"class":"io.quarkus.runtime.Application","method":"start","line":101},{"class":"io.quarkus.runtime.ApplicationLifecycleManager","method":"run","line":103},{"class":"io.quarkus.runtime.Quarkus","method":"run","line":67},{"class":"io.quarkus.runtime.Quarkus","method":"run","line":41},{"class":"io.quarkus.runtime.Quarkus","method":"run","line":120},{"class":"io.debezium.server.Main","method":"main","line":15}]}}

ismailsimsek commented 1 year ago

@saimigo The Hive catalog should be configured as below. I also added the hive-metastore dependency to the release; could you try with the new release?

debezium.sink.iceberg.type=hive
debezium.sink.iceberg.uri=thrift://hadoop101:9083
debezium.sink.iceberg.clients=5
debezium.sink.iceberg.warehouse=hdfs://example.com:8020/warehouse
debezium.sink.iceberg.hive.metastore.table.owner=xyz
debezium.sink.iceberg.hive.other.configs=xyz
### 
debezium.sink.iceberg.engine.hive.enabled=true
### if the above one doesn't work, please try the following config
debezium.sink.iceberg.iceberg.engine.hive.enabled=true
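
Note on the second question (setting "engine.hive.enabled=true" when creating tables): if neither catalog-level setting above gets picked up, the property can also be set on an existing table directly through the Iceberg API. A hedged sketch, independent of debezium-server-iceberg, with a hypothetical namespace and table name:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class EnableHiveEngineSketch {
  public static void enableHiveEngine(Catalog catalog) {
    // Hypothetical identifier; use the namespace/table created by the sink.
    Table table = catalog.loadTable(TableIdentifier.of("mynamespace", "mytable"));
    table.updateProperties()
        .set("engine.hive.enabled", "true") // lets the Hive engine read this Iceberg table
        .commit();
  }
}
```
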
soullinuxer commented 11 months ago

Has anyone tried this lately? I have not been successful in setting up the Debezium server to update metadata stored in a standalone Hive metastore.

I am using a config similar to the one you pasted above:
debezium.sink.iceberg.type=hive
debezium.sink.iceberg.uri=thrift://myserver:9083
debezium.sink.iceberg.clients=5
debezium.sink.iceberg.warehouse=s3a://mybucket/
debezium.sink.iceberg.table-namespace=debeziumevents
debezium.sink.iceberg.hive.metastore.table.owner=myowner
debezium.sink.iceberg.engine.hive.enabled=true
2023-09-06 08:43:02,744 DEBUG [org.apa.had.sec.UserGroupInformation] (pool-6-thread-1) Hadoop login
2023-09-06 08:43:02,745 DEBUG [org.apa.had.sec.UserGroupInformation] (pool-6-thread-1) hadoop login commit
2023-09-06 08:43:02,746 DEBUG [org.apa.had.sec.UserGroupInformation] (pool-6-thread-1) Using local user: UnixPrincipal: myuser
2023-09-06 08:43:02,746 DEBUG [org.apa.had.sec.UserGroupInformation] (pool-6-thread-1) Using user: "UnixPrincipal: myuser" with name: myuser
2023-09-06 08:43:02,747 DEBUG [org.apa.had.sec.UserGroupInformation] (pool-6-thread-1) User entry: "myuser"
2023-09-06 08:43:02,747 DEBUG [org.apa.had.sec.UserGroupInformation] (pool-6-thread-1) UGI loginUser: myuser (auth:SIMPLE)
2023-09-06 08:43:03,023 INFO  [org.apa.had.hiv.met.HiveMetaStoreClient] (pool-6-thread-1) Trying to connect to metastore with URI thrift://myserver:9083
2023-09-06 08:43:03,280 INFO  [org.apa.had.hiv.met.HiveMetaStoreClient] (pool-6-thread-1) Opened a connection to metastore, current connections: 1
2023-09-06 08:43:03,285 INFO  [org.apa.had.hiv.met.HiveMetaStoreClient] (pool-6-thread-1) Connected to metastore.
2023-09-06 08:43:03,285 INFO  [org.apa.had.hiv.met.RetryingMetaStoreClient] (pool-6-thread-1) RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=myuser (auth:SIMPLE) retries=1 delay=1 lifetime=0
2023-09-06 08:43:04,100 WARN  [org.apa.had.hiv.met.RetryingMetaStoreClient] (pool-6-thread-1) MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. getTable: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:2079)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:2066)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1578)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1570)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208)
    at com.sun.proxy.$Proxy15.getTable(Unknown Source)
    at org.apache.iceberg.hive.HiveTableOperations.lambda$doRefresh$0(HiveTableOperations.java:158)
    at <unknown class>.run(Unknown Source)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
    at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
    at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
    at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
    at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
    at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
    at io.debezium.server.iceberg.IcebergUtil.loadIcebergTable(IcebergUtil.java:123)
    at io.debezium.server.iceberg.IcebergChangeConsumer.loadIcebergTable(IcebergChangeConsumer.java:188)
    at io.debezium.server.iceberg.IcebergChangeConsumer.handleBatch(IcebergChangeConsumer.java:166)
    at io.debezium.embedded.ConvertingEngineBuilder.lambda$notifying$2(ConvertingEngineBuilder.java:101)
    at <unknown class>.handleBatch(Unknown Source)
    at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:912)
    at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:229)
    at io.debezium.server.DebeziumServer.lambda$start$1(DebeziumServer.java:170)
    at <unknown class>.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:866)

and finally:

2023-09-06 08:44:12,398 INFO  [org.apa.had.hiv.met.HiveMetaStoreClient] (pool-6-thread-1) Opened a connection to metastore, current connections: 1
2023-09-06 08:44:12,424 INFO  [org.apa.had.hiv.met.HiveMetaStoreClient] (pool-6-thread-1) Connected to metastore.
2023-09-06 08:44:12,866 INFO  [io.deb.emb.EmbeddedEngine] (pool-6-thread-1) Stopping the task and engine
2023-09-06 08:44:12,878 INFO  [org.apa.kaf.con.sto.FileOffsetBackingStore] (pool-6-thread-1) Stopped FileOffsetBackingStore
2023-09-06 08:44:12,880 ERROR [io.deb.ser.ConnectorLifecycle] (pool-6-thread-1) Connector completed: success = 'false', message = 'Stopping connector after error in the application's handler method: Failed to get table info from metastore debeziumevents.debeziumcdc_account', error = 'java.lang.RuntimeException: Failed to get table info from metastore debeziumevents.debeziumcdc_account': java.lang.RuntimeException: Failed to get table info from metastore myschema.mytable
    at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:171)
    at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
    at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
    at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
    at io.debezium.server.iceberg.IcebergUtil.loadIcebergTable(IcebergUtil.java:123)
    at io.debezium.server.iceberg.IcebergChangeConsumer.loadIcebergTable(IcebergChangeConsumer.java:188)
    at io.debezium.server.iceberg.IcebergChangeConsumer.handleBatch(IcebergChangeConsumer.java:166)
    at io.debezium.embedded.ConvertingEngineBuilder.lambda$notifying$2(ConvertingEngineBuilder.java:101)
    at <unknown class>.handleBatch(Unknown Source)
    at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:912)
    at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:229)
    at io.debezium.server.DebeziumServer.lambda$start$1(DebeziumServer.java:170)
    at <unknown class>.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:866)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:2079)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:2066)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1578)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1570)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208)
    at com.sun.proxy.$Proxy15.getTable(Unknown Source)
    at org.apache.iceberg.hive.HiveTableOperations.lambda$doRefresh$0(HiveTableOperations.java:158)
    at <unknown class>.run(Unknown Source)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
    at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
    at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
    ... 15 more

Do you perhaps have any advice?
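
One way to narrow this down (a sketch, not an official debugging step; URI, warehouse, and table name are copied from the config and log above) is to reproduce the failing loadTable call with the Iceberg HiveCatalog directly, outside Debezium Server. If this small program fails with the same TTransportException, the problem is between the Hive metastore client and the metastore itself rather than in the debezium.sink.iceberg.* configuration:

```java
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

public class MetastoreLoadTableCheck {
  public static void main(String[] args) {
    HiveCatalog catalog = new HiveCatalog();
    catalog.setConf(new Configuration()); // picks up core-site.xml / s3a settings if on the classpath
    catalog.initialize("iceberg", Map.of(
        "uri", "thrift://myserver:9083",
        "warehouse", "s3a://mybucket/"));

    // Same getTable call that fails in HiveTableOperations.doRefresh in the trace above.
    System.out.println(catalog.loadTable(
        TableIdentifier.of("debeziumevents", "debeziumcdc_account")).location());
  }
}
```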