memiiso / debezium-server-iceberg

Replicates any database (CDC events) to Apache Iceberg (To Cloud Storage)
Apache License 2.0

missing class for Hive Catalog #305

Closed pandabrowski closed 2 months ago

pandabrowski commented 2 months ago

Hi, please advise: I'm trying to run it with a Hive Metastore catalog. It seems there is an unmet dependency:

config parameter: debezium.sink.iceberg.type=hive

2024-04-22 14:51:14,254 ERROR [io.qua.run.Application] (main) Failed to start application (with profile [prod]): java.lang.RuntimeException: Failed to start quarkus
    at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
    at io.quarkus.runtime.Application.start(Application.java:101)
    at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:44)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:124)
    at io.debezium.server.Main.main(Main.java:15)
Caused by: java.lang.IllegalArgumentException: Cannot initialize Catalog implementation org.apache.iceberg.hive.HiveCatalog: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog
    Missing org.apache.iceberg.hive.HiveCatalog [java.lang.ClassNotFoundException: org.apache.iceberg.hive.HiveCatalog]
    at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:240)

best regards, rd

pandabrowski commented 2 months ago

From the Docker container contents I can see that there are Iceberg 1.5 jars in /app/lib.

ismailsimsek commented 2 months ago

What does the catalog config look like? Do you also see the iceberghive.jar libraries?

Could you try giving hive as a catalog type?

pandabrowski commented 2 months ago

Hi, as I mentioned above, the catalog type is already set to hive: debezium.sink.iceberg.type=hive

In general, the catalog setup looks like this:

debezium.sink.iceberg.type=hive
debezium.sink.iceberg.table-namespace=replica
debezium.sink.iceberg.warehouse=s3://warehouse
debezium.sink.iceberg.uri=thrift://127.0.0.1:9083
debezium.sink.iceberg.catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.engine.hive.enabled=true
debezium.sink.iceberg.iceberg.engine.hive.enabled=true
debezium.sink.hive.metastore.sasl.enabled=false
debezium.sink.iceberg.hive.metastore.sasl.enabled=false
debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.s3.endpoint=http://localhost:9000
debezium.sink.iceberg.s3.path-style-access=true
debezium.sink.iceberg.s3.access-key-id=admin
debezium.sink.iceberg.s3.secret-access-key=password

I'm aiming for a setup with a standalone Hive Metastore (HMS) and MinIO as the S3 storage. HMS and S3 are set up correctly, as I am able to work with them through PyIceberg.

pandabrowski commented 2 months ago

It seems that setting debezium.sink.iceberg.catalog-impl=org.apache.iceberg.hive.HiveCatalog changed something. I now get a different error about an incorrect URI, which means I've moved past the reported problem.

MetaException(message:Got exception: java.net.URISyntaxException Illegal character in hostname at index 22: thrift://metastore.poc_iceberg_iceberg_net:9083)
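For anyone hitting the same message: index 22 in that URI is the first underscore in the hostname, and java.net.URI does not accept underscores in server hostnames. Below is a minimal sketch for checking a metastore URI before putting it in the config. The class name ThriftUriCheck and the method hasValidHost are made up for illustration, and this only mirrors the java.net.URI hostname rules; it is an assumption that the metastore client fails for exactly this reason.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of debezium-server-iceberg or Hive.
final class ThriftUriCheck {

    // Returns true if java.net.URI can parse the authority as a server
    // (host[:port]). Hostnames containing '_' fail this strict check.
    static boolean hasValidHost(String uri) {
        try {
            // new URI(...) alone may accept the string as a registry-based
            // authority; parseServerAuthority() enforces the hostname rules.
            new URI(uri).parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String uri : args) {
            System.out.println(uri + " -> " + (hasValidHost(uri) ? "ok" : "invalid hostname"));
        }
    }
}
```

If the check fails for your metastore address, renaming the service or network alias so that the hostname has no underscores is the usual way around it.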

pandabrowski commented 2 months ago

The problem was caused by a missing jar file. I mounted it into /app/lib through the container volumes:

volumes:
    - type: bind
      source: ./debezium_custom_conf/
      target: /app/conf/
    - type: bind
      source: ./iceberg-hive-metastore-1.5.1.jar
      target: /app/lib/iceberg-hive-metastore.jar

Adding this jar solved the problem.
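A quick way to confirm whether a class such as org.apache.iceberg.hive.HiveCatalog is visible on a given classpath is a small probe like the one below. This is only a sketch: ClasspathCheck and isLoadable are hypothetical names, and it assumes you run it with the same jars the server uses (for example, with /app/lib on the classpath). The application's own class loading may differ from a plain java invocation.

```java
// Hypothetical diagnostic, not part of debezium-server-iceberg.
final class ClasspathCheck {

    // Returns true if the named class can be located by this class's
    // class loader. The class is not initialized, only looked up.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the ClassNotFoundException above.
        String name = args.length > 0 ? args[0] : "org.apache.iceberg.hive.HiveCatalog";
        System.out.println(name + (isLoadable(name) ? " is on the classpath" : " is NOT on the classpath"));
    }
}
```

Running it before and after adding the jar should show the result change, provided the classpath matches the one the server sees.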