streamnative / pulsar-io-lakehouse

pulsar lakehouse connector
Apache License 2.0

lakehouse sink connector with hive metastore enablement. #655

Open Pavan792reddy opened 3 months ago

Pavan792reddy commented 3 months ago

Hi guys, we are using the lakehouse sink connector to load data into a Hudi table, and we now want to read that data from Trino, which requires a Hive metastore. We added the metastore details to the sink config, but we still do not see the table metadata in Hive. Could you please check and help us with this? Our sink config:

{
  "tenant": "test",
  "namespace": "avro",
  "name": "hudi-sink-bnr_dl_hive_2",
  "inputs": [ "persistent://test/avro/bnr_dl_avro" ],
  "archive": "/usr/bin/pulsar/pulsar-io-lakehouse-2.11.0-SNAPSHOT-cloud.nar",
  "parallelism": 1,
  "processingGuarantees": "EFFECTIVELY_ONCE",
  "configs": {
    "type": "hudi",
    "maxCommitInterval": 10,
    "hoodie.table.name": "bnr_dl_avro_hive_2",
    "hoodie.table.type": "MERGE_ON_READ",
    "hoodie.base.path": "gs://test-dp-hudi/bnr_dl_hive_2",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "id",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "q6_test",
    "hoodie.datasource.hive_sync.table": "bnr_dl_avro_hive_2",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.metastore.uris": "thrift://trino-hudiv2-m:9083",
    "hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "hadoop.fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "hadoop.google.cloud.auth.type": "SERVICE_ACCOUNT_JSON_KEYFILE",
    "hadoop.google.cloud.auth.service.account.json.keyfile": "/home/pavankumar_reddy/key.json",
    "hadoop.fs.gs.project.id": "q-datalake-dev",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.database.name": "q_test"
  }
}

david-streamlio commented 3 months ago

Does the data get published to Hudi at all?

Pavan792reddy commented 3 months ago

@david-streamlio Yes, we are able to load the messages into the Hudi table, but Hive sync is not working.
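
For anyone trying to narrow this down, here is a minimal Python sketch that pulls the Hive sync settings out of the `configs` block above and checks whether the metastore port accepts a TCP connection. The helper names `hive_sync_summary` and `metastore_reachable` are made up for illustration and are not part of the connector. The sketch only inspects the config and the network path; it says nothing about whether the connector actually applies these `hoodie.datasource.hive_sync.*` keys.

```python
import socket
from urllib.parse import urlparse

HIVE_SYNC_PREFIX = "hoodie.datasource.hive_sync."

# Subset of the sink "configs" block from this issue.
configs = {
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "q6_test",
    "hoodie.datasource.hive_sync.table": "bnr_dl_avro_hive_2",
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.metastore.uris": "thrift://trino-hudiv2-m:9083",
    "hoodie.database.name": "q_test",
}


def hive_sync_summary(cfg):
    """Collect the Hive sync settings from a sink `configs` dict (hypothetical helper)."""
    sync = {
        key[len(HIVE_SYNC_PREFIX):]: value
        for key, value in cfg.items()
        if key.startswith(HIVE_SYNC_PREFIX)
    }
    uri = urlparse(sync.get("metastore.uris", ""))
    return {
        "enabled": str(sync.get("enable", "false")).lower() == "true",
        "mode": sync.get("mode"),
        "host": uri.hostname,
        "port": uri.port,
        # The two database names differ in the posted config; whether that
        # matters for the sync is an open question, so it is only reported here.
        "sync_database": sync.get("database"),
        "hoodie_database": cfg.get("hoodie.database.name"),
    }


def metastore_reachable(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


summary = hive_sync_summary(configs)
```

Running `metastore_reachable(summary["host"], summary["port"])` on the machine that hosts the sink would show whether `thrift://trino-hudiv2-m:9083` is reachable from there. A successful connection only rules out a network problem, not a sync problem.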