streamnative / pulsar-io-lakehouse

pulsar lakehouse connector
Apache License 2.0

Lakehouse sink connector with Hive metastore enablement #655

Open Pavan792reddy opened 1 month ago

Pavan792reddy commented 1 month ago

Hi guys, we are using the lakehouse sink connector to load data into a Hudi table, and we are now trying to integrate Trino to read that data. Trino requires a Hive metastore, so we added the metastore details to the sink config file, but we still do not see the table metadata in Hive. Could you please check and help us with this? The sink config is:

{
  "tenant": "test",
  "namespace": "avro",
  "name": "hudi-sink-bnr_dl_hive_2",
  "inputs": [ "persistent://test/avro/bnr_dl_avro" ],
  "archive": "/usr/bin/pulsar/pulsar-io-lakehouse-2.11.0-SNAPSHOT-cloud.nar",
  "parallelism": 1,
  "processingGuarantees": "EFFECTIVELY_ONCE",
  "configs": {
    "type": "hudi",
    "maxCommitInterval": 10,
    "hoodie.table.name": "bnr_dl_avro_hive_2",
    "hoodie.table.type": "MERGE_ON_READ",
    "hoodie.base.path": "gs://test-dp-hudi/bnr_dl_hive_2",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "id",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "q6_test",
    "hoodie.datasource.hive_sync.table": "bnr_dl_avro_hive_2",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.metastore.uris": "thrift://trino-hudiv2-m:9083",
    "hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "hadoop.fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "hadoop.google.cloud.auth.type": "SERVICE_ACCOUNT_JSON_KEYFILE",
    "hadoop.google.cloud.auth.service.account.json.keyfile": "/home/pavankumar_reddy/key.json",
    "hadoop.fs.gs.project.id": "q-datalake-dev",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.database.name": "q_test"
  }
}
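For anyone triaging a config like the one above, a small standalone pre-check can rule out basic problems before digging into the connector. The following is only a sketch: the function names and the specific checks are hypothetical, it uses only the Python standard library, and it says nothing about how the connector itself handles these keys. It assumes a single `thrift://host:port` metastore URI. A difference between `hoodie.datasource.hive_sync.database` and `hoodie.database.name` is reported only as something to confirm, since it may be intentional.

```python
import socket
from urllib.parse import urlparse

# Key prefix taken from the config posted in this issue.
PREFIX = "hoodie.datasource.hive_sync."


def parse_metastore_uri(uri):
    """Split a single thrift://host:port URI into (host, port)."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"unexpected metastore URI: {uri!r}")
    return parsed.hostname, parsed.port


def hive_sync_warnings(configs):
    """Return warnings about the hive_sync settings.

    These checks are illustrative assumptions, not connector behavior.
    """
    warnings = []
    if str(configs.get(PREFIX + "enable", "")).lower() != "true":
        warnings.append("hive_sync.enable is not 'true'")
    if configs.get(PREFIX + "mode") == "hms" and not configs.get(PREFIX + "metastore.uris"):
        warnings.append("hive_sync.mode is 'hms' but metastore.uris is missing")
    sync_db = configs.get(PREFIX + "database")
    table_db = configs.get("hoodie.database.name")
    if sync_db and table_db and sync_db != table_db:
        warnings.append(
            f"hive_sync.database {sync_db!r} differs from "
            f"hoodie.database.name {table_db!r}; confirm this is intended"
        )
    return warnings


def metastore_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to the metastore port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To use it, load the sink config with `json.load`, pass its `"configs"` object to `hive_sync_warnings`, and run `metastore_reachable(*parse_metastore_uri(...))` from the host where the sink runs. A successful TCP connection only shows that the port is open from that host, not that the sync itself will succeed.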

david-streamlio commented 1 month ago

Does the data get published to Hudi at all?

Pavan792reddy commented 3 weeks ago

@david-streamlio We are able to load the messages into the Hudi table, but Hive sync is not working.