torvalds-dev-testbot[bot] opened 3 months ago
Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/247
To run both Hudi and Delta formats on the same Spark 3.x cluster without dependent-jar conflicts, you can follow these steps:
First, disable Spark's dynamic allocation feature by setting "spark.dynamicAllocation.enabled": "false" in the Spark configuration. (source: https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-transparent-tpc-ds-lakehouse-performance-benchmarks)
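A minimal PySpark sketch of such a session follows. The bundle coordinates (Hudi 0.14.x and Delta 3.1.x for Spark 3.4 / Scala 2.12) are illustrative assumptions, not prescribed by the docs cited above; match them to your cluster's Spark and Scala versions:

```python
from pyspark.sql import SparkSession

# Sketch only: the package versions below are assumptions for Spark 3.4 / Scala 2.12;
# pick the Hudi and Delta bundles that match your cluster.
spark = (
    SparkSession.builder
    .appName("hudi-and-delta")
    # Step 1: turn off dynamic allocation, as suggested above.
    .config("spark.dynamicAllocation.enabled", "false")
    # Load both format bundles side by side.
    .config(
        "spark.jars.packages",
        "org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.1,"
        "io.delta:delta-spark_2.12:3.1.0",
    )
    # Spark accepts a comma-separated list of SQL extensions, so both can coexist.
    .config(
        "spark.sql.extensions",
        "org.apache.spark.sql.hudi.HoodieSparkSessionExtension,"
        "io.delta.sql.DeltaSparkSessionExtension",
    )
    .getOrCreate()
)
```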
Next, configure the Hudi table creation to use Hive-style partitioning by adding the parameter hoodie.datasource.write.hive_style_partitioning: true. This creates partition folders in Hive's key=value layout. (source: https://hudi.apache.org/docs/faq_writing_tables)
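As a sketch, a Hudi write with that option could look like this; the DataFrame `df`, the `dt` partition column, the table name, and the target path are all placeholders for illustration:

```python
# Placeholders: `df` is an existing DataFrame with a `dt` column; path is illustrative.
(
    df.write.format("hudi")
    .option("hoodie.table.name", "my_hudi_table")
    .option("hoodie.datasource.write.partitionpath.field", "dt")
    # Step 2: Hive-style partition folders, e.g. dt=2024-01-01/ instead of 2024-01-01/.
    .option("hoodie.datasource.write.hive_style_partitioning", "true")
    .mode("append")
    .save("s3a://my-bucket/warehouse/my_hudi_table")
)
```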
Finally, register your Hudi table with the Apache Hive metastore for seamless integration, using the standalone Hive Sync tool, the Hudi Streamer tool, or the datasource write options. (source: https://hudi.apache.org/docs/faq_integrations)
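The datasource-option route can be sketched as below; the metastore URI, database, and table names are assumptions chosen for illustration:

```python
# Placeholders throughout: metastore URI, database, table, and path are illustrative.
(
    df.write.format("hudi")
    .option("hoodie.table.name", "my_hudi_table")
    .option("hoodie.datasource.write.partitionpath.field", "dt")
    # Step 3: sync the table to the Hive metastore on each write.
    .option("hoodie.datasource.hive_sync.enable", "true")
    .option("hoodie.datasource.hive_sync.mode", "hms")
    .option("hoodie.datasource.hive_sync.metastore.uris", "thrift://hive-metastore:9083")
    .option("hoodie.datasource.hive_sync.database", "default")
    .option("hoodie.datasource.hive_sync.table", "my_hudi_table")
    .option("hoodie.datasource.hive_sync.partition_fields", "dt")
    .mode("append")
    .save("s3a://my-bucket/warehouse/my_hudi_table")
)
```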
By following the above steps, you can run both Hudi and Delta formats on the same Spark 3.x cluster and avoid jar dependency conflicts. :large_blue_square: Confidence Score: 92.94%
Describe the problem you faced

Is there any documentation available for running both Hudi and Delta formats on the same Spark 3.x cluster to avoid issues with dependent jars?