numberlabs-developers opened this issue 10 months ago (status: Open)
torvalds.dev is analyzing the ticket
The user is asking about the priority of parameter settings in Apache Hudi when using Spark on Hive. The parameters in question are:
1. tblproperties
2. sql / data source set
3. hudi-defaults.conf
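For context, hudi-defaults.conf is a plain-text file of global Hudi defaults, picked up from the directory pointed to by HUDI_CONF_DIR (falling back to /etc/hudi/conf). A hypothetical fragment might look like the following; the keys are illustrative only, and the space-separated layout (in the style of spark-defaults.conf) should be checked against the docs for your Hudi version:

```
# Global Hudi defaults (lowest-priority config source); keys are examples only
hoodie.parquet.compression.codec    gzip
hoodie.datasource.write.hive_style_partitioning    true
```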
The user's environment includes Hudi version 0.12.1, Spark version 3.1.3, Hive version 3.1.0, Hadoop version 3.1.1, and storage on HDFS. The user is not running on Docker.
The user also provided the content of a file named hbase-site.xml, located in hudi-common/src/main/resources/. This file contains configuration settings for HBase, which, like Hive, is part of the Hadoop ecosystem.
Plan:
Understand the issue: The user wants to know the order of precedence among the three configuration sources in Hudi. The precedence determines the final configuration of a Hudi job when the same key is set with conflicting values in more than one place.
Suggest actions: The user should check the Hudi documentation or source code to understand how these settings are applied. If the documentation is not clear, the user could set a conflicting value for the same key in each source and observe which one takes effect.
Identify relevant files: The hbase-site.xml file provided by the user is likely relevant to the issue, since its HBase settings could affect Hudi's operation if Hudi uses HBase. Other potentially relevant files are the hudi-defaults.conf file mentioned by the user, and any files where tblproperties and sql / data source set parameters are defined.
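The conflicting-settings experiment suggested above amounts to a layered-config merge. The sketch below models the precedence order that Hudi's config layering is generally understood to follow (per-write SQL / data source options over tblproperties over hudi-defaults.conf); treat that ordering as an assumption to verify against your Hudi version, and note that the sample key is used purely for illustration:

```python
# Sketch of layered config resolution. The precedence order (defaults <
# tblproperties < write/SQL options) is an assumption to verify against
# your Hudi version, not taken from Hudi source.

def resolve(defaults, tblproperties, write_options):
    """Merge config layers; later (higher-priority) layers override earlier ones."""
    merged = {}
    for layer in (defaults, tblproperties, write_options):
        merged.update(layer)
    return merged

# hudi-defaults.conf (global, lowest priority)
defaults = {"hoodie.parquet.compression.codec": "gzip"}
# tblproperties on the table (middle priority)
tblprops = {"hoodie.parquet.compression.codec": "snappy"}
# per-write SQL / data source options (highest priority)
options = {"hoodie.parquet.compression.codec": "zstd"}

print(resolve(defaults, tblprops, options)["hoodie.parquet.compression.codec"])
# Under this assumed ordering, the per-write option ("zstd") wins.
```

Running the same lookup with one layer left empty shows the next layer down taking effect, which is exactly the observation the experiment relies on.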
Using Spark on Hive.
What is the priority for the following three parameter settings to take effect?
1. tblproperties 2. sql / data source set 3. hudi-defaults.conf
Environment Description
Hudi version : 0.12.1
Spark version : 3.1.3
Hive version : 3.1.0
Hadoop version : 3.1.1
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no