torvalds-dev-testbot[bot] opened 10 months ago
torvalds.dev is analyzing the ticket
It seems like you're facing a `ClassNotFoundException` for `org.apache.hadoop.fs.s3a.S3AFileSystem`. This class is part of the `hadoop-aws` package, which is used to interact with AWS S3 (and S3-compatible stores such as MinIO).

The error indicates that the required JAR files are not available on the classpath at runtime. You have already included `hadoop-aws` and `aws-java-sdk` in your `spark-submit` command, but it seems Hive cannot find them: the Hive sync step runs inside the Hive server's JVM, which has its own classpath and does not see the JARs you pass to Spark.
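One quick way to confirm this, assuming you can get a shell on the Hive server host or container (the commands below also assume the `hadoop` CLI is installed there):

```bash
# Check whether the AWS JARs are visible on the Hive host's Hadoop classpath
hadoop classpath | tr ':' '\n' | grep -iE 'hadoop-aws|aws-java-sdk'

# And confirm how the HiveServer2 process was launched (inspect its classpath)
ps -ef | grep -i hiveserver2
```

If neither JAR shows up there, the fix below should apply.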
Here is a plan to resolve this issue:

1. Verify the paths of the `aws-java-sdk` and `hadoop-aws` JAR files: make sure the paths `/opt/hive/lib/aws-java-sdk-1.11.30.jar` and `/opt/hive/lib/hadoop-aws-2.7.3.jar` are correct and that the files exist at those locations.
2. Add the JARs to the Hive classpath: set the `HIVE_AUX_JARS_PATH` environment variable in the `hive-env.sh` file, located in the `conf` directory of your Hive installation (see the sketch after this list).
3. Restart the Hive service: after updating `hive-env.sh`, restart Hive so the changes take effect.
4. Re-run the Spark job: it should now be able to find the required classes.
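A minimal sketch of steps 1–3, assuming a tarball-style Hive install under `/opt/hive`; the restart command is deployment-specific, and the `hiveserver` container name is an assumption based on your JDBC URL:

```bash
# 1. Verify the JARs exist at the paths referenced in spark-submit
ls -l /opt/hive/lib/aws-java-sdk-1.11.30.jar \
      /opt/hive/lib/hadoop-aws-2.7.3.jar

# 2. Point Hive at them via hive-env.sh (comma-separated JAR paths;
#    a directory path also works)
echo 'export HIVE_AUX_JARS_PATH=/opt/hive/lib/aws-java-sdk-1.11.30.jar,/opt/hive/lib/hadoop-aws-2.7.3.jar' \
  >> /opt/hive/conf/hive-env.sh

# 3. Restart the Hive service so the new classpath is picked up.
#    If Hive runs in Docker (container name assumed from the JDBC URL):
docker restart hiveserver
```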
Please note that the exact steps might vary depending on your environment and Hive installation.
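Alternatively, if you prefer configuration files over environment variables, the same JARs can be registered through the `hive.aux.jars.path` property; a sketch of the `hive-site.xml` entry, assuming the same JAR locations:

```xml
<!-- hive-site.xml: register auxiliary JARs (same effect as HIVE_AUX_JARS_PATH) -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/lib/aws-java-sdk-1.11.30.jar,file:///opt/hive/lib/hadoop-aws-2.7.3.jar</value>
</property>
```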
Tips before filing an issue

- Have you gone through our FAQs?
- Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced

Hi, I am trying to use an S3 bucket (MinIO) for storing incremental updates. When I run the Spark job, a class-not-found exception occurs at the stage where the Hive sync is done: `java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found`

My question is: where should the AWS-related JARs be defined so that Hive can resolve the missing classes?
```bash
spark-submit \
  --jars /opt/hive/lib/aws-java-sdk-1.11.30.jar,/opt/hive/lib/hadoop-aws-2.7.3.jar \
  --packages org.apache.spark:spark-avro_2.12:3.0.1,org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0,org.apache.hadoop:hadoop-aws:2.7.3 \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
  --table-type MERGE_ON_READ \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field END_STR \
  --target-base-path s3a://table-format-hudi/pgw_sessions_main_mor \
  --target-table pgw_sessions_main_mor \
  --props /var/demo/config/kafka-source.properties_vaz \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=END_STR:timestamp \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.input.dateformat="yyyy-MM-dd HH:mm:ss.sss" \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat="yyyy/MM/dd" \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator \
  --hoodie-conf hoodie.datasource.write.operation=upsert \
  --enable-sync \
  --hoodie-conf hoodie.compact.inline=true \
  --hoodie-conf hoodie.compact.schedule.inline=false \
  --hoodie-conf hoodie.compact.inline.max.delta.commits=4 \
  --hoodie-conf hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000 \
  --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
  --hoodie-conf hoodie.datasource.hive_sync.enable=true \
  --hoodie-conf hoodie.datasource.hive_sync.auto_create_datab=true \
  --hoodie-conf hoodie.datasource.hive_sync.database=hudidatabase \
  --hoodie-conf hoodie.datasource.hive_sync.table=hudi_data_test \
  --hoodie-conf hoodie.datasource.hive_sync.partition_fields=['DATE_STRING'] \
  --source-limit 20000
```
To Reproduce
Steps to reproduce the behavior:
1.
2.
3.
4.
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description

- Hudi version :
- Spark version :
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS..) :
- Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.