Open numberlabs-developers opened 9 months ago
torvalds.dev is analyzing the ticket
The user is experiencing a java.lang.NoSuchMethodError when running a job that writes Merge-on-Read (MoR) tables with Hudi on AWS EMR version 6.7.0. The error appears when the job reads MoR files from S3, but not when it reads Copy-on-Write (CoW) files. The user wants to avoid upgrading to EMR 6.9.0, which has been suggested as a solution in a GitHub thread.
Their environment includes Hudi 0.11.1, Spark 3.2.1, Hadoop 3.2.1, and S3 storage, not running on Docker.
Relevant files provided by the user:
packaging/hudi-utilities-slim-bundle/README.md: This README contains instructions for using the hudi-utilities-slim-bundle, which is meant to be used together with the Hudi Spark bundle when using hudi-utilities-bundle alone causes problems for a specific Spark version. It includes examples of running Hudi Streamer for Spark versions 2.4.7, 3.1.2, and 3.2.0.
README.md: This is the main README for Apache Hudi, providing a general overview of Hudi and its capabilities, along with instructions for building from source, running tests, and a quick start guide.
Possible actions to take:
Investigate org.apache.spark.sql.execution.datasources.PartitionedFile.<init> and org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits, as they are indicated in the error message.
The user might also want to follow the Apache Hudi project on GitHub, where they can get updates on issues and pull requests that could be relevant to their problem.
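A minimal sketch of one way to start that investigation: list the constructors that the runtime classpath actually provides for the class named in the error. The class name `ConstructorLister` and the method `signatures` are hypothetical helpers introduced here for illustration, and running this with the cluster's Spark jars on the classpath is an assumption about the setup. It uses only standard Java reflection.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: prints the constructors a class exposes on the current classpath. */
final class ConstructorLister {

    /** Returns one readable signature per declared constructor, or an empty list if the class cannot be loaded. */
    static List<String> signatures(String className) {
        List<String> out = new ArrayList<>();
        try {
            // initialize=false: load the class for inspection without running its static initializers.
            Class<?> cls = Class.forName(className, false, ConstructorLister.class.getClassLoader());
            for (Constructor<?> c : cls.getDeclaredConstructors()) {
                String params = Arrays.stream(c.getParameterTypes())
                        .map(Class::getTypeName)
                        .collect(Collectors.joining(", "));
                out.add(cls.getSimpleName() + "(" + params + ")");
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not available; report nothing.
        }
        return out;
    }

    public static void main(String[] args) {
        // Default target is the class named in the NoSuchMethodError.
        String target = args.length > 0
                ? args[0]
                : "org.apache.spark.sql.execution.datasources.PartitionedFile";
        List<String> sigs = signatures(target);
        if (sigs.isEmpty()) {
            System.out.println(target + " was not found on the classpath (or declares no constructors)");
        } else {
            sigs.forEach(System.out::println);
        }
    }
}
```

If none of the printed signatures matches the parameter list in the error message, the Hudi bundle was most likely compiled against a different build of that Spark class than the one present at runtime.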
I'm getting
3/08/29 08:32:51 ERROR Client: Application diagnostics message: User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
at org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$2(MergeOnReadSnapshotRelation.scala:127)
at scala.Option.map(Option.scala:230)
at org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$1(MergeOnReadSnapshotRelation.scala:125)
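The parenthesised part of the error is a JVM method descriptor, which is hard to read at a glance. As a rough illustration, the sketch below decodes such a descriptor into plain type names using the standard JVM type codes (`L...;` for a class, `J` for long, `[` for an array dimension, `V` for void). The class `DescriptorDecoder` and its `decode` method are hypothetical names introduced for this example; they are not part of Hudi or Spark.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a JVM method descriptor into readable parameter and return types. */
final class DescriptorDecoder {

    /** Example: "(I[Ljava/lang/String;)V" becomes "(int, java.lang.String[]) -> void". */
    static String decode(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        int[] pos = {1}; // cursor into the descriptor, shared with readType
        while (pos[0] < close) {
            params.add(readType(descriptor, pos));
        }
        int[] retPos = {close + 1};
        String ret = readType(descriptor, retPos);
        return "(" + String.join(", ", params) + ") -> " + ret;
    }

    /** Reads one type starting at pos[0] and advances the cursor past it. */
    private static String readType(String d, int[] pos) {
        int dims = 0;
        while (d.charAt(pos[0]) == '[') { // each '[' adds one array dimension
            dims++;
            pos[0]++;
        }
        char code = d.charAt(pos[0]++);
        String base;
        switch (code) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L': {
                int end = d.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated class name in: " + d);
                }
                base = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                break;
            }
            default:
                throw new IllegalArgumentException("Unknown type code: " + code);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Descriptor copied from the NoSuchMethodError above.
        System.out.println(decode(
                "(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V"));
    }
}
```

For the descriptor in this trace, the decoded form is a five-parameter constructor taking an InternalRow, a String, two longs, and a String array. That is the signature the Hudi code expects to find on PartitionedFile at runtime.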
I'm using EMR 6.7.0 and these libraries on my .jar (program)
It is interesting because I've only changed the input that the program reads (and, of course, the code of the program for reading MoR ;) ), and the program writes to MoR.
If it reads CoW files from S3, it works. If it reads MoR files from S3, it throws the exception above. Any clue? I've seen that people are suggesting using EMR 6.9.0 (https://github.com/apache/hudi/issues/8903#issuecomment-1624977292), but I would like to see whether the issue can be resolved on EMR 6.7.0 so that I don't have to upgrade all the libraries in my project :/
Environment Description
Hudi version : 0.11.1
Spark version : 3.2.1
Hadoop version : 3.2.1
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no