linkedin / transport

A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
BSD 2-Clause "Simplified" License

Transport plugin shades everything by default, which causes type-not-found errors when referring to other libs #112

Open wangtao724 opened 2 years ago

wangtao724 commented 2 years ago

When I create a new UDF that calls an API exposed by another library, it fails with the following error when tested in Hive and on other platforms.

Caused by: com_linkedin_jobs_udf_jobs_udfs_2_1_1.org.apache.avro.AvroTypeException: Found com.linkedin.standardization.taxonomy.industries.IndustryStatus, expecting com_linkedin_jobs_udf_jobs_udfs_2_1_1.com.linkedin.standardization.taxonomy.industries.IndustryStatus
    at com_linkedin_jobs_udf_jobs_udfs_2_1_1.org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:309)
    at com_linkedin_jobs_udf_jobs_udfs_2_1_1.org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
    at com_linkedin_jobs_udf_jobs_udfs_2_1_1.org.apache.avro.io.ResolvingDecoder.readEnum(ResolvingDecoder.java:260)

The reason is that the shaded Avro code expects the shaded type, prefixed with com_linkedin_jobs_udf_jobs_udfs_2_1_1, but finds the original, unshaded type.

To work around it, we have to explicitly exclude those namespaces from shading by adding

shadeHiveJar.setDoNotShade(["com.linkedin.standardization.taxonomy.industries.*"])
shadeSpark_211Jar.setDoNotShade(["com.linkedin.standardization.taxonomy.industries.*"])
shadeSpark_212Jar.setDoNotShade(["com.linkedin.standardization.taxonomy.industries.*"])

to build.gradle.
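
To illustrate the mismatch and why the exclusion helps, here is a toy model in Java. It is only a sketch, not how the Transport plugin implements shading: the `relocate` helper and its pattern matching are hypothetical, and the only things taken from this issue are the prefix in the stack trace and the exclusion pattern.

```java
import java.util.List;

// Toy model of package relocation. NOT the Transport plugin's implementation.
class ShadedNameMismatch {
    // Prefix copied from the stack trace in this issue.
    static final String SHADE_PREFIX = "com_linkedin_jobs_udf_jobs_udfs_2_1_1";

    // Hypothetical helper: returns the name a class would carry after relocation.
    // A class matching a "do not shade" pattern keeps its original name.
    // Assumption: a trailing "*" means "everything under this prefix";
    // a pattern without "*" must match the class name exactly.
    static String relocate(String className, List<String> doNotShade) {
        for (String pattern : doNotShade) {
            boolean matches = pattern.endsWith("*")
                    ? className.startsWith(pattern.substring(0, pattern.length() - 1))
                    : className.equals(pattern);
            if (matches) {
                return className;
            }
        }
        return SHADE_PREFIX + "." + className;
    }

    public static void main(String[] args) {
        String found = "com.linkedin.standardization.taxonomy.industries.IndustryStatus";

        // Everything shaded: the expected name no longer equals the name found.
        String expected = relocate(found, List.of());
        System.out.println(expected.equals(found)); // prints "false"

        // Namespace excluded: expected and found names agree again.
        String excluded = relocate(found,
                List.of("com.linkedin.standardization.taxonomy.industries.*"));
        System.out.println(excluded.equals(found)); // prints "true"
    }
}
```

In this model the first comparison corresponds to the AvroTypeException above, and the second to the behavior after the setDoNotShade workaround.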

Is this by design or a bug?

For more information, please refer to the internal discussion: https://linkedin-randd.slack.com/archives/C02D9EYGPGA/p1641401436435300