stackabletech / spark-k8s-operator

Operator for Apache Spark-on-Kubernetes for Stackable Data Platform
https://stackable.tech

Bugfix: remove usage of "userClassPathFirst" properties [was: Investigate delta.io integration] #354

Closed razvan closed 4 months ago

razvan commented 4 months ago

Description

Users have reported that it's not possible to dynamically provision delta.io packages to use with PySpark.

The erroneous behavior can be reproduced with this commit.

With this commit the error is fixed and the delta test passes, as do all other tests except the logging tests. The fix is only temporary and cannot be merged in its current form because it breaks the logging tests.

Analysis

The problem is caused by the following two properties, which the operator always adds to spark-submit in order to support log aggregation with Vector:

--conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true

In addition, the user classpath is extended like this:

--conf spark.driver.extraClassPath=/stackable/spark/extra-jars/*
--conf spark.executor.extraClassPath=/stackable/spark/extra-jars/*

The contents of /stackable/spark/extra-jars/ are:

bash-4.4$ ls -l /stackable/spark/extra-jars/
total 1868
-rw-r--r-- 1 stackable stackable  126137 Feb 12 08:54 jackson-dataformat-xml-2.15.2.jar
-rw-r--r-- 1 stackable stackable  195909 Feb 12 08:54 stax2-api-4.2.1.jar
-rw-r--r-- 1 stackable stackable 1586395 Feb 12 08:54 woodstox-core-6.5.1.jar
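
As a rough illustration of the direction named in the issue title (no longer passing the "userClassPathFirst" properties), the following minimal sketch filters them out of a spark-submit argument list. The function name and the sample arguments are hypothetical and exist only for illustration; this is not the operator's actual code, and it assumes the arguments are available as a flat list of strings.

```python
from typing import List


def drop_user_class_path_first(args: List[str]) -> List[str]:
    """Return a copy of `args` without any `--conf spark.*.userClassPathFirst=...` pair.

    Hypothetical helper for illustration only; all other arguments are kept
    in their original order.
    """
    cleaned: List[str] = []
    i = 0
    while i < len(args):
        is_conf_pair = args[i] == "--conf" and i + 1 < len(args)
        if is_conf_pair and ".userClassPathFirst=" in args[i + 1]:
            i += 2  # skip both the "--conf" flag and its value
            continue
        cleaned.append(args[i])
        i += 1
    return cleaned


if __name__ == "__main__":
    # Sample arguments; the application name and file are made up.
    submit_args = [
        "--conf", "spark.driver.userClassPathFirst=true",
        "--conf", "spark.executor.userClassPathFirst=true",
        "--conf", "spark.app.name=demo",
        "app.py",
    ]
    print(drop_user_class_path_first(submit_args))
```

Note that, as described above, simply dropping these properties breaks the logging tests, so a sketch like this only captures one half of the problem.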

Acceptance Criteria

Since this is an investigation ticket, the following outcomes are possible:

Related PRs

Related Issues