Description
Users have reported that it's not possible to dynamically provision delta.io packages for use with PySpark.
The erroneous behavior can be reproduced with this commit.
The error is fixed with this commit, and the Delta test (and all others except the logging tests) passes. This fix is only temporary and cannot be merged in its current form since it breaks the logging tests.
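For context, dynamic provisioning here means pulling the Delta artifacts at submit time rather than baking them into the image; with the Stackable operator this is typically expressed via the SparkApplication's `deps.packages` field, which ends up as `spark.jars.packages`. A minimal PySpark sketch of what users are trying to do (package coordinates and versions are assumptions, not taken from this issue):

```python
# Minimal sketch, assuming Delta 3.x coordinates; the exact version users
# reported is not recorded in this issue.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-provisioning-repro")
    # Ask Spark to resolve Delta from Maven Central at startup.
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# With the operator's classpath settings in place (see Analysis below),
# this write fails because the provisioned Delta classes cannot be loaded.
spark.range(10).write.format("delta").save("/tmp/delta-table")
```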
Analysis
The problem is caused by the following two properties that the operator always adds to `spark-submit` in order to support log aggregation with vector:
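The property lines themselves did not survive extraction. Based on the well-known conflict between `--packages` (ivy-provisioned dependencies) and user-classpath-first class loading, they are presumably the following; treat this as a reconstruction, not a verbatim quote:

```
# Reconstruction (assumption): properties the operator adds so that the
# jars it ships for log aggregation take precedence over user jars.
--conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true
```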
In addition, the user classpath is extended like this:
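The classpath snippet was likewise lost; assuming the operator points both driver and executor at the image's extra-jars directory, it plausibly reads:

```
# Plausible form (assumption): extend the user classpath with the
# operator-provided jars baked into the image.
--conf spark.driver.extraClassPath=/stackable/spark/extra-jars/*
--conf spark.executor.extraClassPath=/stackable/spark/extra-jars/*
```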
The contents of /stackable/spark/extra-jars/ are:
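The directory listing did not survive extraction; it can be regenerated from the image, for example:

```
# List the operator-provided jars baked into the Spark image
# (registry and tag are assumptions; adjust to the image under test).
docker run --rm --entrypoint ls \
  oci.stackable.tech/sdp/spark-k8s:3.5.0-stackable24.3.0 /stackable/spark/extra-jars/
```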
Acceptance Criteria
Since this is an investigation ticket, the following outcomes are possible:
- An update to the Spark images to include Delta dependencies.
- A new Spark image with Delta dependencies.
Related PRs
Related Issues