sparkutils / quality

A Quality Spark DQ Library
https://sparkutils.github.io/quality/
Apache License 2.0

DBR 14.3 support #57

Closed chris-twiner closed 2 days ago

chris-twiner commented 8 months ago

frameless and Analyzer changes (the nested object ResolveReferences now uses new) - requires 4.0 rc's
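
A change from a nested object to a class created with new removes the no-arg accessor method that scalac generates for the object, so code compiled against the old shape fails at link time. The sketch below shows one way to probe for such an accessor with reflection. It is illustrative only: OldAnalyzer, NewAnalyzer and hasAccessor are hypothetical stand-ins, not Spark or Quality classes, and Quality may handle the difference another way (for example with per-runtime builds).

```scala
import scala.util.Try

object AccessorProbe {
  // Hypothetical stand-in for a build where the rule is a nested object.
  // scalac exposes a nested object through a public no-arg accessor method.
  class OldAnalyzer { object ResolveReferences }

  // Hypothetical stand-in for a build where the rule is a nested class,
  // so callers must create it with `new` and no accessor method exists.
  class NewAnalyzer { class ResolveReferences }

  // True when `target` has a public no-arg method with the given name.
  def hasAccessor(target: AnyRef, method: String): Boolean =
    Try(target.getClass.getMethod(method)).isSuccess
}
```

A caller could branch on hasAccessor(analyzer, "ResolveReferences") before choosing which code path to use, instead of letting the JVM throw when the accessor is missing.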

chris-twiner commented 6 months ago

https://github.com/typelevel/frameless/issues/787

chris-twiner commented 6 months ago

18) testProbabilityRulePass(com.sparkutils.qualityTests.RuleEngineTest)
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.analysis.Analyzer.ResolveReferences()Lorg/apache/spark/sql/catalyst/analysis/Analyzer$ResolveReferences$;
    at org.apache.spark.sql.QualitySparkUtils$.resolution(QualitySparkUtils.scala:125)
19) testSimpleProductionRules(com.sparkutils.qualityTests.RuleEngineTest)
java.lang.IllegalStateException: StructFieldsOperation.dataType should not be called.
    at com.sparkutils.quality.impl.util.StructFieldsOperation.dataType(StructFunctions.scala:66)
    at com.sparkutils.quality.impl.util.StructFieldsOperation.dataType$(StructFunctions.scala:65)
    at com.sparkutils.quality.impl.util.WithField.dataType(StructFunctions.scala:85)
    at com.databricks.sql.execution.VariantExpressionsCheck$.$anonfun$apply$4(VariantExpressionCheck.scala:58)
    at com.databricks.sql.execution.VariantExpressionsCheck$.$anonfun$apply$4$adapted(VariantExpressionCheck.scala:53)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:249)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:250)

chris-twiner commented 6 months ago

1) mapAggrDecimalDSLTest(com.sparkutils.qualityTests.AggregatesTest)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1511.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1511.0 (TID 19528) (ip-10-172-240-113.us-west-2.compute.internal executor driver): java.lang.ClassCastException: java.lang.Double cannot be cast to org.apache.spark.sql.types.Decimal

only one failure left.

chris-twiner commented 4 months ago

Below are two new issues for 14.3.

https://issues.apache.org/jira/browse/SPARK-47509 has been introduced, which blocks the pattern Quality uses of subqueries nested within lambdas. The intended use cases in Quality should still be sound, but three tests now fail:

scalarSubqueryAsOutputExpressionViaLambdaNonAttributeParam
scalarSubqueryAsOutputExpressionViaLambdaNoParam
scalarSubqueryAsOutputExpressionViaLambdaParam

These cases generate a correct join, which isn't the case for the transform example.

Additionally, Databricks has made a further change: another optimisation that runs before extraOptimizations and is not yet in OSS Spark, despite its package name. This stops StructFunctions from working. Quality should instead use the inbuilt functions for versions > 3.1.0, which should be properly handled by Databricks, and roll back the change for the previous custom version.
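
The version rule above could be expressed as a small gate. This is a minimal sketch that takes the "> 3.1.0" threshold literally from the comment. StructFunctionChoice, parse and useInbuiltStructFunctions are hypothetical names, not part of Quality's API, and the version string is assumed to look like "major.minor.patch" with an optional suffix.

```scala
object StructFunctionChoice {
  // Matches "3.5", "3.5.0" or "4.0.0-SNAPSHOT"; the patch number is optional.
  private val Version = """^(\d+)\.(\d+)(?:\.(\d+))?.*""".r

  // Parse a Spark version string into (major, minor, patch), if possible.
  def parse(version: String): Option[(Int, Int, Int)] = version match {
    case Version(major, minor, patch) =>
      Some((major.toInt, minor.toInt, Option(patch).map(_.toInt).getOrElse(0)))
    case _ => None
  }

  // True when the version is strictly greater than 3.1.0, in which case the
  // inbuilt struct functions would be used instead of the custom ones.
  // Unparseable versions fall back to false (an assumption of this sketch).
  def useInbuiltStructFunctions(sparkVersion: String): Boolean =
    parse(sparkVersion).exists(v => Ordering[(Int, Int, Int)].gt(v, (3, 1, 0)))
}
```

For example, useInbuiltStructFunctions("3.5.0") is true, while "3.1.0" and "3.0.3" both give false.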

(these two failures do not occur on 14.3 LTS on CE - the two are not in sync).

chris-twiner commented 4 months ago

Tests of the RC3 and RC4 snapshots show full test completion on DBR 14.3 as of 19.04.24. Leaving open for any issues from RC testers.

chris-twiner commented 2 days ago

Closing - no issues raised on RC5. 14.3 also tests against 15.4 LTS (aside from error message differences).