sparkutils / quality

A Quality Spark DQ Library
https://sparkutils.github.io/quality/
Apache License 2.0

Subquery usage in Folder, RuleEngine and Expression can trigger NotSerializableException #48

Closed chris-twiner closed 1 year ago

chris-twiner commented 1 year ago

The expressions used to create rules such as:

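      // Builds an output expression wrapping a scalar subquery over tableName
      // (tableName is defined elsewhere in the test)
      def sub(comp: String = "> 2", tableSuffix: String = "") = s"struct((select max(i_s$tableSuffix.i) from $tableName i_s$tableSuffix where i_s$tableSuffix.i $comp))"

      // A rule whose condition and output expression both contain subqueries
      val rs = RuleSuite(Id(1, 1), Seq(
        RuleSet(Id(50, 1), Seq(
          Rule(Id(101, 1), ExpressionRule(s"(select max(i) > 1 from $tableName)"), RunOnPassProcessor(1000, Id(3010, 1),
            OutputExpression(sub("> main.i"))))
        ))
      ))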

work when the source is a LocalRelation, but reading from a file triggers serialization, which fails with a NotSerializableException because reset (via cleanExprs) was not added to the other runners.
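To illustrate the general failure mode, here is a minimal sketch that uses plain Java serialization and no Spark or Quality code at all. It assumes the problem is an otherwise serializable object holding cached state that is not serializable, and that a reset step clears that state before the object is shipped. `PlanState`, `CachingExpr`, `prepare`, `reset` and `serializes` are hypothetical names invented for this example; they are not the library's API and do not show what cleanExprs actually does.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for state that cannot be serialized
// (an assumption for illustration; not a Quality or Spark class).
class PlanState(val description: String)

// Hypothetical stand-in for an expression that caches such state.
class CachingExpr(val sql: String) extends Serializable {
  var cached: PlanState = null

  // Simulates work done before execution that populates the cache.
  def prepare(): Unit = cached = new PlanState(s"plan for: $sql")

  // Drops the cached state so the object can be serialized again.
  def reset(): Unit = cached = null
}

object SerializationSketch {
  // Returns true when plain Java serialization of the value succeeds.
  def serializes(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val expr = new CachingExpr("select max(i) > 1 from t")
    expr.prepare()
    println(serializes(expr)) // false: the cached PlanState is not Serializable
    expr.reset()
    println(serializes(expr)) // true: nothing non-serializable is left to write
  }
}
```

In this sketch the object only becomes a problem once something tries to serialize it, which mirrors why a purely local source might never hit the error while a file-backed source does.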

chris-twiner commented 1 year ago

Tested and verified against the original triggering code.