williambrandler closed this issue 2 years ago
When using aggregate_by_index to compute an aggregation array on Spark 3.1 with Glow 1.1.0, the following code fails:
stats_df = df.groupBy("INFO_SVTYPE") \
    .agg(expr("""aggregate_by_index(
        genotypes,
        0,
        (nonref, g) -> if(exists(g.calls, call -> call != -1 and call != 0), nonref + 1, nonref),
        (nonref1, nonref2) -> nonref1 + nonref2) as count_non_ref"""))
display(stats_df)
AnalysisException: Invalid call to dataType on unresolved object, tree: 'if('exists(lambda 'g.calls, lambdafunction((NOT (lambda 'call = -1) AND NOT (lambda 'call = 0)), lambda 'call, false)), (lambda 'nonref + 1), lambda 'nonref)
Any ideas on how to resolve this, @henrydavidge? Should we delete this example or rewrite it with higher-order functions?
This seems to be related to the Databricks runtime rather than open source Spark, so closing.
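For anyone reading this later, here is a rough pure-Python sketch of what the failing expression is meant to compute, as far as I understand it: for each position in the genotypes array (that is, each sample), count the rows where that sample has at least one call that is neither -1 nor 0. This is not Glow's implementation. The function `aggregate_by_index_sketch`, the toy `partitions` data, and the dict-based genotype layout are all made up for illustration, and the `groupBy("INFO_SVTYPE")` step is left out.

```python
def aggregate_by_index_sketch(partitions, zero, update, merge):
    """Hypothetical stand-in for the per-index aggregation semantics.

    partitions: list of partitions, each a list of rows, each row an
    equal-length list of elements (one element per sample).
    """
    partials = []
    for rows in partitions:
        state = None
        for row in rows:
            if state is None:
                # One accumulator per array index, seeded with the zero value.
                state = [zero] * len(row)
            # Apply the update function position by position.
            state = [update(s, elem) for s, elem in zip(state, row)]
        if state is not None:
            partials.append(state)
    if not partials:
        return []
    # Combine the per-partition states index by index with the merge function.
    result = partials[0]
    for other in partials[1:]:
        result = [merge(a, b) for a, b in zip(result, other)]
    return result


def update_nonref(nonref, g):
    # Mirrors: if(exists(g.calls, call -> call != -1 and call != 0), nonref + 1, nonref)
    has_nonref = any(call != -1 and call != 0 for call in g["calls"])
    return nonref + 1 if has_nonref else nonref


# Toy data (assumed layout): three rows split over two partitions, three samples per row.
partitions = [
    [
        [{"calls": [0, 0]}, {"calls": [0, 1]}, {"calls": [-1, -1]}],
        [{"calls": [1, 1]}, {"calls": [0, 0]}, {"calls": [0, 1]}],
    ],
    [
        [{"calls": [0, 1]}, {"calls": [1, 0]}, {"calls": [0, 0]}],
    ],
]

count_non_ref = aggregate_by_index_sketch(
    partitions, 0, update_nonref, lambda a, b: a + b
)
print(count_non_ref)  # [2, 2, 1]
```

The update function plays the role of the first lambda in the SQL expression and the merge function the role of the second, so a rewrite with higher-order functions would need to reproduce the same per-sample counts.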