dberma15 closed this issue 2 years ago
Hey @dberma15, what version of Glow are you using?
Please try Glow v1.1.0 (not 1.0.1 or 1.0.0) on Databricks Runtime 9.1.
You can either use the prepackaged Docker container, projectglow/databricks-glow:9.1,
or attach the PyPI package and Maven coordinates to the cluster (see screenshots below).
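If it is unclear which Glow version a cluster actually has attached, a quick check from a notebook cell can help. The following is only a sketch: the helper names are made up for illustration, and it assumes the Python package is installed under the PyPI distribution name `glow.py`.

```python
import re
from importlib import metadata

# Minimum version recommended in this thread.
MIN_GLOW = (1, 1, 0)


def parse_version(text):
    """Turn '1.1.0' or 'v1.1.0' into a tuple of ints; any suffix is ignored."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version_text, minimum=MIN_GLOW):
    """True if the given version is at least the recommended one."""
    return parse_version(version_text) >= minimum


def installed_glow_version(dist_name="glow.py"):
    """Return the installed version string, or None if the package is absent.

    The distribution name 'glow.py' is an assumption; adjust it if your
    environment installs Glow under a different name.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_glow_version()` and passing the result to `is_supported` shows whether the cluster is on 1.1.0 or later. This only inspects the Python package; the Maven coordinates attached to the cluster still need to be checked separately.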
I was using 1.0.0, I think. I created a new cluster running 9.1 with 1.1.0, but now I get an error on the second-to-last cell:
stats_df = df.groupBy("INFO_SVTYPE")\
    .agg(expr("""aggregate_by_index(
        genotypes,
        0,
        (nonref, g) -> if(exists(g.calls, call -> call != -1 and call != 0), nonref + 1, nonref),
        (nonref1, nonref2) -> nonref1 + nonref2) as count_non_ref"""))
display(stats_df)
AnalysisException: Invalid call to dataType on unresolved object, tree: 'if('exists(lambda 'g.calls, lambdafunction((NOT (lambda 'call = -1) AND NOT (lambda 'call = 0)), lambda 'call, false)), (lambda 'nonref + 1), lambda 'nonref)
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
<command-3319344438369073> in <module>
----> 1 stats_df = df.groupBy("INFO_SVTYPE")\
2 .agg(expr("""aggregate_by_index(
3 genotypes,
4 0,
5 (nonref, g) -> if(exists(g.calls, call -> call != -1 and call != 0), nonref + 1, nonref),
/databricks/spark/python/pyspark/sql/group.py in agg(self, *exprs)
116 # Columns
117 assert all(isinstance(c, Column) for c in exprs), "all exprs should be Column"
--> 118 jdf = self._jgd.agg(exprs[0]._jc,
119 _to_seq(self.sql_ctx._sc, [c._jc for c in exprs[1:]]))
120 return DataFrame(jdf, self.sql_ctx)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
121 # Hide where the exception came from that shows a non-Pythonic
122 # JVM exception message.
--> 123 raise converted from None
124 else:
125 raise
AnalysisException: Invalid call to dataType on unresolved object, tree: 'if('exists(lambda 'g.calls, lambdafunction((NOT (lambda 'call = -1) AND NOT (lambda 'call = 0)), lambda 'call, false)), (lambda 'nonref + 1), lambda 'nonref)
Hey @dberma15, this relies on a private function in Spark that was removed from Databricks Runtime with Spark 3.1. I have a ticket open with engineering to track this.
So for now you cannot use this cell, but everything else in the notebook examples does work (as per nightly testing of Glow).
I removed the offending cell, and the notebooks should now work with the latest version of Glow.
Please download it from here.
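For anyone who still needs the numbers the removed cell produced, the logic of that `aggregate_by_index` expression can be written out in plain Python. This is an illustration of what the cell computes, not Glow's implementation or a Spark workaround; the function name and the input layout (a list of `(svtype, genotypes)` pairs, each genotype a dict with a `calls` list) are assumptions made for the example.

```python
from collections import defaultdict


def count_non_ref_by_svtype(variants):
    """Count, per SV type and per sample index, the variants with a non-ref call.

    `variants` is a list of (svtype, genotypes) pairs, where `genotypes` is a
    list with one dict per sample and each dict has a 'calls' list of ints.
    A call of -1 means missing and 0 means reference, as in the cell's filter.
    """
    counts = defaultdict(list)
    for svtype, genotypes in variants:
        per_sample = counts[svtype]
        if not per_sample:
            # Zero value of the aggregate: one counter per sample index.
            per_sample.extend([0] * len(genotypes))
        for index, genotype in enumerate(genotypes):
            # Same test as exists(g.calls, call -> call != -1 and call != 0).
            if any(call != -1 and call != 0 for call in genotype["calls"]):
                per_sample[index] += 1
    return dict(counts)


# Tiny made-up dataset: two samples, three variants.
example = [
    ("DEL", [{"calls": [0, 1]}, {"calls": [0, 0]}]),
    ("DEL", [{"calls": [1, 1]}, {"calls": [-1, -1]}]),
    ("INS", [{"calls": [0, 0]}, {"calls": [0, 1]}]),
]
result = count_non_ref_by_svtype(example)
```

Here `result` maps each SV type to a list with one count per sample, which is the shape of the `count_non_ref` column in the original cell. The merge step of the original expression (`nonref1 + nonref2`) is not needed in this single-process version because all rows are visited in one loop.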
The Sample Quality Control notebook breaks when I try to run it on Databricks with runtime version 9.0 (Spark 3.1.2).
Here's the notebook in question: https://glow.readthedocs.io/en/latest/_static/notebooks/etl/sample-qc-demo.html
The error is on the fourth command:
display(qc.selectExpr("explode(qc) as per_sample_qc").selectExpr("expand_struct(per_sample_qc)"))
The following is the error message: