projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0
271 stars 111 forks source link

Error when using expand_struct #384

Closed mirhendi closed 3 years ago

mirhendi commented 3 years ago

I'm trying to run this in Databricks Runtime ​8.2 ML:

df = spark.createDataFrame([Row(struct=Row(a=1, b=2))])
df.select(glow.expand_struct(col('struct'))).collect()

And here is the error:

AnalysisException                         Traceback (most recent call last)
<command-3109698327909231> in <module>
      1 from pyspark.sql import Row
      2 df = spark.createDataFrame([Row(struct=Row(a=1, b=2))])
----> 3 df.select(glow.expand_struct(col('struct'))).collect()

/databricks/spark/python/pyspark/sql/dataframe.py in select(self, *cols)
   1690         [Row(name='Alice', age=12), Row(name='Bob', age=15)]
   1691         """
-> 1692         jdf = self._jdf.select(self._jcols(*cols))
   1693         return DataFrame(jdf, self.sql_ctx)
   1694 

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    114                 # Hide where the exception came from that shows a non-Pythonic
    115                 # JVM exception message.
--> 116                 raise converted from None
    117             else:
    118                 raise

AnalysisException: unresolved operator 'Project [expandstruct(struct#3832) AS expandstruct(struct)#3834];

Looking for suggestions, Thank you :)

karenfeng commented 3 years ago

Hi @mirhendi, I was able to repro this when Glow was not registered with spark = glow.register(spark) (note that in Glow v1.0.0, glow.register(spark) is no longer sufficient).

On MLR 7.6 (based on Spark 3.0), this was able to run through after registration. However, I encountered a different issue on MLR 8.2 (based on Spark 3.1):

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias.&lt;init&gt;(Lorg/apache/spark/sql/catalyst/expressions/Expression;Ljava/lang/String;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;Lscala/Option;)V

This is due to a binary incompatibility between Spark 3.0 and 3.1; Glow only works on Spark 3.0 for now.

mirhendi commented 3 years ago

Thank you Karen, changing the runtime and registering glow fixed the problem.