Closed: jatin-sandhuria closed this issue 2 years ago.
Thanks Jatin, what version of Glow are you using and what version of Spark (/ the Databricks Runtime)?
This function is tested nightly pulling from this Docker container on Databricks Runtime 9.1 (Spark 3.1.2):
projectglow/databricks-glow:9.1
Hi William - I am using Databricks Runtime 9.0 (includes Apache Spark 3.1.2, Scala 2.12) with glow.py==1.1.0.
Thanks, I ran this notebook: variant-qc-demo.html
Using the Docker container on Databricks Runtime 9.1, and manually installing Glow v1.1.0 on Databricks Runtime 9.0.
Both worked.
Do you have other libraries installed on the cluster, such as Hail?
Oh, also: did you drop the genotypes column? I can't see it in the samples_filtered_genotype_df schema.
Apparently describe() doesn't show the genotypes column; I'm not sure why.
genotype_df = spark.read.format('delta').load(genotype_delta_path)
contigName: string
start: long
end: long
names: array&lt;string&gt;
referenceAllele: string
alternateAlleles: array&lt;string&gt;
INFO_variant_id: string
INFO_rsq: double
INFO_chromosome: string
INFO_new_variant_id: string
INFO_AC: array
INFO_AF: array&lt;double&gt;
INFO_AN: integer
INFO_homozygote_count: array
INFO_call_rate: double
genotypes: array&lt;struct&lt;sampleId: string, calls: array&lt;integer&gt;, phased: boolean, posteriorProbabilities: array&lt;double&gt;, dosage: double, old_GT: string, old_dosage: double&gt;&gt;
genotype_df.select("*", glow.expand_struct(glow.call_summary_stats("genotypes")))
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias.&lt;init&gt;(Lorg/apache/spark/sql/catalyst/expressions/Expression;Ljava/lang/String;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;Lscala/Option;)V
OK, and do you have any other configuration beyond just Glow v1.1.0 and DBR 9.0?
Hi William - We removed Hail from our cluster, but the error persists.
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias.<init>(Lorg/apache/spark/sql/catalyst/expressions/Expression;Ljava/lang/String;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;Lscala/Option;)V
These are the libraries we have installed: [screenshot not included]
Databricks runtime version: [screenshot not included]
Spark config: [screenshot not included]
Ah, there is a mismatch between the Maven and PyPI versions of Glow. Please bump the Maven version from 1.0.1 to 1.1.0.
Closing for now; please reopen if this does not solve the problem.
Thanks William. Changing the version resolved it.
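For anyone else hitting this NoSuchMethodError: the JVM (Maven) and Python (PyPI) sides of Glow must be on the same release. A sketch of matching cluster library entries; the Spark 3 / Scala 2.12 Maven coordinate shown is what I would expect for this release, so check the Glow release notes for the exact coordinate for your runtime:

```
Maven (cluster library): io.projectglow:glow-spark3_2.12:1.1.0
PyPI  (cluster library): glow.py==1.1.0
```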
samples_filtered_genotype_df.describe()
Out[81]: DataFrame[summary: string, contigName: string, start: string, end: string, referenceAllele: string, INFO_variant_id: string, INFO_rsq: string, INFO_chromosome: string, INFO_new_variant_id: string, INFO_AN: string, INFO_call_rate: string, INFO_min_af: string, INFO_min_ac: string]
genotype_df_call_stats = samples_filtered_genotype_df.select("*", glow.expand_struct(glow.call_summary_stats("genotypes")))
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias.&lt;init&gt;(Lorg/apache/spark/sql/catalyst/expressions/Expression;Ljava/lang/String;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;Lscala/Option;)V
/databricks/spark/python/pyspark/sql/dataframe.py in select(self, *cols)
   1690         [Row(name='Alice', age=12), Row(name='Bob', age=15)]
   1691         """
-> 1692         jdf = self._jdf.select(self._jcols(*cols))
   1693         return DataFrame(jdf, self.sql_ctx)
   1694

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o9520.select.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias.&lt;init&gt;(Lorg/apache/spark/sql/catalyst/expressions/Expression;Ljava/lang/String;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;Lscala/Option;)V
at io.projectglow.sql.expressions.ExpandStruct.$anonfun$expand$1(glueExpressions.scala:45)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at io.projectglow.sql.expressions.ExpandStruct.expand(glueExpressions.scala:43)
at io.projectglow.sql.optimizer.ResolveExpandStructRule$.$anonfun$expandExprs$1(hlsOptimizerRules.scala:83)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at io.projectglow.sql.optimizer.ResolveExpandStructRule$.io$projectglow$sql$optimizer$ResolveExpandStructRule$$expandExprs(hlsOptimizerRules.scala:81)
at io.projectglow.sql.optimizer.ResolveExpandStructRule$$anonfun$apply$3.applyOrElse(hlsOptimizerRules.scala:67)
at io.projectglow.sql.optimizer.ResolveExpandStructRule$$anonfun$apply$3.applyOrElse(hlsOptimizerRules.scala:65)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:137)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:137)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:340)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:133)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:129)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:110)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:109)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:30)
at io.projectglow.sql.optimizer.ResolveExpandStructRule$.apply(hlsOptimizerRules.scala:65)
at io.projectglow.sql.optimizer.ResolveExpandStructRule$.apply(hlsOptimizerRules.scala:63)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:221)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:221)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:218)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:210)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:210)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:285)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:278)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:224)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:188)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:109)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:188)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:260)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:347)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:259)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:96)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:134)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:180)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:852)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:180)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:97)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:86)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:94)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:852)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:92)
at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3849)
at org.apache.spark.sql.Dataset.select(Dataset.scala:1489)
at sun.reflect.GeneratedMethodAccessor626.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)