Open nickorka opened 9 months ago
That's quite strange. Do you still see this error with the new version of Glow?
I don't know. Maybe the problem is still there. I've found a workaround: I dump the whole dataframe to a parquet file on HDFS and continue the step from the parquet file instead of dealing with the long query.
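For anyone wanting to try the same workaround, it could look roughly like the sketch below. This is only an illustration: `materialize_to_parquet` is a hypothetical helper name, the overwrite mode is an assumption, and `spark` and `df` are expected to be an ordinary `SparkSession` and `DataFrame`.

```python
def materialize_to_parquet(spark, df, path):
    """Write df to Parquet at path and return a DataFrame read back from it.

    The returned DataFrame is planned from the stored files, so later steps
    no longer carry the long upstream query (all the joins) in their plan.
    """
    # Assumption of this sketch: replace any output left by a previous run.
    df.write.mode("overwrite").parquet(path)
    # Read the files back; this DataFrame has a short, file-based plan.
    return spark.read.parquet(path)
```

A call might look like `variants = materialize_to_parquet(spark, variants, path)` with `path` pointing to a scratch location on HDFS, followed by the normalization step on `variants`. The trade-off is an extra write and read of the full dataset.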
By the way, the new version 2.0.0 does not even initialize. It fails on `import glow` with a numpy compatibility error. There is no backward compatibility at all.
Try pip install -U glow.py
I'm trying to implement a variant normalization function. I'm calling it within a dataframe like this:
I'm preparing the "contigName", "start", "end", "referenceAllele", "alternateAlleles" fields before the call, and I've checked that there are no NULL values in any of them. During the Spark action call I'm getting this error:
I've tried to run just this part of the dataframe manually from a pyspark session, and there were no errors. But when I run the whole pipeline with all the joins, it fails on just this step for multiple containers. Here you can see the executor stats:
![glow_error](https://github.com/projectglow/glow/assets/5385562/d22534f8-a347-4b05-9aeb-53a7c7e04b34)
I'm running this on Spark 3.4.1 with 6G executors and 3G driver.
It looks like Glow cannot find a listener for a specific task. Can you help me with this, please?
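The NULL check mentioned above can be repeated on the DataFrame produced by the full pipeline, right before the normalization step, with something like the sketch below. The helper names are hypothetical, and the check only covers top-level NULLs: a NULL element inside the `alternateAlleles` array would not be detected.

```python
# Column names taken from the post above.
REQUIRED_COLUMNS = ["contigName", "start", "end", "referenceAllele", "alternateAlleles"]


def null_predicate(columns):
    """Build a Spark SQL predicate that is true when any listed column is NULL."""
    # Backticks keep names such as `end` from being parsed as keywords.
    return " OR ".join(f"`{name}` IS NULL" for name in columns)


def count_rows_with_nulls(df, columns=REQUIRED_COLUMNS):
    """Count rows of df in which at least one of the given columns is NULL."""
    # DataFrame.filter accepts a SQL expression string.
    return df.filter(null_predicate(columns)).count()
```

If `count_rows_with_nulls(df)` returns a non-zero value only in the full pipeline, the NULLs are probably introduced by one of the joins rather than by the input data.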