Open jimmymaise opened 5 years ago
Please check #40, this is a duplicate
I want to circle back with some feedback on this issue.
One of our teams at Yelp was affected by this problem (empty result set) as well. After they upgraded their Redshift cluster to version 1.0.10936, the problem went away. Redshift versions vary by region; I think you need version 1.0.10880 or above.
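For anyone who wants to check their cluster against that threshold, here is a minimal sketch. It assumes the version string (for example, the output of `SELECT version()`) contains the text "Redshift" followed by a dotted version number; that format is an assumption, and the function names are hypothetical. It compares versions numerically, because a plain string comparison would rank 1.0.9999 above 1.0.10880.

```python
import re
from typing import Tuple

# Minimum version suggested in the comment above (not an official threshold).
MIN_VERSION = (1, 0, 10880)


def parse_redshift_version(version_string: str) -> Tuple[int, ...]:
    """Extract the dotted Redshift version from a version string as a tuple of ints.

    Assumes the string contains "Redshift <major>.<minor>.<patch>".
    """
    match = re.search(r"Redshift\s+(\d+(?:\.\d+)+)", version_string)
    if match is None:
        raise ValueError(f"no Redshift version found in: {version_string!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(version_string: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True if the parsed version is at least `minimum` (numeric comparison)."""
    return parse_redshift_version(version_string) >= minimum
```

For example, `is_new_enough("Redshift 1.0.10936")` is true, while a string reporting 1.0.9999 is treated as too old.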
After querying Redshift, I have a DataFrame with only one record:
+------------------+--------------------+-------
|         accountid|         accountname|rangeid
+------------------+--------------------+-------
|             00139|Arizona Public Se...|   null
+------------------+--------------------+-------
If I filter to count the non-empty, non-null values of accountid, it works: I get the expected result, 1.
If I filter to count the non-empty, non-null values of rangeid, it fails: instead of the expected result I get an exception.
The same thing happens whenever the filter returns zero rows. Code:
df.filter(df[column].isNotNull() & (df[column] != "")).count()
Even if I use show() instead of count(), I still get the same error.
Five days ago my code ran fine, but now it throws an exception.
If I save the DataFrame to a local file and read it back, it works. But I need to compare this metric on the DataFrame loaded from Redshift with the same metric on a DataFrame loaded from a saved file, so I still have to count directly on the DataFrame that comes from Redshift.
For now I'm wrapping the call in a try/except and assigning 0 when the exception is raised, but that isn't really a good solution.
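The stopgap described above could look roughly like the sketch below. The helper name `count_or_default` is hypothetical. Because the exact exception type isn't shown in this thread, it catches a broad `Exception` and logs it so the failure isn't silently hidden. It is a workaround under those assumptions, not a fix.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def count_or_default(count_action: Callable[[], int], default: int = 0) -> int:
    """Run a zero-argument count action and return `default` if it raises.

    The broad `except Exception` is deliberate only because the concrete
    exception type is unknown here; narrow it once the type is identified.
    """
    try:
        return count_action()
    except Exception:
        # Log the full traceback so the fallback does not mask the real error.
        logger.exception("count failed; falling back to %d", default)
        return default
```

It would be called with the filter from above wrapped in a lambda, for example `count_or_default(lambda: df.filter(df[column].isNotNull() & (df[column] != "")).count())`.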