zHaytam opened this issue 3 years ago
(I also saw your Stack Overflow post; reading a bit from there, it sounds like you suspect it is crashing on the write to S3 with some pretty simple data.)
The write-to-S3 code is here. Do you know which format you are writing with? That would help in building a narrower example.
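For reference, here is a minimal sketch of what such a write typically looks like with an explicit temp format, so it's clear which code path is being discussed. The format string, option names, JDBC URL, table name, and S3 paths are illustrative assumptions based on the connector's documented options, not the reporter's actual configuration:

```scala
// Sketch only: all identifiers, credentials, and paths below are placeholders.
import spark.implicits._  // `spark` is the active SparkSession, e.g. in spark-shell

val df = Seq(("x", 0, "cf", 7)).toDF("name", "id", "type", "count")  // stand-in data

df.write
  .format("io.github.spark_redshift_community.spark.redshift")
  .option("url", "jdbc:redshift://example-host:5439/dev?user=USER&password=PASS") // placeholder URL
  .option("dbtable", "public.example_table")       // hypothetical target table
  .option("tempdir", "s3a://example-bucket/tmp/")  // hypothetical scratch path
  .option("tempformat", "AVRO")                    // "AVRO" is the default; "CSV" is the other path discussed here
  .option("forward_spark_s3_credentials", "true")
  .mode("append")
  .save()
```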
Hello,
We tried with Avro (the default) and CSV; both throw the same exception. I also read that part of the source code, and I suspect it might be caused by either convertedRows or convertedSchema?
Thanks
Could be, although the converters only handle decimal, date, and timestamp, which aren't in your example. There is some logic that lowercases the schema column names, which would affect your example -- could you check whether using all-lowercase column names helps?
Otherwise, I'd check that this isn't a case of Spark giving you a misleading error (e.g., via lazy execution, the issue is actually somewhere else but this was the first Spark action to run). You could try swapping out the Redshift write for a plain S3 write to the same path, as in the sketch below.
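A minimal sketch of that isolation test, assuming a stand-in dataframe and placeholder paths (not the reporter's actual job):

```scala
// Write the same dataframe straight to the S3 scratch location, bypassing the Redshift
// connector. If this also throws, the ArrayStoreException is likely coming from the
// upstream plan or the environment rather than the connector's converters.
import spark.implicits._

val df = Seq(("x", 0, "cf", 7)).toDF("name", "id", "type", "count")

df.write
  .format("avro")  // mirrors the connector's default tempformat; "csv" or "parquet" also work for this test
  .mode("overwrite")
  .save("s3a://example-bucket/tmp/repro-test/") // hypothetical path in the same bucket as tempdir
```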
All the columns in the dataframe we're trying to write are lowercase. Also, we are able to write the dataframe to S3 at the same path (without the conversions).
Do you have an example df? The example you linked on Stack Overflow has columns named ["ID", "TYPE", "CODE"], which are all uppercase. If you have decimal, date, or timestamp types in your df, then a bug in the converters seems more likely.
The dataframe that we tried is this:
| name | id | type | count |
| --- | --- | --- | --- |
| x | 0 | cf | 7 |
Nothing advanced.
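For anyone trying to reproduce this, here is a small sketch that recreates the dataframe above and prints its schema to confirm there are no decimal, date, or timestamp columns. The column types (string/int) are my assumption; the issue doesn't state them explicitly:

```scala
import spark.implicits._

// Recreate the four-column dataframe from the table above.
val df = Seq(("x", 0, "cf", 7)).toDF("name", "id", "type", "count")

df.printSchema()
// Expected output (under the type assumption above):
// root
//  |-- name: string (nullable = true)
//  |-- id: integer (nullable = false)
//  |-- type: string (nullable = true)
//  |-- count: integer (nullable = false)
```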
@88manpreet do you have any ideas? I don't see anything in the converters that should cause this error.
@zHaytam sorry, I missed getting back to this and prioritizing it earlier. Is this issue still happening?
I tried to reproduce it in the integration tests for both the Avro and CSV formats. Diff: https://gist.github.com/88manpreet/8049611246ee306628dfc3e9df7eb2ad, which I think imitates the behavior described above. I could see the temp files created in the scratch path for both formats.
I also didn't see anything obviously wrong with the converters; I will keep trying to reproduce this in different ways.
I also noticed that the redshift-jdbc42-no-awssdk driver you are using is the same one we are using.
@zHaytam in the meantime, is it possible to test this case with the latest version, v5.0.3?
Would it also be possible for you to share a patch of the relevant code you are running that hits this scenario?
Hello,
We're trying to write a dataframe to Redshift using Spark 3.0.1 (on EMR) and your connector, but we receive the following error:
WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 10.80.139.254, executor 1): java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda
Packages added: