Open engechas opened 3 months ago
Just trying to understand: is this a bug in Spark Iceberg reader itself?
Yes it looks like a bug in the Spark Iceberg reader
Thanks for confirming! If possible, could you test it with Spark 3.5? We've bumped the version and are planning to release 0.5 soon.
Peng encountered this in some of his testing with EMRs 7.2/Spark 3.5, so it doesn't look like the version bump will fix it, unfortunately.
What's the path ahead here?
What is the bug? When running certain queries that involve timestamp fields against Iceberg tables an exception is thrown during query execution:
More info:
The time_dt field that causes the exception is timestamp.
The time_dt field is a timestamp in millisecond granularity.
The exception comes from here: https://github.com/apache/iceberg/blob/1.2.x/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java#L273
It looks like the TimeStampMicroVector is coming from here: https://github.com/apache/iceberg/blob/main/arrow/src/main/java/org/apache/iceberg/arrow/ArrowSchemaUtil.java#L103-L107
How can one reproduce the bug? Steps to reproduce the behavior: The exact mechanism to reproduce this is unknown. The below query causes the exception:
What is the expected behavior? The query should execute successfully instead of throwing a ClassCastException
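For readers unfamiliar with this class of failure, here is a minimal, self-contained sketch of how a reader that hard-codes one timestamp vector type fails when it is handed another. This is not Iceberg or Arrow code: MilliTimestampVector, MicroTimestampVector, read_millis and to_millis are hypothetical stand-ins, and the direction of the mismatch (a millisecond reader receiving a microsecond vector) is an assumption based on the description above, not something confirmed from the stack trace.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MilliTimestampVector:
    """Hypothetical stand-in for a column vector of epoch milliseconds."""
    values: List[int]


@dataclass
class MicroTimestampVector:
    """Hypothetical stand-in for a column vector of epoch microseconds."""
    values: List[int]


def read_millis(vector: object, index: int) -> int:
    """Accessor that only understands millisecond vectors.

    Raising TypeError on any other vector type is the Python analogue of
    a Java ClassCastException thrown by an unchecked downcast.
    """
    if not isinstance(vector, MilliTimestampVector):
        raise TypeError(
            f"expected MilliTimestampVector, got {type(vector).__name__}"
        )
    return vector.values[index]


def to_millis(vector: MicroTimestampVector) -> MilliTimestampVector:
    """Narrow microseconds to milliseconds (1 ms = 1000 us), truncating."""
    return MilliTimestampVector([v // 1000 for v in vector.values])
```

In this sketch, calling read_millis on a MicroTimestampVector raises TypeError, while converting with to_millis first succeeds. The point is only that the reader and the vector factory have to agree on the timestamp unit; how that should be resolved in the real reader is left open here.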
What is your host/environment?