Closed sadikovi closed 5 years ago
@sunchao would you like to comment? We can fix it, but the code would return a different result compared to Spark.
By the way, the file was written using the WIP parquet-rs write support!
@sadikovi : can we close this issue? I believe this is largely resolved by #184?
Kind of. Yes, we can close it.
Following the discussion on the PR, we found that the code fails to read Int96[0, 0, 0]. It is quite an edge case, because 1 January 1970 would correspond to something like Int96[0, 0, 2440588].
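To make the arithmetic behind that edge case concrete, here is a minimal sketch of the conversion. It assumes the usual Int96 timestamp layout (the first two 32-bit words hold nanoseconds within the day, low word first, and the third holds the Julian day number). The function name `int96_to_millis` and the constants are hypothetical, chosen for illustration, and are not taken from parquet-rs. The key point is that the result must be signed, since any Julian day below 2440588 lands before the Unix epoch.

```rust
// Julian day number of 1 January 1970 (the Unix epoch).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const MILLIS_PER_DAY: i64 = 86_400_000;
const NANOS_PER_MILLI: u64 = 1_000_000;

/// Hypothetical helper: converts an Int96 value, given as three u32 words
/// [nanos_low, nanos_high, julian_day], into milliseconds since the Unix epoch.
/// The result is signed so that days before the epoch yield negative values.
fn int96_to_millis(value: [u32; 3]) -> i64 {
    // Reassemble the 64-bit nanoseconds-of-day from its two 32-bit halves.
    let nanos_of_day = ((value[1] as u64) << 32) | (value[0] as u64);
    // Signed day offset relative to the epoch; negative for early dates.
    let days_since_epoch = value[2] as i64 - JULIAN_DAY_OF_EPOCH;
    days_since_epoch * MILLIS_PER_DAY + (nanos_of_day / NANOS_PER_MILLI) as i64
}
```

Under these assumptions, `int96_to_millis([0, 0, 2_440_588])` is 0, while `int96_to_millis([0, 0, 0])` is a large negative number, which an unsigned intermediate could not represent.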
This is the result from Spark when reading Int96[0, 0, 0], Int96[0, 0, 1], and Int96[1, 0, 0]:
Milliseconds:
I tried patching the code, and this works and returns exactly the same milliseconds as Spark:
But when converting to a human-readable date, I get the following:
It looks like the chrono library only supports dates after 1 January 1970. I attached a sample file (zipped) with a single Int96 column.
sample.parquet.zip