sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

Add Date value to Row api #87

Closed sadikovi closed 6 years ago

sadikovi commented 6 years ago

This PR adds Row::Date so we can process INT32 (DATE) values, see example below:

message spark_schema {
  OPTIONAL INT32 dt (DATE);
  OPTIONAL INT96 ts;
}

{dt: 2010-01-02 +13:00, ts: 2010-01-02 13:12:54 +13:00}
{dt: 2010-01-03 +13:00, ts: 2010-01-03 08:23:01 +13:00}
{dt: 2010-04-05 +12:00, ts: 2010-04-05 11:06:32 +12:00}
{dt: 2010-05-12 +12:00, ts: 2010-05-12 16:38:00 +12:00}
{dt: 2010-11-28 +13:00, ts: 2010-11-28 21:15:12 +13:00}

I also added some doc comments for each value. And changed representation when displayed, so now it will show actual date/time in local timezone instead of raw value (number of days/number of milliseconds).

sadikovi commented 6 years ago

I have a question. Some tools display date and timestamp as actual dates (e.g. YYYY-MM-DD), but we report raw values. Should we be doing the same for Date and Timestamp values? Like Spark, see below:

+----------+-------------------+----+-----+
|      date|                 ts|name|score|
+----------+-------------------+----+-----+
|2010-01-02|2010-01-02 13:12:54| abc| 1.78|
|2010-01-03|2010-01-03 08:23:01| abc| 2.01|
|2010-04-05|2010-04-05 11:06:32| abc| 3.82|
|2010-05-12|2010-05-12 16:38:00| abc| 3.92|
|2010-11-28|2010-11-28 21:15:12| abc| 5.12|
+----------+-------------------+----+-----+

Let me know what you think. If so, I will update both to be rendered in date format rather than raw numeric values.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+0.02%) to 94.798% when pulling 6d617ccb53d39de3f9c9859456ea4914fe2d4e88 on sadikovi:add-row-date into b48b415bf022e72e1131e09c1bf87b87534b2c8c on sunchao:master.

sunchao commented 6 years ago

@sadikovi : yes I think it will be more convenient to display them in a readable format rather than the raw values. This can be addressed in a separate PR though. Will take a look at the PR. Thanks!

sadikovi commented 6 years ago

I updated the code, so now timestamp and date values are formatted as date/time. To make this work, I had to bring another dependency chrono to work with datetime. chrono is used mainly for displaying date in local timezone.

I am not sure if this is acceptable, because we add dependency for date/time display only. Another alternative is showing dates/times in UTC. Then we do not need to import the crate, we can just use this code https://gist.github.com/sadikovi/b708ed51f479d7b9e8b03515756c6d78. There is just one function to get date and time from a timestamp.

@sunchao Let me know what route I should go, I sort of like display in current timezone, but also think that using a crate for this may be an overkill.

sunchao commented 6 years ago

Yes I'm also in favor of displaying the datetime in local timezone. Adding a dependency for this should be OK I think.

sunchao commented 6 years ago

Merged. Thanks @sadikovi !

sadikovi commented 6 years ago

@sunchao thanks!