MertHoc commented 5 years ago

Adding Nanosecond support to PrestoDB

Introduction

We wish to support Nanosecond timestamps within Presto to support companies that retrieve data at that granularity. One industry that deals with nanosecond granularity is the finance industry.

Within this project, we will introduce a Fractional second support to TIMESTAMP, and TIMESTAMP WITH TIME ZONE with precision greater than 3 (ms). For example:

CREATE EXTERNAL TABLE test (
    timestamp_microseconds TIMESTAMP(6),
    timestamp_nano_with_tz TIMESTAMP(9) WITH TIME ZONE
) STORED AS TEXT LOCATION 'XYZ';

Design Decisions:

Encoding

The current timestamp data types are being encoded as long at the millisecond resolution[1][2][3] when packing into blocks during shuffling and movement of data. The original thought when looking at this project was to always encode the timestamp at the nanosecond resolution within an existing long. With this method, we could store timestamps between the years 1678 to 2262 [4]. If we needed to In the future, if we needed a wider range, we would add a new int that would store the nanoseconds from midnight, similar to how other implementations store timestamps. This approach allowed us to minimize the number of code changes while keeping the ability to enhance the time range in the future if needed.

However, after some research, this approach may not work. For timestamps that contain time zone information, the timezone is packed into the long using the last 3 bytes of the long, and the milliseconds is shifted by the 3 bytes to the left and stored in the remaining bytes[6]. This reduces the available range of possible dates to only 20 days from Jan 1, 1970 [5] which is not sufficient. Thus, we will be forced to information needed into a buffer larger than 8 bytes. The components that we would need to store are:

the timestamp in milliseconds (minimum 50 bits to remain compatible with todays range [0-1,125,899,906,842,624])
nanoseconds portion (minimum of 20 bits [0-1,000,000])
timezone (12 bits if implemented in the same fashion as today)

I believe that precision is not needed to be stored with the other information as we will treat everything at nanosecond resolution.

Thus, I am proposing the following:

For time only data types (TIME, TIME WITH TIME ZONE) we change the resolution to nanosecond, and leave the data type as a Long is sufficient to store the data.
For TIMESTAMP and TIMESTAMP WITH TIME ZONE (data types that contain both date and time), we would need to add an extra int (4 bytes) to store the nanoseconds portion.

Impact: The impact of adding the extra 4 bytes (int) will be the following:

When users upgrade, they may start to see certain queries start to fail due to OOM since we are making the timestamp bigger.
Functions that originally returned long will need to be changed to something else. This is something that we still need to figure out. Example functions are currentTimestamp (https://github.com/prestodb/presto/blob/1c1108fb460ae85a1c993afbac389d8487336052/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java#L143). However, these functions do not seem to be called from anywhere.

Mitigation: There are two mitigation strategies we can employ:

We can employ a config parameter to determine if we pack and unpack the 4 bytes for nanoseconds. If we do not pack the extra 4 bytes, then the behavior should be exactly the same.
We can pack a single bit that determines if a timestamp is being packed using the 4 bytes for nanoseconds. If so, then we unpack an int from the block.

Effects on Precision when comparing two timestamps with different precisions:

The result of any operation on two timestamps will result with a timestamp that is of higher precision. The precision decimals of the lower precision timestamp will be assumed to be 0 if the digits do not exist. This is the behavior of DB2, and seems to be specified in the SQL Spec. (See below for details).

Justification: As per SQL Spec (https://standards.iso.org/ittf/PubliclyAvailableStandards/c060394_ISO_IEC_TR_19075-2_2015.zip) "Year-month intervals are comparable only with other year-month intervals. If two year-month intervals have different interval precision, they are, for the purpose of any operations between them, converted to the same precision by appending new datetime fields to either one of the ends of one interval, or to both ends. New datetime fields are assigned a value of 0 (zero)."

Similarly with "Day-time intervals are comparable only with other day-time intervals. If two day-time intervals have different interval precision, they are, for the purpose of any operations between them, converted to the same precision by appending new datetime field to either one of the ends of one interval, or to both ends. New datetime fields are assigned a value of 0 (zero)."

From DB2’s documentation, ( https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_datetimecomparisions.html) "When comparing timestamp values with different precision, the higher precision is used for the comparison and any missing digits for fractional seconds are assumed to be zero." Displaying Timestamps with Nanosecond granularity:

Today, I believe we are always displaying the timestamp in "uuuu-MM-dd HH:mm:ss.SSS" format. I believe that this should continue and provide functions that can output different formats (date_format()).

What changes are being made?

(THIS IS NOT EXHAUSTIVE AS OF YET)

Grammar Changes: SqlBase.g4 -> Add specification for precision in grammar (https://github.com/prestodb/presto/blob/7545741123949ee5b168358acbd7b404112b91f5/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4#L761-L767)

Change SPI to change Long’s to int128 for time/timestamps.

Functions:

JDBC:

https://github.com/prestodb/presto/blob/master/presto-jdbc/src/main/java/com/facebook/presto/jdbc/ColumnInfo.java

Parquet Changes:

ORC Changes:

https://github.com/prestodb/presto/blob/master/presto-orc/src/main/java/com/facebook/presto/orc/writer/TimestampColumnWriter.java

RCFile Changes:

https://github.com/prestodb/presto/blob/7545741123949ee5b168358acbd7b404112b91f5/presto-rcfile/src/main/java/com/facebook/presto/rcfile/text/TimestampEncoding.java#L106

Further changes depending on acceptance on Design.

Endnotes [1] SqlTime - https://github.com/prestodb/presto/blob/7545741123949ee5b168358acbd7b404112b91f5/presto-spi/src/main/java/com/facebook/presto/spi/type/SqlTime.java#L29-L30

[2] SqlTimestamp - https://github.com/prestodb/presto/blob/7545741123949ee5b168358acbd7b404112b91f5/presto-spi/src/main/java/com/facebook/presto/spi/type/SqlTimestamp.java#L32-L33

[3] SqlTimeWithTimeZone - https://github.com/prestodb/presto/blob/7545741123949ee5b168358acbd7b404112b91f5/presto-spi/src/main/java/com/facebook/presto/spi/type/SqlTimeWithTimeZone.java#L33-L34

[4] 9223372036854775807 (size of long) / 1000,000,000 (ns => s ) / 60 (sec/min) / 60 (min/hr) / 24 (hr/day) / 365 (days/year) = 292 years. 1970 + 292 = 2262, 1970 - 292 = 1678

[5] 2^(64-12) (size of long) / 1000,000,000 (ns => s ) / 60 (sec/min) / 60 (min/hr) / 24 (hr/day) / 365 (day/year) ~ 3 years.

[6] DateTimeEncoding.java - https://github.com/prestodb/presto/blob/af210059702f36df76bebc3f4b32bc106f523771/presto-spi/src/main/java/com/facebook/presto/spi/type/DateTimeEncoding.java#L26

hocanint-amzn commented 5 years ago

Sorry all, I created this SIM with a very old account. I will be interacting with this SIM through this account including any PR's, if any.

aweisberg commented 5 years ago

I talked about this a bit with @oerling. It looks like the syntax for this is to have "TYPE ( p )" where p is the fractional seconds that is supported. This is how it's done in the ANSI SQL standard.

There are two parts of this. One is reusable no matter what we go with internally in terms fo 64-bits or 96-bits. The work is to add the syntax and storing the additional type information in the metadata store. Once you can specify precision all the things that rely on precision like formatting and parsing or that supply metadata for columns need to be updated to pass that information on.

The second part IMO which is maybe optional is adding support for more than 64-bits of precision. And my question is do we have to do that now? Is it worth just changing how we interpret the existing 64-bits since we need to do all that work anyways? What if we never come across a use case that needs that much precision outside that time range?

wenleix commented 5 years ago

@MertHoc : Thanks for the contribution. One unrelated trick: when you want to refer to a specific line in a file, you might want to get the permanent link to files: https://help.github.com/en/articles/getting-permanent-links-to-files . Otherwise, the link will refer to different lines once file changed :)

MertHoc commented 5 years ago

@wenleix Thanks for the tip. I have updated my links to permalinks @aweisberg I have updated the summary with the doc that I shared with you. If we can please review, we would love to get started on implementation. Thanks!

rschlussel commented 5 years ago

@MertHoc thanks for the great write up!

Functions that originally returned long will need to be changed to something else. This is something that we still need to figure out. Example functions are currentTimestamp. However, these functions do not seem to be called from anywhere.

Those are documented user-facing functions. They return a long because that's the current representation for the TIMESTAMP_WITH_TIMEZONE type (they are annotated as returning that type). If the representation of timestamp_with_timezone changes, then the return type of those functions should be changed so that a user still gets back a timestamp with time zone ( TimestampWithTimeZoneType should still parse the result correctly).

hocanint-amzn commented 5 years ago

@rschlussel Thanks for the information. That makes a lot of sense. Everyone else, there were concerns about the potential performance impact of this change, as we are increasing the number of bytes needed to store timestamps in a block and the potential of breaking existing users as timestamps will higher memory requirements. How can we move forward ?

aweisberg commented 5 years ago

You have made the case and given interested parties a chance to comment. I know @arhimondr @rschlussel, @mbasmanova and @oerling have considered this and think it's a reasonable thing to do. We'll have to measure the result, but I think it's going to be good enough.

oerling commented 5 years ago

I think that performance or memory consumption differences arising from this will fall below measurement threshold: If 2% of columns are timestamps and a few of these take double space in query intermediate results, nobody will notice even if looking.

I further think that getting this matter successfully integrated into PrestoDB is a measure of our stated willingness to engage with the community.

From: Mert Hocanin notifications@github.com Sent: Tuesday, August 13, 2019 6:35 AM To: prestodb/presto presto@noreply.github.com Cc: oerling erling@xs4all.nl; Mention mention@noreply.github.com Subject: Re: [prestodb/presto] Support for nano-second/microsecond timestamps (#13063)

@rschlussel https://github.com/rschlussel Thanks for the information. That makes a lot of sense. Everyone else, there were concerns about the potential performance impact of this change, as we are increasing the number of bytes needed to store timestamps in a block and the potential of breaking existing users as timestamps will higher memory requirements. How can we move forward ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/prestodb/presto/issues/13063?email_source=notifications&email_token=AKPPPT4NL5JKAX4G6OM2W33QEKZ7LA5CNFSM4H6NUFM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4FVKWQ#issuecomment-520836442 , or mute the thread https://github.com/notifications/unsubscribe-auth/AKPPPT4MPKSFJ5J63I274ULQEKZ7LANCNFSM4H6NUFMQ . https://github.com/notifications/beacon/AKPPPTZTDNLQVJHJKMC6NTDQEKZ7LA5CNFSM4H6NUFM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4FVKWQ.gif

mbasmanova commented 5 years ago

@MertHoc Thanks for detailed design proposal. I have some questions.

Looks like you are suggesting to parameterize timestamp type with precision (similar to decimal). What would be the implications, if any, on Hive Metastore? How would you store this new time there?
You may want to take into account on-going changes to ORC readers and repartitioning - #13213 and #13183
Currently a block for timestamp is just a block of longs. Are you envisioning a new block that stores data in two arrays: long[] seconds and int[] nanoSeconds?

mbasmanova commented 5 years ago

CC: @yingsu00 @tdcmeehan @bhhari @sayhar

sdruzkin commented 4 years ago

We are also interested in the nanosecond precision for timestamps to fully support timestamps in the ORC file format.

prestodb / presto

Support for nano-second/microsecond timestamps #13063