Goal / Problem
We depend on dtype-next datatypes, including its datetime representations, for the data in datasets. Because the datetime datatype stores each datetime as an integer value, we have a gap in modeling higher-order time granularities such as quarter, month, and week.
Proposed Solution
The approach considered here is to treat higher-order time objects as dates, by mapping each one to the last date of its time period.
So, "2020-Q1" is represented as the local date "2020-03-31", as that is the last date of the quarter. Using the last date of the period is just a convention that we are adopting.
Similarly, "2020-Mar" is represented as "2020-03-31". Because both periods map to the same date, we leave it to the user to track whether a date such as "2020-03-31" stands for a quarter or a month.
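The convention above can be sketched in a few lines. This is an illustrative Python sketch only, not dtype-next code; the function names `month_end` and `quarter_end` are hypothetical, and calendar quarters (Q1 = January to March) are assumed.

```python
import calendar
from datetime import date


def month_end(year: int, month: int) -> date:
    """Return the last calendar date of the given month."""
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def quarter_end(year: int, quarter: int) -> date:
    """Return the last calendar date of the given calendar quarter (1 to 4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be in 1..4, got {quarter}")
    # Assumption: quarters are calendar quarters, so Q1 ends in month 3, Q2 in month 6, ...
    return month_end(year, quarter * 3)


q1 = quarter_end(2020, 1)
mar = month_end(2020, 3)
print(q1, mar, q1 == mar)  # 2020-03-31 2020-03-31 True
```

The last line shows the ambiguity noted above: the quarter and the month collapse to the same date, so the granularity has to be tracked outside the value itself.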
The key advantage of this approach is that we can leverage all the features of the datetime datatype, including the ability to automatically index such columns.
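To illustrate why an ordinary date column is convenient to index, here is a minimal Python sketch, assuming the period-end dates are kept sorted and stored as integers. It does not show how dtype-next builds its indexes; `rows_between` is a hypothetical helper and the sample data is made up for the example.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Period-end dates for five consecutive quarters, already in sorted order.
period_ends = [
    date(2019, 12, 31),
    date(2020, 3, 31),
    date(2020, 6, 30),
    date(2020, 9, 30),
    date(2020, 12, 31),
]

# Dates reduce to integers, so ordering and range search come for free.
ordinals = [d.toordinal() for d in period_ends]


def rows_between(start: date, end: date) -> list[int]:
    """Row numbers whose period-end date lies in [start, end], inclusive."""
    lo = bisect_left(ordinals, start.toordinal())
    hi = bisect_right(ordinals, end.toordinal())
    return list(range(lo, hi))


print(rows_between(date(2020, 1, 1), date(2020, 6, 30)))  # [1, 2]
```

The same range lookup works unchanged whether the rows stand for quarters, months, or weeks, which is the point of reusing the date representation.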
Work remaining
We need helper functions to translate easily from higher-order time data to a local date and vice versa.
We need to extend the year-quarter support to year-month and year-week as those use cases show up.
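One possible shape for such helpers is sketched below in Python. None of these functions exist in dtype-next; the names `parse_quarter`, `format_quarter`, and `week_end` are hypothetical. The "2020-Q1" label format is taken from the example above, while the use of ISO-8601 weeks (Monday to Sunday) for year-week is an assumption, since the week convention has not been decided.

```python
import re
from datetime import date, timedelta

# Label format assumed from the "2020-Q1" example: four-digit year, "-Q", quarter 1..4.
_QUARTER_LABEL = re.compile(r"^(\d{4})-Q([1-4])$")


def parse_quarter(label: str) -> date:
    """Translate a label such as "2020-Q1" to the last date of that quarter."""
    match = _QUARTER_LABEL.match(label)
    if match is None:
        raise ValueError(f"not a year-quarter label: {label!r}")
    year, quarter = int(match.group(1)), int(match.group(2))
    end_month = quarter * 3
    # Last date of the quarter = first date of the next quarter minus one day.
    if end_month == 12:
        first_of_next = date(year + 1, 1, 1)
    else:
        first_of_next = date(year, end_month + 1, 1)
    return first_of_next - timedelta(days=1)


def format_quarter(d: date) -> str:
    """Translate any date back to the label of the quarter that contains it."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def week_end(iso_year: int, iso_week: int) -> date:
    """Last date (Sunday) of an ISO-8601 week. Assumption: weeks follow ISO-8601."""
    return date.fromisocalendar(iso_year, iso_week, 7)


print(parse_quarter("2020-Q1"))            # 2020-03-31
print(format_quarter(date(2020, 3, 31)))   # 2020-Q1
print(week_end(2020, 1))                   # 2020-01-05
```

Note that the reverse direction only recovers the right label if the caller knows which granularity the date was meant to encode.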
Open Questions
At some point, we need to look at alternatives to the above approach. One possibility is to deconstruct year-quarter into a tuple of two integers, year and quarter, with each value stored as its own column in the dataset. Indexing would then need a multiple-column index solution.
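As a rough sketch of what the tuple alternative implies, the Python example below keeps year and quarter in separate columns and builds a lookup keyed on the pair. This is only an illustration of the idea, not a proposal for the actual index implementation; `build_index` and the sample columns are hypothetical.

```python
from collections import defaultdict

# Two plain integer columns instead of one date column (made-up sample data).
years = [2020, 2020, 2019, 2020]
quarters = [1, 2, 4, 1]


def build_index(*columns):
    """Map each tuple of column values to the list of row numbers holding it."""
    index = defaultdict(list)
    for row, key in enumerate(zip(*columns)):
        index[key].append(row)
    return dict(index)


index = build_index(years, quarters)
print(index[(2020, 1)])  # [0, 3]
print(sorted(index))     # [(2019, 4), (2020, 1), (2020, 2)]
```

Tuples compare element by element, so sorting the keys still yields chronological order, and the granularity is explicit in the column names rather than implied by a date convention.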