Turns out the main method we needed to adjust here is slice. Rolling window doesn't use the index directly, but uses slice. So I adjusted slice to use the new index.
The adjustment works as follows:
I removed the old index namespace. We retain the function index-by, but instead of building an index itself and placing it on the metadata for the dataset, this function simply sets a meta key :index to the name of the index column specified by by the user.
I added a new namepace tablecloth.time.utils.indexing that holds some fns that help us interact with the index-structure. For example, we can ask a dataset for its column name with the index-column-name function. This function will check to see if the metadata has the :index property set, and if so return that column name; otherwise, it will try to find out if the dataset has just one time column and in that case it will return that column name. If there is no :index metadata, or if there are zero or more than one time columns, then it will return nil.
The index-column-name function supports a series of other functions that help in other ways, in particular can-idenitfy-index-column? and auto-detect-index-column may help some functions avoid requiring users to specify the index with index-by.
The main change that was made to slice then is that we determine the index column by using the index tools that I just described, and then once we have the index we use the tech.ml.dataset column index structure function select-from-index to perform the slice on the index and retrieve the index position values that we need. There are some other smaller changes here, though. I changed the parsing of string representation of times to use the tech.v3.datatype datype keyword instead of the java time class b/c this is how we should be using the datatypes.
A broader change here that will be something we discuss going forward is that we've dropped support for time classes that tech.v3.datatype.datetime does not acknowledge, i.e. java.time.Year, etc. There are some questions here about whether these classes are valuable. @cnuernber has pointed out that for us jave.time.Year may not offer and concrete value over and above simply representing the year as a number. Supporting a type like this in tech.v3.datatype adds a bunch of work that may not have much value. There are some open question about whether there may be a value ot supporting those classes in this library, but we are hoping to discover the answer through experimentation. For the moment, we are taking the path of least resistance. See this discussion for more detail.
Some other stuff in this PR:
I added a solution to activating [time-literals](), copying what they do in the tick library. Basically we run a macro modify-printing-of-time-literals-if-enabled! that will enable these time literals unless they have been disabled by the setting of a jvm-option (solves #21):
I added a namespace tablecloth.time.utils.datatypes that holds some helper functions related to identifying datatypes. This partly a space we can "wrap" some of tech.v3.datatypes functions where we have specific patterns for using its tools as relate to time.
Work remaining
Open Questions
Is it sufficient to set :index as a single value on the metadata? Is there any reason to define the value of :index as an array, to anticipate for multi-indexes?
Goal / Problem
Adjust the slice method to use the new index-structure in tech.ml.dataset added by https://github.com/techascent/tech.ml.dataset/pull/214.
Proposed Solution
Turns out the main method we needed to adjust here is
slice
. Rolling window doesn't use the index directly, but usesslice
. So I adjustedslice
to use the new index.The adjustment works as follows:
index-by
, but instead of building an index itself and placing it on the metadata for the dataset, this function simply sets a meta key:index
to the name of the index column specified by by the user.tablecloth.time.utils.indexing
that holds some fns that help us interact with the index-structure. For example, we can ask a dataset for its column name with theindex-column-name
function. This function will check to see if the metadata has the:index
property set, and if so return that column name; otherwise, it will try to find out if the dataset has just one time column and in that case it will return that column name. If there is no:index
metadata, or if there are zero or more than one time columns, then it will returnnil
.index-column-name
function supports a series of other functions that help in other ways, in particularcan-idenitfy-index-column?
andauto-detect-index-column
may help some functions avoid requiring users to specify the index withindex-by
.slice
then is that we determine the index column by using the index tools that I just described, and then once we have the index we use the tech.ml.dataset column index structure functionselect-from-index
to perform the slice on the index and retrieve the index position values that we need. There are some other smaller changes here, though. I changed the parsing of string representation of times to use the tech.v3.datatype datype keyword instead of the java time class b/c this is how we should be using the datatypes.Some other stuff in this PR:
modify-printing-of-time-literals-if-enabled!
that will enable these time literals unless they have been disabled by the setting of ajvm-option
(solves #21):tablecloth.time.utils.datatypes
that holds some helper functions related to identifying datatypes. This partly a space we can "wrap" some of tech.v3.datatypes functions where we have specific patterns for using its tools as relate to time.Work remaining
Open Questions
:index
as a single value on the metadata? Is there any reason to define the value of :index as an array, to anticipate for multi-indexes?validatable
anymore?