scicloj / tablecloth.time

Tools for the processing and manipulation of time-series data in Clojure.

Extending the tech.ml.dataset for time series #40

Open ezmiller opened 3 years ago

ezmiller commented 3 years ago

This issue is going to start out very vague and may eventually give way to some more specific issues. The question, put most broadly, is this: do we need to "extend" tech.ml.dataset in some way that is especially suited to time-series processing?

The best way into this is to consider the R tsibble library, from which we have been taking inspiration. The tsibble library defines a special type of data entity, the "tsibble", which is like a "tibble" but with some extra constraints and features (see here). Namely:

For tablecloth.time, we think we would like to avoid defining a new "type" of dataset. It's not even clear that this is possible. It would probably take us well into the complex territory of trying to extend or override tmd's dataset and associated types. Instead, what we have is a dataset that can have an index and that can be operated upon by a number of index-aware functions. These functions try to detect the index, and simply raise an error if they cannot.
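To make the "detect the index or raise an error" idea concrete, here is a minimal sketch. It does not use the actual tablecloth.time or tech.ml.dataset API: the dataset is stood in for by a plain Clojure map of column vectors, the index is recorded in metadata under a made-up key `:example/index`, and `index-column` and `slice-from` are hypothetical names.

```clojure
;; Hypothetical stand-in for a dataset: a plain map of column name -> vector.
;; The index column is recorded in metadata under a made-up key.
(def ds
  (with-meta {:date  [#inst "2021-01-01" #inst "2021-01-02" #inst "2021-01-03"]
              :price [10.0 10.5 10.2]}
    {:example/index :date}))

(defn index-column
  "Returns the name of the index column, or throws if none can be detected."
  [dataset]
  (or (:example/index (meta dataset))
      (throw (ex-info "No time index found on dataset."
                      {:columns (keys dataset)}))))

(defn slice-from
  "Hypothetical index-aware function: keeps the rows whose index value
  is at or after `from`, preserving the dataset's metadata."
  [dataset from]
  (let [idx  (index-column dataset)
        ;; positions of the rows to keep
        rows (keep-indexed (fn [i t] (when-not (neg? (compare t from)) i))
                           (get dataset idx))]
    (with-meta
      (into {} (map (fn [[k col]] [k (mapv col rows)])) dataset)
      (meta dataset))))
```

The point of the sketch is that the user keeps an ordinary value (here a map, in practice the dataset they already know), and the constraint only surfaces when an index-aware function is called on data that has no detectable index.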

To sum up, in tablecloth.time we do not expect to define a new type and then apply constraints at the moment that type is constructed. Instead, we think we will let users keep the same dataset they are used to. When they then try to use it with the tablecloth.time functions, they may be guided by our docs, the syntax of the arguments, and perhaps also by errors.

That said, there is one clear area where we do want a different kind of interaction from the dataset itself. When the user prints the dataset, we think we may need to give some additional feedback about it, comparable to what tsibble provides. What column is operating as the time index? What is the time interval of the time data?
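As a rough illustration of the kind of feedback meant here, the sketch below builds a one-line summary naming the index column and the spacing between consecutive timestamps. It only shows how such information could be derived, not how it would be hooked into printing. As an assumption for the example, the dataset is again a plain map with the index stored under a made-up metadata key `:example/index`, and `index-summary` is a hypothetical name.

```clojure
;; Hypothetical stand-in for a dataset with a daily time index.
(def daily
  (with-meta {:date  [#inst "2021-01-01" #inst "2021-01-02" #inst "2021-01-03"]
              :price [10.0 10.5 10.2]}
    {:example/index :date}))

(defn index-summary
  "Describes the index column and the interval between its timestamps.
  Reports \"irregular\" unless all consecutive gaps are identical."
  [dataset]
  (let [idx   (:example/index (meta dataset))
        ;; timestamps as epoch milliseconds
        times (map #(.getTime ^java.util.Date %) (get dataset idx))
        ;; distinct gaps between consecutive timestamps
        steps (distinct (map - (rest times) times))]
    (str "index: " idx
         ", interval: "
         (if (= 1 (count steps))
           (str (first steps) " ms")
           "irregular"))))

(index-summary daily)
;; => "index: :date, interval: 86400000 ms"
```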

But how do we do this in a library like tablecloth.time, where we also do not want to create a new type of dataset? What does it mean to "extend" tech.ml.dataset into contexts where we want a different type of behavior, around printing for example?