Closed ezmiller closed 3 years ago
Nice!!
One comment:
I think there is something misleading about the tests and code examples such as
(-> data-by-year-month
(index-by :year-month)
(ops/slice "1949-01-01" "1949-03-01"))
It would be inefficient to create an index just for one slicing operation. An index is useful under the assumption of multiple slicing operations.
Maybe we should eventually support a version of slicing that does not require creating an index.
One comment:
I think there is something misleading about the tests and code examples such as
(-> data-by-year-month (index-by :year-month) (ops/slice "1949-01-01" "1949-03-01"))
It would be inefficient to create an index just for one slicing operation. An index is useful under the assumption of multiple slicing operations.
Maybe we should eventually support a version of slicing that does not require creating an index.
That makes sense. You mean a version of slicing that is able to achieve the slice without an index at all, even under the hood? Is this purely a performance/optimization concern, or also syntactical in the sense that it is weird to require the user to first call index on a dataset before slicing?
You mean a version of slicing that is able to achieve the slice without an index at all, even under the hood?
Yes.
Is this purely a performance/optimization concern, or also syntactical in the sense that it is weird to require the user to first call index on a dataset before slicing?
The concern is mainly about performance, so I think there is no rush with this option.
@daslu okay understood. I don't want to get too tied up here, and I think in some sense we can afford to merge this and move on. There's nothing stopping us from adding to it/modifying it in the future. Do you agree?
That said, your question inadvertently ended up raising a few more questions for me, but I'll discuss in Zulip (and add link here) since I think there's more visibility there at the moment...
Goal / Purpose
Simplify the slicing API so that a user does not have to call a specific method (e.g.
slice-by-date
) that matches the time unit they are working with on the index. This PR builds on #2.Solution
The approach I took here works with the constraint that we've so far been sticking to, namely: that the user must be responsible for knowing that the unit that they wish to slice by matches the unit of the indexed column(s).
This constraint comes into play in this PR in relation to how we parse the datetime strings that the users provide to the
slice
method. In order to parse these strings we assume that their format should conform to that expected by the parse method of the time unit class of the index's keys. Specifically, we grab the class type of the first key of the index, and then use that type along with Clojure's multimethods to parse the string using the right java.time unit's parse method.So now we can do something like this:
While the following would fail:
The PR also adds a few other things:
#time/year "2015"
) to the slice function;slice
method; andfrom
orto
key does not match the time unit of the index.Work Remaining / Open Questions
Some pieces that may need work in the future: