scicloj / tablecloth.time

Tools for the processing and manipulation of time-series data in Clojure.

Interval adjustment: handle case where first element is missing #22

Open ezmiller opened 3 years ago

ezmiller commented 3 years ago

In our current implementation of adjust-interval (see #14), we use the first item in a column to determine the target datatype of the time unit to which we are converting. More specifically, adjust-interval takes a user-supplied ->new-time_converter fn, calls that function on the first item in the targeted column to get a value in the new unit, and then uses tech.v3.datatype/elemwise-datatype to determine that unit's keyword.

This is all fine, but @cnuernber raised a good point that we overlooked:

There are some auto-detection routines for datatype that rely on converting the first element. All I might add to that is you may want to convert the first non-missing element; what if your first element is a missing/null value?

We should figure out a way to handle this case. It might also pay to generalize the process of determining the time datatype from the row if this is going to be a more common practice.
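
As a rough illustration of the suggestion above, here is a minimal sketch in plain Clojure that converts the first non-missing element instead of the first element. It assumes missing values appear as nil in an ordinary sequence. The helper names first-non-missing and infer-target-type are hypothetical and not part of tablecloth.time, and the sketch uses class as a stand-in for tech.v3.datatype/elemwise-datatype to stay free of dependencies.

```clojure
(import 'java.time.LocalDateTime)

(defn first-non-missing
  "Returns the first non-nil element of coll, or nil if every element is nil."
  [coll]
  (first (remove nil? coll)))

(defn infer-target-type
  "Hypothetical helper: applies converter to the first non-missing element
  of coll and returns the class of the result. Returns nil when coll has no
  non-missing element, so the caller can decide how to handle that case."
  [converter coll]
  (when-some [x (first-non-missing coll)]
    (class (converter x))))

;; A column whose first element is missing.
(def times
  [nil
   (LocalDateTime/of 2021 1 1 9 30)
   (LocalDateTime/of 2021 1 2 10 0)])

(infer-target-type #(.toLocalDate ^LocalDateTime %) times)
;; => java.time.LocalDate
```

A real implementation would more likely consult the column's missing set than scan for nil, and it would still need to decide what to do when every value is missing.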

ezmiller commented 3 years ago

When we added the index structure to tech.ml.dataset (see https://github.com/techascent/tech.ml.dataset/pull/214), we prevented a column from returning an index if the column contains missing values. So this issue may not be relevant any more: an indexed column should not have any missing values, so there should always be an item in the first position.