Open ancapdev opened 4 months ago
I do appreciate the design choice of having the user define the date-time (sorting/matching) column, but having Index as the index column is one of those assumptions that provides certainty and somewhat easier maintenance of the TSFrames functions.
One can have:
    struct TSFrame
        coredata::DataFrame
        Index::String
    end
The constructors can default to the name Index in the absence of a provided index column (the current behaviour).
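To make the suggestion concrete, here is a minimal sketch of a struct that stores the index column name and falls back to "Index" when none is given. The type name IndexedFrame and its field names are hypothetical and chosen for illustration only; this is not how TSFrames.jl is implemented, and it assumes DataFrames.jl is installed.

```julia
using DataFrames
using Dates

# Hypothetical type for illustration; not the TSFrames.jl implementation.
struct IndexedFrame
    coredata::DataFrame
    index::String   # name of the column holding the time index

    # The index column name defaults to "Index" when not provided.
    function IndexedFrame(coredata::DataFrame, index::AbstractString = "Index")
        index in names(coredata) ||
            throw(ArgumentError("column \"$index\" not found in coredata"))
        new(coredata, String(index))
    end
end

df = DataFrame(
    timestamp = [Date(2024, 1, 1), Date(2024, 1, 2), Date(2024, 1, 3)],
    price = [1.0, 2.0, 3.0],
)

ts = IndexedFrame(df, "timestamp")
names(ts.coredata)   # ["timestamp", "price"]: the source column names are untouched
```

Functions that currently assume a column literally named Index would instead look up ts.index, which is where most of the code changes mentioned below would come from.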
Having said that, a lot of code will need to change, and, yes, many other assumptions will also need to be thought about again.
Meanwhile, would it be possible for your package to compose a TSFrame and an index string in the package struct? Would that solve your immediate problem?
Hi, thanks for replying.
In my use case a lot of the end processing happens on the underlying data frame (coredata) directly, so that's the crux of the issue. I need to preserve the column names in these. For now I'm going with plain DataFrame objects, and in the future we'll either develop our own time series wrapper, or see if TSFrames can move towards an API that doesn't touch the underlying data.
I understand. As I said, it will be useful to have this flexibility in the package. I will keep this issue open for now so that someone can pick it up and submit a PR.
Is there appetite to change the API for TSFrame so that it stores the name of the index column and preserves the source dataframe, rather than replacing the column with a new one named Index even when the user specifies the index column?

For context, I'm building a time series system with streaming and batch APIs. In my system the user defines schemas for their time series. These schemas include the time field/column, and preserving field/column names consistently throughout is important for my use case. The current TSFrame API makes that awkward, and I don't want to let the TSFrames column name override govern downstream design and naming decisions.

At a more fundamental level, what I would expect TSFrame to be is a pure semantic layer that verifies the time ordering of rows in dataframes, guaranteeing that invariant to functions operating on time series, without changing the underlying data the way it currently does.

Now that the design is burned in, I appreciate it may not be possible to change it without breaking assumptions in dependent code, but I thought it was worth asking.
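As a rough illustration of what such a semantic layer could look like, the sketch below wraps a DataFrame, checks once that its rows are sorted by a user-named time column, and leaves the data as it was. The names OrderedFrame and timestamps are hypothetical; this is an assumption about one possible design, not an existing TSFrames.jl API.

```julia
using DataFrames
using Dates

# Hypothetical wrapper for illustration; not part of TSFrames.jl.
struct OrderedFrame
    data::DataFrame
    timecol::String   # user-chosen name of the time column

    function OrderedFrame(data::DataFrame, timecol::AbstractString)
        timecol in names(data) ||
            throw(ArgumentError("column \"$timecol\" not found"))
        # Verify the invariant; do not sort, rename, or copy anything.
        issorted(data[!, timecol]) ||
            throw(ArgumentError("rows are not sorted by \"$timecol\""))
        new(data, String(timecol))
    end
end

# Accessor so that time series functions never hard-code a column name.
timestamps(f::OrderedFrame) = f.data[!, f.timecol]

df = DataFrame(
    timestamp = [DateTime(2024, 1, 1, 9), DateTime(2024, 1, 1, 10)],
    price = [100.0, 101.5],
)

ts = OrderedFrame(df, "timestamp")
ts.data === df        # true: the wrapper holds the original object
names(ts.data)        # ["timestamp", "price"]
```

One caveat with this design: because the wrapper holds a reference rather than a copy, code that mutates df afterwards can break the ordering invariant, so a real implementation would need to either copy on construction or document that constraint.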