Look for date/time column in as.xts.data.frame()

joshuaulrich commented 1 year ago

It would be nice if as.xts.data.frame() didn't require date/times as row names, and instead looked for a time-based column in the data.frame if the row names weren't valid index values.

For example:

d <- structure(list(
    Date = structure(c(13515, 13516, 13517, 13518, 13519, 13520), class = "Date"),
    Open =  c(50.03, 50.23, 50.42, 50.37, 50.24, 50.13),
    High =  c(50.11, 50.42, 50.42, 50.37, 50.24, 50.21),
    Low =   c(49.95, 50.23, 50.26, 50.22, 50.11, 49.99),
    Close = c(50.11, 50.39, 50.33, 50.33, 50.18, 49.99)),
    row.names = c(NA, 6L), class = "data.frame")
as.xts(d)
##             Open  High   Low Close
## 2007-01-02 50.03 50.11 49.95 50.11
## 2007-01-03 50.23 50.42 50.23 50.39
## 2007-01-04 50.42 50.42 50.26 50.33
## 2007-01-05 50.37 50.37 50.22 50.33
## 2007-01-06 50.24 50.24 50.11 50.18
## 2007-01-07 50.13 50.21 49.99 49.99
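
One way the requested lookup could work is sketched below in base R. The helper name split_time_index() is hypothetical and not part of xts. The sketch also assumes that only "%Y-%m-%d" row names and Date or POSIXct columns count as time-based.

```r
# Small version of the example data: dates in a column, default row names.
d <- data.frame(
  Date  = as.Date("2007-01-02") + 0:2,
  Open  = c(50.03, 50.23, 50.42),
  Close = c(50.11, 50.39, 50.33)
)

# Hypothetical helper (not part of xts): separate a time index from the data.
split_time_index <- function(x) {
  # 1. Use the row names if every one of them parses as a date.
  idx <- as.Date(rownames(x), format = "%Y-%m-%d")
  if (nrow(x) > 0 && !anyNA(idx)) {
    return(list(index = idx, data = as.matrix(x)))
  }
  # 2. Otherwise fall back to the first Date or POSIXct column.
  is_time <- vapply(x, inherits, logical(1), what = c("Date", "POSIXct"))
  if (!any(is_time)) stop("no time-based row names or column found")
  pos <- which(is_time)[1]
  list(index = x[[pos]], data = as.matrix(x[, -pos, drop = FALSE]))
}

parts <- split_time_index(d)
```

With xts installed, xts::xts(parts$data, order.by = parts$index) would then build the object from the two pieces.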

joshuaulrich commented 1 year ago

@zeileis, @ggrothendieck: does zoo do anything like this currently? I know it would be harder because zoo doesn't have to be indexed by time. Maybe it could look for one column that was a different type than the rest in the data.frame?

ggrothendieck commented 1 year ago

read.zoo() has an index.column argument, which defaults to 1, so read.zoo(d) works for the example. Most data sets have the index column first, so there is little advantage to locating it automatically.
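
For illustration, a minimal sketch of that call on a shortened copy of the example data (assuming the zoo package is installed):

```r
library(zoo)

# Shortened copy of the example data, with the dates in the first column.
d <- data.frame(
  Date  = as.Date("2007-01-02") + 0:2,
  Open  = c(50.03, 50.23, 50.42),
  Close = c(50.11, 50.39, 50.33)
)

# index.column defaults to 1, so the Date column is used as the index
# and the remaining columns become the data.
z <- read.zoo(d)

index(z)     # the dates taken from column 1
coredata(z)  # the Open and Close values
```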

joshuaulrich commented 1 year ago

Thanks for your thoughts! read.zoo() is actually one of the things that made me think of this (and good point about the index usually being in the first column).

I really like that read.zoo() provides this functionality. However, I don't think most users think of using a 'read' function to convert a data frame to a zoo/xts object. I imagine their first guess would be to use as.zoo() or as.xts(). That's my rationale for this enhancement.

zeileis commented 1 year ago

I agree with you. However, as.zoo.data.frame() behaves like as.ts() on a data.frame, so we don't have any detection of potential index columns there. The question is whether it is worth changing the current behavior for this. What we could add in a fully backward-compatible way is:

as.zoo.data.frame(x, index = NULL, ...)

where index = NULL means that no index is provided explicitly (and 1, ..., n is used implicitly). This would give the users at least an option to do as.zoo(df, index = 1) but without any autodetection.
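
A rough sketch of how such an argument might behave is shown below. The function as_zoo_df() is a hypothetical stand-in for the proposed method, not existing zoo code. Accepting a column name as well as a position is an added assumption.

```r
library(zoo)

# Hypothetical stand-in for the proposed as.zoo.data.frame(x, index = NULL, ...).
as_zoo_df <- function(x, index = NULL, ...) {
  if (is.null(index)) {
    # No index given: fall back to the implicit 1, ..., n index.
    return(zoo(as.matrix(x), order.by = seq_len(nrow(x)), ...))
  }
  # Accept either a column name or a column position (an assumption).
  pos <- if (is.character(index)) match(index, names(x)) else index
  zoo(as.matrix(x[, -pos, drop = FALSE]), order.by = x[[pos]], ...)
}

d <- data.frame(
  Date  = as.Date("2007-01-02") + 0:2,
  Open  = c(50.03, 50.23, 50.42),
  Close = c(50.11, 50.39, 50.33)
)

z  <- as_zoo_df(d, index = 1)  # explicit index column
z0 <- as_zoo_df(d[-1])         # no index: 1, 2, 3 is used
```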

If we were willing to possibly break existing code we could interpret index = NULL as "please autodetect the index". Possible interpretations would be:

Use the first column.
Use the first column but only if it is not plain numeric or integer.
Use the first non-plain numeric/integer column.
Use the first and only non-plain numeric/integer column.
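
To compare the four interpretations, here is a base R sketch. The names is_plain() and detect_index() are hypothetical. The sketch takes "plain" to mean a numeric or integer column without a class attribute, which is an assumption about the intended meaning.

```r
# A column is "plain" if it is numeric or integer and carries no class.
is_plain <- function(col) is.numeric(col) && !is.object(col)

# Hypothetical helper: return the position of the index column, or NULL
# if the chosen rule finds no suitable column.
detect_index <- function(df, rule = c("first", "first_if_nonplain",
                                      "first_nonplain", "only_nonplain")) {
  rule <- match.arg(rule)
  if (ncol(df) == 0) return(NULL)
  nonplain <- which(!vapply(df, is_plain, logical(1)))
  switch(rule,
    first             = 1L,
    first_if_nonplain = if (1L %in% nonplain) 1L else NULL,
    first_nonplain    = if (length(nonplain) > 0) nonplain[[1]] else NULL,
    only_nonplain     = if (length(nonplain) == 1) nonplain[[1]] else NULL
  )
}

# A data frame where the rules disagree: the dates are in column 2 and
# there is a second non-numeric column.
df <- data.frame(
  id   = 1:3,
  when = as.Date("2007-01-02") + 0:2,
  sym  = c("a", "b", "c"),
  px   = c(1.5, 2.5, 3.5)
)

detect_index(df, "first")              # column 1, even though it is plain
detect_index(df, "first_if_nonplain")  # NULL, column 1 is a plain integer
detect_index(df, "first_nonplain")     # column 2
detect_index(df, "only_nonplain")      # NULL, two non-plain columns exist
```

On the example data from the top of the thread, all four rules would pick column 1, which is why the choice only matters for less tidy inputs.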

But I'm not sure if any of these really make life easier for existing zoo users. I would probably just add the argument without any auto-detection.

ggrothendieck commented 1 year ago

as.zoo.data.frame could be a wrapper around read.zoo with the appropriate default arguments to ensure backwards compatibility.
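
A sketch of what such a wrapper might look like follows. The name as_zoo_via_read() and the index = NULL default are assumptions for illustration, not zoo's actual code. Here the backward-compatible path bypasses read.zoo() and calls zoo() directly.

```r
library(zoo)

# Hypothetical wrapper; the name and the index = NULL default are assumptions.
as_zoo_via_read <- function(x, index = NULL, ...) {
  if (is.null(index)) {
    # Backward-compatible path: keep all columns, index by 1, ..., n.
    return(zoo(as.matrix(x)))
  }
  # Otherwise let read.zoo() pull the index out of the requested column.
  read.zoo(x, index.column = index, ...)
}

d <- data.frame(
  Date  = as.Date("2007-01-02") + 0:2,
  Open  = c(50.03, 50.23, 50.42),
  Close = c(50.11, 50.39, 50.33)
)

z_old <- as_zoo_via_read(d[-1])         # no index column requested
z_new <- as_zoo_via_read(d, index = 1)  # Date column becomes the index
```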

joshuaulrich / xts

Look for date/time column in as.xts.data.frame() #381