joshuaulrich / rfimport

getSymbols() reboot
16 stars 2 forks source link

What to return when importing one symbol? #9

Closed joshuaulrich closed 2 years ago

joshuaulrich commented 2 years ago

I find it annoying to have to subset the object returned from import_ohlc() when you only import one symbol. For example.

library(rfimport)
## list with one element
spy_list <- import_ohlc(sym_yahoo("SPY"))
## just the SPY data
spy <- spy_list[[1]]

It would be nice if import_ohlc() returned just an xts object if you only provide a single symbol spec object. But I'm not sure how I feel about import_ohlc() returning a different type/class of object depending on how many symbols you provide.

I'd appreciate thoughts/suggestions.

ethanbsmith commented 2 years ago

i prefer consistency. plenty of functions that work like this. eg: strsplit i find it easy to use this pattern when i know i only have 1 item, but do admit, sometime i forget and don't do it upfront spy <- import_ohlc(sym_yahoo("SPY"))[[1]]

another option would be a simplify parameter. that way people could change the default on it to suit their style, similar to how autoassign on getSymbols

joshuaulrich commented 2 years ago

Another thought I had was to make the multiple_ohlc object look and act like a data.frame with a Symbol column. For example:

print2 <-
function(x, ..., n = 5)
{
    for (nm in names(x)) {
        y <- x[[nm]]
        nry <- NROW(y)
        ncy <- NCOL(y)
        ibeg <- seq.int(1L, n, 1L)
        iend <- seq.int(nry-n+1L, nry, 1L)

        if (nry > (n*2+1)) {
            index_str <- as.character(index(y))
            index_str <- c(index_str[ibeg],
                           "...",
                           index_str[iend])

            ybeg <- y[ibeg,]
            yend <- y[iend,]
            out <- rbind(data.frame(Symbol = nm, ybeg),
                         matrix(NA, 1L, ncy + 1L, dimnames = list("...", c("Symbol", colnames(y)))),
                         data.frame(Symbol = nm, yend))

            out <- data.frame(` ` = rownames(out), out, check.names = FALSE)
            out <- capture.output(print(out, row.names = FALSE))
            # n+2 to account for column names in first element
            out[n+2] <- gsub("<?NA>?", "  ", out[n+2])
            # remove leading space to match print.xts()
            out <- sub("^ ", "", out)
        } else {
            out <- capture.output(print(data.frame(Symbol = nm, y)))
        }

        write(out, "")
        cat("\n")
    }
    invisible(x)
}

print2(spy_list)
##            Symbol   Open   High    Low  Close    Volume Adjusted
## 2021-05-10    SPY 422.50 422.74 417.81 417.94  81852400 412.5116
## 2021-05-11    SPY 413.10 415.27 410.06 414.21 116888000 408.8300
## 2021-05-12    SPY 411.23 412.59 404.00 405.41 134811000 400.1443
## 2021-05-13    SPY 407.07 412.35 407.02 410.28 106394000 404.9510
## 2021-05-14    SPY 413.21 417.49 413.18 416.58  82201600 411.1692
##        ...                                                    
## 2022-05-02    SPY 412.07 415.92 405.02 414.48 158312500 414.4800
## 2022-05-03    SPY 415.01 418.93 413.36 416.38 100028200 416.3800
## 2022-05-04    SPY 417.08 429.66 413.71 429.06 144247900 429.0600
## 2022-05-05    SPY 424.55 425.00 409.44 413.81 172929100 413.8100
## 2022-05-06    SPY 411.10 414.80 405.73 411.34 151671300 411.3400

But that would make the supporting code for the multiple_ohlc objects quite a bit more complicated... I'm not sure it's worth the ease-of-use tradeoff.


EDIT: I'm not even sure it would work. Something like spy_list$Symbol would return a vector of "SPY", but what would spy_list$Open return? It would be weird/wrong if the result didn't have the Symbol column any more, but then it's not an xts object... it would need to be a subclass of xts... which would be a lot more work.

pverspeelt commented 2 years ago

i prefer consistency. plenty of functions that work like this. eg: strsplit i find it easy to use this pattern when i know i only have 1 item, but do admit, sometime i forget and don't do it upfront spy <- import_ohlc(sym_yahoo("SPY"))[[1]]

another option would be a simplify parameter. that way people could change the default on it to suit their style, similar to how autoassign on getSymbols

I agree with @ethanbsmith. Consistency is key. People will use this in scripts and you don't want to have different behaviour if there is one symbol or if there are many symbols in the returned object.

As in your thought about printing multiple_ohlc, I like that it prints only the head and tail parts, but you miss the name of the list object, like printing a normal list. That should be enough to differentiate.

joshuaulrich commented 2 years ago

Thanks for your thoughts @ethanbsmith and @pverspeelt! I also think consistency is very important, but I didn't say it strongly in my initial comment because I didn't want to sway other's replies. That's the main thing that keeps me from making this change.

I'm still annoyed though, so I need a solution. :)

What about another function? Either a new function that returns an xts object (import_ohlc_xts()), or make import_ohlc() return an xts object and create a new function (import_multi_ohlc()) that returns a multiple_ohlc object. I lean toward the latter because then the function's name is similar to the data the function returns.

Whatever we do here, we should do something similar with import_series() (imports a single series, e.g. 10y Treasury rates from FRED) and import_quote() in order to differentiate between importing a single symbol source versus importing multiple series from one or more sources at the same time.

Thoughts?


..., but you miss the name of the list object, like printing a normal list.

Fixed in https://github.com/joshuaulrich/rfimport/commit/81f470067d926aef839394b59b3211f3c6532521, thanks!

ethanbsmith commented 2 years ago

i like that. a wrapper function that simplifies output is a pretty common pattern: sapply() vs. lapply() xml2::xml_find_first() vs. xml2::xml_find_all()

joshuaulrich commented 2 years ago

I just pushed a commit that makes import_ohlc() return an xts object and adds import_multi_ohlc() to return a 'multiple_ohlc' object (like import_ohlc() had done).

joshuaulrich commented 2 years ago

Closing, because I merged the commit to master.