joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
219 stars 71 forks source link

merge.xts works with cbind better than c #211

Closed isomorphisms closed 2 years ago

isomorphisms commented 7 years ago

Description

merge.xts( cbind( GLD, GDAXI ) ) works but merge.xts( c( GLD,GDAXI ) ) doesn't.

Expected behavior

output of c version should match or even supersede the cbind version

Minimal, reproducible example

require(quantmod)
getSymbols( c('GLD','^GDAXI') )
merge.xts( c( GLD,GDAXI ), join='inner' )

Session Info

R version 3.4.1 (2017-06-30)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/libf77blas.so.3.0
LAPACK: /opt/R-3.4.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] quantmod_0.4-10 TTR_0.23-2      xts_0.10-0      zoo_1.8-0       magrittr_1.5    colorout_1.1-2 

loaded via a namespace (and not attached):
 [1] compiler_3.4.1   pryr_0.1.2       tools_3.4.1      curl_2.7         Rcpp_0.12.12     stringi_1.1.5   
 [7] codetools_0.2-15 grid_3.4.1       stringr_1.2.0    lattice_0.20-35 
joshuaulrich commented 7 years ago

The behavior you expect would be inconsistent with c.ts() and c.zoo(). c.xts() is synonymous with rbind.xts().

It would be very helpful if you could describe how you arrived at the conclusion that the "output of c version should match or even supersede the cbind version", with references to documentation, if possible. That would help me make it more clear.

isomorphisms commented 7 years ago

OK, good to know. I was expecting c to be behave more like cbind than rbind (or even be smart enough to "know what I mean" --- if cing the same symbol, then rbind. if cing different symbols, then cbind.

I would be happy to rewrite the c.xts function to do this.

joshuaulrich commented 7 years ago

Thanks for the offer to contribute! Your suggested change would still make c.xts() inconsistent with c.zoo() method, so I would be very reluctant to include the change.

Also, I generally try to stop and reconsider my approach if I think a function needs to try to be "smart enough" to guess the user's intention. That usually means something about the existing workflow and/or functions is not clear and explicit enough.

isomorphisms commented 7 years ago

Why? It makes sense to me to have an alias up front with case logic depending on eg the input type.

For example evince 'http://www.maths.ed.ac.uk/~aar/papers/eilestee.pdf' vs evince eilestee.pdf knows to use a socket in the first case. Or stringr provides a front end to stringi.

I will look more carefully at c.zoo().

joshuaulrich commented 7 years ago

Different behavior based on input type is fine. That's how method dispatch works and is a well-established pattern. stringr providing an interface to stringi is also fine. That's essentially a map to stringi functionality, and interfaces are another well-established pattern.

The evince example isn't quite the same as what you propose for c.xts(). The former is more like method dispatch. You know that a string starting with "http://" is a URL, which requires a different access method than reading from the file system.

Different behavior based on "different symbols" is not a well-established pattern. I'm not even sure what you mean by "different symbols". Are you referring to "symbol" in the sense of ticker symbol, name bound to an object, something else? Maybe your proposal will make more sense if I understand that better...

isomorphisms commented 7 years ago

yes, ticker symbol (as in quantmod::getSymbols, not ld linker symbols).

If I want to merge.xts( GSPC, TNX ) then a cbind seems to me more appropriate. If I want to merge.xts( GSPC['2010/'], GSPC['2011'] ), then an rbind would make more sense to me.

joshuaulrich commented 7 years ago

Okay, I now understand the use-case. I'm not convinced it's a good idea, however. But I do think it is a bad idea to put it in merge.xts(), because merge.xts() provides merge/join functionality, not union functionality. And it is good practice to have one function do one thing.

Even if the functionality you propose were put into a new function, I think it would produce behavior that would surprise users. There are many conditions that need to be met for a successful rbind() and some are unclear/ambiguous. Does "same symbol" mean "same name bound to an R object", or can it mean "data for the same symbol, regardless of R object name"? What should the function do if the objects have the "same symbol" but different number/order of columns, or different column names?

Implementation would also require parsing the function call in order to determine whether to do a cbind() or rbind(), and that has additional unclear/ambiguous conditions (e.g. if the arguments passed via ... have tags/names).

In short, the additional work and potential surprising behavior are too costly for me to justify a new function that avoids requiring the user to write explicit code when they want a merge/join versus a union.

Also note it's not good practice to call methods directly; you should call merge(GSPC, TNX) and let S3 method dispatch determine what method to use.

joshuaulrich commented 2 years ago

I'm closing this because it's unlikely to result in a code change, but I'm happy to continue the discussion.