joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0

Empty return when comparing 2 xts variables #352

Closed HLatte closed 3 years ago

HLatte commented 3 years ago

Description

When I compare two xts objects created by quantmod, the result is an "empty" object, as shown below (output from the minimal example):

head(temp)
logical(0)
str(temp)
 logi(0)
 - attr(*, "index")= num(0)
  ..- attr(*, "tzone")= chr ""
  ..- attr(*, "tclass")= chr [1:2] "POSIXct" "POSIXt"

The only difference I can spot is the index type. The yahoo xts reports "Indexed by objects of class: [Date] TZ: UTC", while the tiingo xts reports "Indexed by objects of class: [POSIXct,POSIXt] TZ:" (an empty time zone).

Otherwise, everything seems to be similar.

Expected behavior

I expected the comparison to return TRUE or FALSE for each column and each date, depending on whether the data are the same or not.

Minimal, reproducible example

NOTE: replace/add tiingo's API key.

# require() loads one package per call; its second argument is lib.loc, not another package
library(xts)
library(quantmod)
start_date <- "2021-04-05"

# call to yahoo's API
amd_yahoo <- getSymbols( Symbols = "AMD",  src = "yahoo",  from = start_date,  auto.assign = FALSE,  return.class = "xts")
# remove the adjusted column
amd_yahoo$AMD.Adjusted <- NULL

# call to tiingo's API
amd_tiingo <- getSymbols(  Symbols = "AMD",  src = "tiingo",  from = start_date,  auto.assign = FALSE,  return.class = "xts",
  api.key = "xxxxxx"  # replace with your own Tiingo API key
)

# compare both tables/variables
temp <- amd_yahoo == amd_tiingo

# check result
print(head(temp))
str(temp)

Session Info

R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tictoc_1.0.1               openxlsx_4.2.3             tidyquant_1.0.3           
 [4] PerformanceAnalytics_2.0.4 lubridate_1.7.10           forcats_0.5.1             
 [7] stringr_1.4.0              dplyr_1.0.5                purrr_0.3.4               
[10] readr_1.4.0                tidyr_1.1.3                tibble_3.1.1              
[13] ggplot2_3.3.3              tidyverse_1.3.1            here_1.0.1                
[16] quantmod_0.4.18            TTR_0.24.2                 xts_0.12.1                
[19] zoo_1.8-9                 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6       lattice_0.20-41  assertthat_0.2.1 rprojroot_2.0.2  utf8_1.2.1       R6_2.5.0        
 [7] cellranger_1.1.0 backports_1.2.1  reprex_2.0.0     httr_1.4.2       pillar_1.6.0     rlang_0.4.10    
[13] curl_4.3         readxl_1.3.1     rstudioapi_0.13  munsell_0.5.0    tinytex_0.31     broom_0.7.6     
[19] compiler_4.0.5   modelr_0.1.8     xfun_0.22        pkgconfig_2.0.3  tidyselect_1.1.0 quadprog_1.5-8  
[25] fansi_0.4.2      crayon_1.4.1     dbplyr_2.1.1     withr_2.4.2      Quandl_2.10.0    grid_4.0.5      
[31] jsonlite_1.7.2   gtable_0.3.0     lifecycle_1.0.0  DBI_1.1.1        pacman_0.5.1     magrittr_2.0.1  
[37] scales_1.1.1     zip_2.1.1        cli_2.4.0        stringi_1.5.3    fs_1.5.0         xml2_1.3.2      
[43] ellipsis_0.3.1   generics_0.1.0   vctrs_0.3.7      tools_4.0.5      glue_1.4.2       hms_1.0.0       
[49] colorspace_2.0-0 rvest_1.0.0      haven_2.4.0    

braverock commented 3 years ago

This is not a bug.

== is an exact comparison. Aside from the index, the two data sets also have different column names. How do you expect ==, an R primitive, to match these things up?

I suggest that you compare the numeric values of the columns, after normalizing the index.
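
A minimal sketch of that suggestion, using two small synthetic series rather than the downloaded data. The object names, the values, and the Europe/London time zone are assumptions for illustration; the POSIXct series stands in for a daily series indexed at local midnight.

```r
library(xts)

dates  <- c("2021-04-05", "2021-04-06", "2021-04-07")
values <- c(81.4, 82.0, 83.3)  # made-up prices, not real quotes

# Daily series with a Date index
by_date <- xts(values, order.by = as.Date(dates))

# Same values, indexed by POSIXct at local midnight (assumed time zone)
by_time <- xts(values, order.by = as.POSIXct(dates, tz = "Europe/London"))

# The underlying timestamps differ by the UTC offset, so == finds no
# common index entries to compare
raw_cmp <- by_date == by_time

# Normalize the POSIXct index to Date, using the same time zone
index(by_time) <- as.Date(index(by_time), tz = "Europe/London")

# Compare only the numeric values, ignoring column names
same_values <- all(coredata(by_date) == coredata(by_time))
```

Comparing coredata() sidesteps the column-name difference, while normalizing the index first ensures the rows being compared refer to the same dates.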

joshuaulrich commented 3 years ago

I agree with @braverock that this isn't a bug, so there isn't anything we should change in xts. You can use all.equal() to see the differences between the two objects.

R$ all.equal(amd_tiingo, amd_yahoo)
[1] "Attributes: < Component \"index\": Attributes: < Component \"tclass\": Lengths (2, 1) differ (string compare on first 1) > >"
[2] "Attributes: < Component \"index\": Attributes: < Component \"tclass\": 1 string mismatch > >"                                
[3] "Attributes: < Component \"index\": Attributes: < Component \"tzone\": 1 string mismatch > >"                                 
[4] "Attributes: < Component \"index\": Mean relative difference: 1.111932e-05 >"                                                 
[5] "Attributes: < Component \"src\": 1 string mismatch >"                                                                        
[6] "Attributes: < Component \"updated\": Mean absolute difference: 46.64628 >"                                                   
[7] "Mean relative difference: 0.0002269158" 

That said, it may be an issue/infelicity with getSymbols.tiingo() when it returns daily data. Other getSymbols() methods return daily data with a Date index, not a POSIXct index. And (as you noticed), that's what causes the issue.
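
To check which index class an object carries before comparing, xts provides tclass() and tzone(). A small sketch with synthetic series (the object names and values are made up for illustration):

```r
library(xts)

days <- c("2021-04-05", "2021-04-06")

# One series indexed by Date, one by POSIXct
date_indexed <- xts(1:2, order.by = as.Date(days))
time_indexed <- xts(1:2, order.by = as.POSIXct(days, tz = "UTC"))

tclass(date_indexed)  # "Date"
tclass(time_indexed)  # "POSIXct" "POSIXt"
tzone(time_indexed)   # time zone attached to the index
```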

HLatte commented 3 years ago

Thank you both. Indeed the problem is with the index. As suggested by braverock, a workaround is to normalize the index. This can be done with the following action before the comparison:

index(amd_tiingo) <- as.Date( index(amd_tiingo), tz="")

The catch is that you must pass tz="" to as.Date(). Otherwise, as.Date() converts the POSIXct index using UTC rather than the local time zone, and the dates are shifted one day backwards. This solution came from one of the comments on this question on Stack Overflow.
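
The shift can be reproduced in base R without any downloaded data. This is a sketch under assumptions: the timestamp and the Europe/London time zone are chosen only so that the offset from UTC is visible, and an explicit zone is used in place of tz="" (which means the session's local time zone) so the result does not depend on the machine.

```r
# Midnight on 5 April 2021 in London, which is 23:00 on 4 April in UTC
x <- as.POSIXct("2021-04-05 00:00:00", tz = "Europe/London")

# as.Date() on POSIXct input defaults to tz = "UTC"
default_date <- as.Date(x)

# Passing the time zone keeps the calendar date the index was built with
local_date <- as.Date(x, tz = "Europe/London")

format(default_date)  # "2021-04-04"
format(local_date)    # "2021-04-05"
```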

I hope this will help somebody in the future.

HL