joshuaulrich / quantmod

Quantitative Financial Modelling Framework
http://www.quantmod.com/
GNU General Public License v3.0

different data returned by the API? #398

Closed Courvoisier13 closed 11 months ago

Courvoisier13 commented 11 months ago

Description

The Yahoo API appears to be returning adjusted data that differs from what is shown on the website.

Here is an example with RSPT and RSPS vs. QQQ: image. Here RSPT blows past QQQ, and RSPS even ends at the same point as QQQ.

For comparison, here is what the Yahoo website shows: image

and Koyfin: image

and Morningstar: image

Any clue what is going on here? I have the latest version of quantmod. I also use yfinance (just updated to the latest version) in Python and see the same issue. Is there a problem with the API?

Expected behavior

Correct adjusted close prices, matching what the Yahoo website shows.

Minimal, reproducible example

library(tidyverse)
library(plotly)
library(tidyquant)

assets = c('RSPT', 'QQQ', 'RSPS')
enddate = today(tzone = "EST")
startdate = enddate - years(5)
prices_yahoo = tq_get(assets, from = startdate, to = enddate,  get = "stock.prices")

prices_yahoo = prices_yahoo %>%
  group_by(symbol) %>%
  mutate(ret = RETURN(adjusted, fill_na = 0),
         perf = CUMULATIVE_PRODUCT(1+ret))

prices_yahoo %>% ggplot(aes(date, perf, color = symbol)) + geom_line() + ggtitle("yahoo")

Session Info

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyquant_1.0.7            quantmod_0.4.24            TTR_0.24.3                 PerformanceAnalytics_2.0.4
 [5] xts_0.13.1                 zoo_1.8-12                 plotly_4.10.2              lubridate_1.9.2           
 [9] forcats_1.0.0              stringr_1.5.0              dplyr_1.1.2                purrr_1.0.1               
[13] readr_2.1.4                tidyr_1.3.0                tibble_3.2.1               ggplot2_3.4.2             
[17] tidyverse_2.0.0           

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0    viridisLite_0.4.2   timeDate_4022.108   farver_2.1.1        fastmap_1.1.1      
 [6] lazyeval_0.2.2      digest_0.6.33       rpart_4.1.19        timechange_0.2.0    lifecycle_1.0.3    
[11] ellipsis_0.3.2      survival_3.5-5      magrittr_2.0.3      compiler_4.3.1      rlang_1.1.1        
[16] tools_4.3.1         yaml_2.3.7          utf8_1.2.3          data.table_1.14.8   labeling_0.4.2     
[21] htmlwidgets_1.6.2   curl_5.0.1          withr_2.5.0         nnet_7.3-19         grid_4.3.1         
[26] fansi_1.0.4         timetk_2.8.3        colorspace_2.1-0    future_1.33.0       globals_0.16.2     
[31] scales_1.2.1        MASS_7.3-60         cli_3.6.1           generics_0.1.3      rstudioapi_0.15.0  
[36] future.apply_1.11.0 httr_1.4.6          tzdb_0.4.0          DBI_1.1.3           splines_4.3.1      
[41] parallel_4.3.1      vctrs_0.6.3         hardhat_1.3.0       Matrix_1.5-4.1      jsonlite_1.8.7     
[46] hms_1.1.3           listenv_0.9.0       crosstalk_1.2.0     gower_1.0.1         recipes_1.0.6      
[51] glue_1.6.2          parallelly_1.36.0   codetools_0.2-19    Quandl_2.11.0       rsample_1.1.1      
[56] stringi_1.7.12      gtable_0.3.3        quadprog_1.5-8      munsell_0.5.0       pillar_1.9.0       
[61] furrr_0.3.1         htmltools_0.5.5     ipred_0.9-14        lava_1.7.2.1        R6_2.5.1           
[66] lattice_0.21-8      class_7.3-22        Rcpp_1.0.11         prodlim_2023.03.31  pkgconfig_2.0.3

joshuaulrich commented 11 months ago

Could you please provide a minimal example that only uses quantmod? I'm not familiar with any of the packages you've used, and I'd rather not have to figure out how they're using quantmod behind the scenes.

Courvoisier13 commented 11 months ago

Is the below better? I am only using tidyverse and dropped tidyquant (wrapper around quantmod). If yes, I can edit the question and replace the reproducible example.

library(tidyverse)
library(quantmod)
assets = c('RSPT', 'QQQ', 'RSPS')
enddate = today(tzone = "EST")
startdate = enddate - years(5)
getSymbols(Symbols = assets)

RSPT = dailyReturn(RSPT$RSPT.Adjusted)
QQQ = dailyReturn(QQQ$QQQ.Adjusted)
RSPS = dailyReturn(RSPS$RSPS.Adjusted)

RSPT = as.data.frame(RSPT) %>% rownames_to_column(var = "date") %>% select(date, daily.returns) %>% mutate(ticker = "RSPT")
QQQ = as.data.frame(QQQ) %>% rownames_to_column(var = "date") %>% select(date, daily.returns) %>% mutate(ticker = "QQQ")
RSPS = as.data.frame(RSPS) %>% rownames_to_column(var = "date") %>% select(date, daily.returns) %>% mutate(ticker = "RSPS")

prices_yahoo = RSPT %>% bind_rows(QQQ) %>% bind_rows(RSPS)
prices_yahoo = prices_yahoo %>% mutate(date = as.Date(date))

cumprod_ignore_na <- function(x) {
  y = cumprod(replace(x, is.na(x), 1)) ; y[is.na(x)] <- NA
  return(y)
}

prices_yahoo = prices_yahoo %>%
  group_by(ticker) %>%
  mutate(perf = cumprod_ignore_na(1+daily.returns))

prices_yahoo %>% ggplot(aes(date, perf, color = ticker)) + geom_line() + ggtitle("yahoo")
prices_yahoo %>% 
  filter(date>startdate) %>%
  group_by(ticker) %>%
  mutate(perf = cumprod_ignore_na(1+daily.returns)) %>% 
  ggplot(aes(date, perf, color = ticker)) + geom_line() + ggtitle("yahoo")

joshuaulrich commented 11 months ago

Thanks for the simpler example. tidyverse and ggplot2 include ~40 other packages and attach 9. Attaching tidyverse also attaches dplyr, and that breaks how the base R lag() function works. Issues like that are why I ask for minimal examples.

Here's what I did (for future me, not my expectation from you)

library(quantmod)
assets <- c('RSPT', 'QQQ', 'RSPS')
getSymbols(assets, from = "2018-08-01", to = "2023-08-01", env = (e <- new.env()))
p <- do.call(merge, lapply(e, Cl))
wp <- cumprod(1+ROC(p, 1, "discrete")[-1])
plot(wp, main = "wealth index from close prices")
addLegend("topleft", lty = 1)

a <- do.call(merge, lapply(e, Ad))
wa <- cumprod(1+ROC(a, 1, "discrete")[-1])
plot(wa, main = "wealth index from adjusted prices")
addLegend("topleft", lty = 1)

image: wealth index from close prices

image: wealth index from adjusted prices

So the charts from the websites are using the plain close prices, while your code uses the adjusted close prices. The adjusted prices give you a more accurate result based on what you'd actually receive if you had owned those securities over the time horizon.
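
To illustrate the difference between the two wealth indexes, here is a minimal base R sketch on made-up numbers (five closes and one hypothetical cash dividend, not real RSPT, RSPS, or QQQ data). It assumes the simplest convention, where the dividend is added to the return on the ex-dividend date.

```r
# Synthetic example: five daily closes and one cash dividend (all values made up).
close <- c(100, 101, 99, 100, 102)
div   <- c(0,   0,   2,  0,   0)   # hypothetical dividend on day 3 (ex-date)

n <- length(close)

# Price-only return ignores the dividend.
price_ret <- close[-1] / close[-n] - 1

# Total return adds the dividend received on the ex-date.
total_ret <- (close[-1] + div[-1]) / close[-n] - 1

# Wealth indexes: growth of 1 unit invested at the first close.
wealth_price <- cumprod(1 + price_ret)
wealth_total <- cumprod(1 + total_ret)

print(round(cbind(wealth_price, wealth_total), 4))
```

The price-only index ends at 1.02, while the dividend-inclusive index ends higher, which is the gap an adjusted series is meant to capture.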

I'm going to close this because it's not an issue with quantmod. But I'm happy to continue the discussion if you'd like.

Courvoisier13 commented 11 months ago

Thanks @joshuaulrich. The websites are using adjusted prices; of that I am 100% sure. Morningstar even has an option to display growth adjusted or not adjusted. The second graph you show is surely wrong: run it over 10 years and you will see RSPT up 1100% vs. 400% for QQQ. There is no world where the equal-weight fund is that different from the cap-weighted one. I think there is an issue with Yahoo's API endpoint. I will do the adjustments myself and see what we get.

Courvoisier13 commented 11 months ago

Just checked. getDividends("RSPT") returns dividends that are 10x what they should be. Seems like a Yahoo issue, not a quantmod one. There was a split last week; I think that triggered the problem.
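
A rough base R sketch of why dividends that are 10x too large would inflate an adjusted series. It assumes one common back-adjustment convention, where every price before an ex-date is scaled by (1 - dividend / previous close); whether Yahoo computes its adjusted close exactly this way is an assumption here. The helper `adjust_for_dividends()` and all numbers are hypothetical.

```r
# Hypothetical helper: back-adjust closes for cash dividends by scaling every
# price before an ex-date by (1 - dividend / previous close).
adjust_for_dividends <- function(close, div) {
  n <- length(close)
  f <- rep(1, n)
  for (t in 2:n) {
    if (div[t] > 0) f[t] <- 1 - div[t] / close[t - 1]
  }
  # The multiplier for day i is the product of the factors of all later days.
  multiplier <- rev(cumprod(rev(c(f[-1], 1))))
  close * multiplier
}

# Made-up data: a flat, split-adjusted close of 10 and two dividend payments.
close   <- rep(10, 6)
div_ok  <- c(0, 0, 0.05, 0, 0.05, 0)  # dividends on the same basis as the prices
div_bad <- div_ok * 10                # dividends left on the pre-split basis

adj_ok  <- adjust_for_dividends(close, div_ok)
adj_bad <- adjust_for_dividends(close, div_bad)

# Growth of 1 unit implied by each adjusted series.
growth_ok  <- adj_ok[6] / adj_ok[1]
growth_bad <- adj_bad[6] / adj_bad[1]
print(round(c(ok = growth_ok, bad = growth_bad), 4))
```

With a flat price, the correctly scaled dividends imply roughly 1% growth, while the 10x dividends imply almost 11%, so the error compounds with every payment in the history.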

joshuaulrich commented 11 months ago

Yeah, they've frequently changed the data they return. There were times where the regular OHLC prices were adjusted for splits but not dividends; times where they weren't adjusted for either; times where the volume wasn't adjusted for splits and/or dividends, etc. It's just a mess.

I use Tiingo @tiingo instead of Yahoo because of this (I'm not compensated for mentioning them). Here's what their adjusted prices show for the past 5 years (2018-08-01/2023-08-01).

library(quantmod)
assets <- c('RSPT', 'QQQ', 'RSPS')
getSymbols(assets, from = "2018-08-01", to = "2023-08-01", src = "tiingo", env = (e <- new.env()))
p <- do.call(merge, lapply(e, Ad))
wp <- cumprod(1+ROC(p, 1, "discrete")[-1])-1
plot(wp, main = "wealth index from Tiingo adjusted close prices")
addLegend("topleft", lty = 1)

image

joshuaulrich commented 11 months ago

"dividends that are 10x what they should be"

...so they're not adjusting dividends for splits... or doing it wrong.

People have suggested that quantmod should try to return the correct data regardless of what Yahoo returns, but that would be a lot of work for me. And I'd prefer people use a different public data source instead of trying to keep up with Yahoo's changes.

Courvoisier13 commented 11 months ago

You mean do the adjustments in the package? I will investigate whether that is practical. The problem, as you point out, is what happens when they correct the issue. There does not seem to be a rule: one day they adjust, the next day they don't.
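
One way to notice such a change without hard-coding a rule could be a sanity check on the data itself. The sketch below is only a heuristic under stated assumptions: `dividend_yield_jump()` is a hypothetical helper (not part of quantmod), the split date is assumed to be known, the closes are assumed to be split-adjusted, and the data is made up. It compares the typical per-payment dividend yield before and after the split; a ratio far from 1 suggests the dividends and prices are on different bases.

```r
# Hypothetical helper: ratio of the median per-payment dividend yield before a
# split to the median yield on or after it. Returns NA if either side has no payment.
dividend_yield_jump <- function(dates, close, div, split_date) {
  paid <- which(div > 0)
  paid <- paid[paid > 1]                 # each payment needs a prior close
  yield  <- div[paid] / close[paid - 1]
  before <- yield[dates[paid] <  split_date]
  after  <- yield[dates[paid] >= split_date]
  if (length(before) == 0 || length(after) == 0) return(NA_real_)
  median(before) / median(after)
}

# Made-up data: flat split-adjusted close, split assumed on the 6th date.
dates      <- as.Date("2023-01-02") + 0:7
close      <- rep(10, 8)
split_date <- dates[6]

div_bad <- c(0, 0.5,  0, 0.5,  0, 0, 0.05, 0)  # pre-split payments not adjusted
div_ok  <- c(0, 0.05, 0, 0.05, 0, 0, 0.05, 0)  # all payments on the same basis

ratio_bad <- dividend_yield_jump(dates, close, div_bad, split_date)
ratio_ok  <- dividend_yield_jump(dates, close, div_ok,  split_date)
print(c(bad = ratio_bad, ok = ratio_ok))
```

Here the unadjusted series gives a ratio of about 10 and the consistent one gives 1. The check needs at least one payment on each side of the split, so it cannot flag a problem in the first days after a split, and a genuine change in payout would also move the ratio.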