joshuaulrich / quantmod

Quantitative Financial Modelling Framework
http://www.quantmod.com/
GNU General Public License v3.0

Repeatedly querying for adjusted close prices gives different values #416

Closed AntonioFasano closed 1 month ago

AntonioFasano commented 1 month ago

Description

Calling getSymbols() two or more times returns different adjusted price values.

Minimal, reproducible example

Below, I request MCD adjusted close prices from 2000-01-01 to 2000-01-11 four times:

library('quantmod')
replicate(4, getSymbols.yahoo("MCD", from = "2000-01-01", to = "2000-01-11", auto.assign = FALSE)$MCD.Adjusted |> as.character())
     [,1]               [,2]               [,3]               [,4]              
[1,] "21.7877388000488" "21.7877368927002" "21.7877368927002" "21.7877388000488"
[2,] "21.3409824371338" "21.3409767150879" "21.3409767150879" "21.3409824371338"
[3,] "21.6846370697021" "21.6846351623535" "21.6846351623535" "21.6846370697021"
[4,] "21.3753471374512" "21.3753471374512" "21.3753471374512" "21.3753471374512"
[5,] "21.9251937866211" "21.9252033233643" "21.9252033233643" "21.9251937866211"
[6,] "22.0282955169678" "22.028299331665"  "22.028299331665"  "22.0282955169678"

Note that column 1 equals column 4 and column 2 equals column 3, but the two pairs differ from each other.

Repeating the experiment may produce more or fewer matching columns.
Changing the dates or the symbol does not change this behaviour.
With long data sets and multiple stocks, the observed differences can be significant, so rounding does not help.
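
To put a number on the discrepancy in the small sample above, here is a minimal sketch that hard-codes the printed values of columns 1 and 2 and compares them. The variable names and the 1e-6 tolerance are assumptions made for illustration; nothing here comes from quantmod itself.

```r
# Adjusted closes copied from columns 1 and 2 of the output above.
run1 <- c(21.7877388000488, 21.3409824371338, 21.6846370697021,
          21.3753471374512, 21.9251937866211, 22.0282955169678)
run2 <- c(21.7877368927002, 21.3409767150879, 21.6846351623535,
          21.3753471374512, 21.9252033233643, 22.028299331665)

# Exact comparison, row by row.
exact_match <- run1 == run2

# Size of the disagreement, absolute and relative to the price level.
abs_diff <- abs(run1 - run2)
rel_diff <- abs_diff / abs(run1)

print(exact_match)
print(signif(max(abs_diff), 3))
print(signif(max(rel_diff), 3))

# Comparison with an explicit tolerance instead of exact equality.
# The 1e-6 tolerance is an assumption chosen for this sample only.
same_within_tol <- isTRUE(all.equal(run1, run2, tolerance = 1e-6))
print(same_within_tol)
```

In this sample only the fourth row matches exactly, and the largest gap is just under 1e-5 in absolute terms. A tolerance that works here is not guaranteed to work on longer series, where the differences are reported to be larger.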

I have read a number of stories regarding Yahoo data, so the cause may be on their side.
Be that as it may, if this behaviour is confirmed, it would make it impossible to carry out replicable studies.
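
One possible workaround for replicability, sketched below, is to download once and reuse a saved snapshot on every later run. `cached_fetch` is a hypothetical helper, not a quantmod function, and `unstable_source` is a stand-in for a download whose values drift between calls.

```r
# Hypothetical helper: return the snapshot stored at `path` if it exists;
# otherwise call `fetch()` once, save the result, and return it.
cached_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  data <- fetch()
  saveRDS(data, path)
  data
}

# Stand-in for a download that returns different values on each call.
unstable_source <- function() runif(6, min = 21, max = 22)

# tempfile() only generates a name; the file is created by the first call.
snapshot <- tempfile(fileext = ".rds")
first  <- cached_fetch(snapshot, unstable_source)
second <- cached_fetch(snapshot, unstable_source)
print(identical(first, second))
```

With quantmod, the `fetch` argument could be something like `function() getSymbols("MCD", src = "yahoo", from = "2000-01-01", to = "2000-01-11", auto.assign = FALSE)`. This only freezes whichever values arrived first; it does not make them correct.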

Expected behaviour

Columns in the output above should be identical.

Session Info

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.12.0 
LAPACK: /usr/lib/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Helsinki
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.4.0

braverock commented 1 month ago

Yahoo's data and API are notoriously inconsistent/unreliable/bad. We only use them for demos because they are universally available without API keys or logins.

If you want to carry out replicable studies, I'm afraid you need a commercial or academic data source.

If you want a free source, maybe try Quandl or Alpha Vantage? Both require an account and API keys, but both have free tiers.

joshuaulrich commented 1 month ago

I agree with @braverock. You have to be very cautious with Yahoo data.

I use Tiingo data (@tiingo). Their free tier is generous and their paid tiers are reasonably priced. Most importantly, their data quality is much better than Yahoo's. I've also heard good things about Polygon for intraday data. I'm not paid to recommend either of these products.
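
For readers who want to try this from quantmod, a minimal sketch follows. It assumes the quantmod package is installed, network access is available, and a Tiingo API key is stored in an environment variable. The wrapper `fetch_tiingo` and the variable name `TIINGO_API_KEY` are illustrative assumptions, not quantmod conventions.

```r
# Sketch only: needs quantmod, network access, and a Tiingo API key.
# TIINGO_API_KEY is an assumed environment variable name.
fetch_tiingo <- function(symbol, from, to,
                         api_key = Sys.getenv("TIINGO_API_KEY")) {
  if (!nzchar(api_key)) {
    stop("Set the TIINGO_API_KEY environment variable first.")
  }
  # Ask getSymbols() to use Tiingo as the source and return the data
  # instead of assigning it into an environment.
  quantmod::getSymbols(symbol, src = "tiingo", api.key = api_key,
                       from = from, to = to, auto.assign = FALSE)
}

# Example call (not run here):
# mcd <- fetch_tiingo("MCD", from = "2000-01-01", to = "2000-01-11")
```

Whether a given plan covers a particular symbol or date range is something to check against Tiingo's own documentation.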

I'm closing this because it's not something I can fix in quantmod.