Closed rhamo closed 2 years ago
More testing carried out on yfR's Weekly and Monthly price data. Open Price always seems to be problematic. High, Low and Close seem to be stable.
Thanks @rhamo, I need some time to investigate this. I find it very weird that the data doesn't match between yfR and BatchGetSymbols. Did you execute the code on the same day?
Yes, all executed within minutes of each other. The Open price always seems to be incorrect when using freq_data = weekly, monthly and yearly. I'm not sure how yfR calculates the open price compared to getSymbols, but I compared yfR to getSymbols & BatchGetSymbols today using 20 tickers within the S&P 500 and again all open prices are wrong.
So the problem is just for open prices when aggregating? yfR calculates this by picking the first/last date (see arg how_to_aggregate) for each week/month/year.
Can you share some code so that we are looking at the same issue?
It also happened to me that the closing price of a stock did not correspond to the real price. Hours later I ran it again and it was solved
Here's the code I am using to compare yfR and BatchGetSymbols Weekly O/H/L/C price data.
BatchGetSymbols price data is most accurate when using: how.to.aggregate = "last" for Weekly data.
I have discovered the following with yfR: When how_to_aggregate = “first” all the Open prices are incorrect. High/Low and Close prices are correct. When how_to_aggregate = “last” all the Close prices are incorrect. High/Low and Open prices are correct.
BatchGetSymbols Code - Weekly - aggregate "last"
library(rvest)
library(xts)
library(zoo)
library(TTR)
library(xml2)
library(dplyr)
library(BatchGetSymbols)
library(shiny)
library(miniUI)
library(shinyFiles)
us.data6 <- BatchGetSymbols(
  tickers = c("AAPL"),
  first.date = Sys.Date() - 70,
  last.date = Sys.Date(),
  freq.data = "weekly",
  how.to.aggregate = "last",
  do.complete.data = FALSE,
  thresh.bad.data = 0.75,
  cache.folder = file.path(tempdir(), "BGS_Cache")  # cache in tempdir()
)
us.data.out6 <- as_tibble(us.data6$df.tickers)
View(us.data.out6)
yfR Code - Weekly - aggregate "last"
library(yfR)
tickers <- c("AAPL")
first_date <- Sys.Date() - 70
last_date <- Sys.Date()
df_yf <- yf_get(
  tickers = tickers,
  first_date = first_date,
  last_date = last_date,
  freq_data = "weekly",
  bench_ticker = "^GSPC",
  how_to_aggregate = "last",
  do_complete_data = FALSE,
  thresh_bad_data = 0.75,
  do_cache = TRUE
)
View(df_yf)
yfR Code - Weekly - aggregate "first"
library(yfR)
tickers <- c("AAPL")
first_date <- Sys.Date() - 70
last_date <- Sys.Date()
df_yf <- yf_get(
  tickers = tickers,
  first_date = first_date,
  last_date = last_date,
  freq_data = "weekly",
  bench_ticker = "^GSPC",
  how_to_aggregate = "first",
  do_complete_data = FALSE,
  thresh_bad_data = 0.75,
  do_cache = TRUE
)
View(df_yf)
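Comparing the two outputs by eye in View() is error-prone, so here is a minimal sketch of a programmatic comparison. It is not part of either package: compare_prices is a hypothetical helper, the two data frames are made-up stand-ins for us.data6$df.tickers and df_yf, and it assumes a single ticker, that both packages label each week with the same date, and that the column names are ref.date / price.open / price.close in BatchGetSymbols and ref_date / price_open / price_close in yfR.

```r
# Made-up stand-ins for the two downloads (all prices are illustrative only).
bgs <- data.frame(
  ref.date    = as.Date(c("2022-06-13", "2022-06-21")),
  price.open  = c(100, 110),
  price.close = c(105, 120)
)
yfr <- data.frame(
  ref_date    = as.Date(c("2022-06-13", "2022-06-21")),
  price_open  = c(100, 115),
  price_close = c(105, 120)
)

# Hypothetical helper: join both outputs on the date and flag differing prices.
compare_prices <- function(bgs, yfr, tol = 1e-6) {
  # Harmonise BatchGetSymbols' dotted names with yfR's underscore names.
  names(bgs) <- gsub(".", "_", names(bgs), fixed = TRUE)
  merged <- merge(bgs, yfr, by = "ref_date", suffixes = c("_bgs", "_yfr"))
  merged$open_diff  <- abs(merged$price_open_bgs  - merged$price_open_yfr)  > tol
  merged$close_diff <- abs(merged$price_close_bgs - merged$price_close_yfr) > tol
  merged
}

cmp <- compare_prices(bgs, yfr)
# Keep only the periods where the two sources disagree.
mismatches <- cmp[cmp$open_diff | cmp$close_diff, ]
print(mismatches)
```

Posting the rows of mismatches instead of screenshots would let everyone in the thread look at exactly the same numbers.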
thanks @rhamo
Code wise, yes, there were some changes in between BatchGetSymbols and yfR, especially for aggregated data. My advice, stick with yfR which is the package I'll maintain moving forward.
As for the difference in open prices, the raw data comes from Yahoo Finance and I have no control over its quality. So I cannot check (or fix) anything related to raw data.
I'm closing this one as I don't see what I can do to improve the raw data.
I completely understand that you have no control over yahoo price data but is it possible to return the "aggregated data" code back to the BatchGetSymbols code for yfR? I assume people use this to retrieve stock O/H/L/C data to match what they see on all published stock candlestick charts? I believe Yahoo's data is correct but it all comes down to how I word my script for "how_to_aggregate" = "first" or "last" which I believe is incorrect. Please know I do appreciate all your work @msperlin I just don't understand why the Open and Close price can be so wrong due to the way I need to write my script.
The issue is that in BatchGetSymbols, the option "how_to_aggregate" = "first" was yielding the first value in the column high_price (the maximum of the first day of the year, for example), while yfR uses the maximum value within the interval (the highest price within the year), which is the right calculation.
btw, you can see that in the code of yf_get:
I see what you are saying but today have discovered exactly what is happening!
In this example I am using yfR and ticker = AAPL for trading Week 2022-06-21 (AAPL traded from Tuesday 2022-06-21 to Friday 2022-06-24). Note: Monday (2022-06-20) was a Public Holiday and the Market was closed. The code I am using is the same code I have posted above.
When using freq_data = “daily” with either how_to_aggregate = “first” or “last”, the Open, High, Low and Close price data is 100% correct.
When using freq_data = “weekly” with how_to_aggregate = “first”, the Open, High and Low price data is 100% correct but the Close price data is incorrect.
When using freq_data = “weekly” with how_to_aggregate = “last”, the High, Low and Close price data is 100% correct but the Open price data is incorrect.
I have mentioned this previously but please read further.
This Week (2022-06-21) traded from 2022-06-21 to 2022-06-24 as stated above. So normally:
But what I have discovered when using yfR with freq_data = “weekly” and how_to_aggregate = “first” is:
And when using yfR with freq_data = “weekly” and how_to_aggregate = “last”:
I am sorry I'm not great at putting this into coding language, but for some reason when using freq_data = “weekly” the Open and Close are both taken from a single daily row (for example the Weekly start date, 2022-06-21) instead of the Open coming from the first day (2022-06-21) and the Close from the last day (2022-06-24) of the Week.
Possibly the code you have shared above may have to be altered for how_to_aggregate = “first”? Maybe price_open may have to = first(price_open) and price_close may have to = last(price_close). I'm not sure. Unless it's an issue with the freq_data code?
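The behaviour described above can be reproduced on made-up data. The sketch below is only one reading of the reports in this thread, not yfR's actual source: aggregate_single_row is a hypothetical helper that takes Open and Close from a single daily row (the first or the last of the week) while High and Low span the whole week. All prices are invented for illustration.

```r
# Made-up daily prices for a four-day trading week (illustrative values only).
daily <- data.frame(
  ref_date    = as.Date(c("2022-06-21", "2022-06-22", "2022-06-23", "2022-06-24")),
  price_open  = c(10, 11, 12, 13),
  price_high  = c(12, 13, 14, 15),
  price_low   = c( 9, 10, 11, 12),
  price_close = c(11, 12, 13, 14)
)

# Hypothetical helper: Open and Close both come from ONE daily row,
# High and Low are the extremes over the whole period.
aggregate_single_row <- function(df, how = c("first", "last")) {
  how <- match.arg(how)
  df <- df[order(df$ref_date), ]
  i <- if (how == "first") 1L else nrow(df)
  data.frame(
    ref_date    = min(df$ref_date),
    price_open  = df$price_open[i],
    price_high  = max(df$price_high),
    price_low   = min(df$price_low),
    price_close = df$price_close[i]
  )
}

first_row <- aggregate_single_row(daily, "first")
last_row  <- aggregate_single_row(daily, "last")
print(rbind(first_row, last_row))
```

With these numbers a conventional weekly candle would open at 10 (Tuesday's open) and close at 14 (Friday's close). The "first" variant gets the Open right but reports Tuesday's close, and the "last" variant gets the Close right but reports Friday's open, which matches the pattern reported for the weekly data.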
Refer to images below, how_to_aggregate = "last" code in yfR is different to BatchGetSymbols. I believe the code used for how_to_aggregate = "last" , price.open and price.close prices in BatchGetSymbols is the correct code.
price.open = first(price.open)
price.close = last(price.close)
yfR
BatchGetSymbols
Thanks @rhamo. The code in yfR looks fine to me. I believe you can have it both ways, depending on your data needs for high/open/low prices.
If you really need a different type of aggregation offered by yfR, you can download the daily data and aggregate outside of yf_get.
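A minimal sketch of that workaround, assuming the daily output of yf_get has the columns ticker, ref_date, price_open, price_high, price_low, price_close and volume. The helper aggregate_weekly_ohlc is hypothetical (it is not part of yfR) and groups by ISO week; the toy data frame stands in for a real download so the example runs offline.

```r
# Hypothetical helper: one candle per ticker and ISO week, built from daily rows.
# Open = first day's open, Close = last day's close, High/Low = extremes of the week.
aggregate_weekly_ohlc <- function(daily) {
  daily <- daily[order(daily$ticker, daily$ref_date), ]
  # ISO year-week label such as "2022-25"; ISO weeks run Monday to Sunday.
  key <- paste(daily$ticker, format(daily$ref_date, "%G-%V"))
  candles <- lapply(split(daily, key), function(g) {
    data.frame(
      ticker      = g$ticker[1],
      ref_date    = g$ref_date[1],          # first trading day of the week
      price_open  = g$price_open[1],
      price_high  = max(g$price_high),
      price_low   = min(g$price_low),
      price_close = g$price_close[nrow(g)],
      volume      = sum(g$volume)
    )
  })
  out <- do.call(rbind, candles)
  out <- out[order(out$ticker, out$ref_date), ]
  rownames(out) <- NULL
  out
}

# In practice the input would come from something like:
# daily <- yfR::yf_get("AAPL", first_date = Sys.Date() - 70,
#                      last_date = Sys.Date(), freq_data = "daily")
# Toy data (made-up prices) so the sketch runs without a download:
daily <- data.frame(
  ticker      = "TOY",
  ref_date    = as.Date(c("2022-06-21", "2022-06-22", "2022-06-23",
                          "2022-06-24", "2022-06-27", "2022-06-28")),
  price_open  = c(10, 11, 12, 13, 14, 15),
  price_high  = c(12, 13, 14, 15, 16, 17),
  price_low   = c( 9, 10, 11, 12, 13, 14),
  price_close = c(11, 12, 13, 14, 15, 16),
  volume      = c(100, 100, 100, 100, 100, 100)
)

weekly <- aggregate_weekly_ohlc(daily)
print(weekly)
```

Grouping by ISO week means a holiday-shortened week such as 2022-06-21 to 2022-06-24 is labelled with its first trading day. Monthly or yearly candles could be built the same way by swapping the grouping key, for example format(ref_date, "%Y-%m").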
Hi msperlin,
I have been running multiple tests using individual, multiple and collection "SP500" tickers with yfR and found random discrepancies when comparing price data to Yahoo Finance Website (Historical Price Data) and BatchGetSymbols price data.
This example was conducted on AAPL for Weekly price data for the last 3 weeks.
These errors have also been discovered when using "freq_data" daily, weekly and monthly across multiple tickers, and in this example alone the issue appears to be with the Open price. When I carried out this test on all SP500 tickers, approximately 100 tickers had issues with the price data.
Please note in this example I have cross-checked AAPL weekly price data with stock charting software. The Yahoo Finance website and BatchGetSymbols is correct. yfR is incorrect.
Here is the R script I used for yfR:
Kind regards, Ron