rickturner2001 closed this issue 2 years ago.
@eabase wow looks like you nailed it! I’ll have a look! Thanks!
@eabase for the FE stock, what trailingPegRatio value do you see? I’ll also check later today.
@eabase I am not sure that trailingPegRatio is available for every symbol; I need to check more symbols though.
It looks like the data structure(s) of
"trailingPegRatio":[{"dataId":14021,"asOfDate":"2021-12-10","periodType":"TTM","reportedValue":{"raw":3.0477,"fmt":"3.05"}}],
"quarterlyPegRatio":[{"dataId":14021,"asOfDate":"2020-09-30","periodType":"3M","reportedValue":{"raw":2.8618,"fmt":"2.86"}},
are not picked up (i.e. not "collected" and provided) by the yfinance infrastructure.
If that is the case, this is a yfinance issue. Hence I'll debug it and open a pull request if I succeed in incorporating it.
I'll update. Looks like it's solvable, and we'll get this sorted out soon.
I'm on it now
@eabase, for the FE symbol, consider the HTML fetched by def get_json(url, proxy=None, session=None) within utils.py:
html = session.get(url=url, proxies=proxy, headers=user_agent_headers).text
This HTML will have no trailingPegRatio in it, so I assume you got it from another stage of the yfinance run flow, as you reported: key-statistics. I'll try to find it later on in the yfinance flow.
Indeed, yfinance fetches each page with its own call, e.g.:
# get fundamentals
data = utils.get_json(ticker_url + '/financials', proxy, self.session)
and, for instance:
# Analysis
data = utils.get_json(ticker_url + '/analysis', proxy, self.session)
We should add a new section with:
# get key-statistics
data = utils.get_json(ticker_url + '/key-statistics', proxy, self.session)
I'll try to do that
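A minimal sketch of how the per-page URLs above could be assembled for one symbol, including the proposed key-statistics page and the optional ?p={symbol} suffix that the browser URLs in this thread carry. quote_page_url is a hypothetical helper, not a yfinance function; yfinance itself simply concatenates strings onto ticker_url.

```python
from urllib.parse import quote, urlencode

# Base URL of the Yahoo Finance quote pages used throughout this thread
BASE_URL = "https://finance.yahoo.com/quote"


def quote_page_url(symbol: str, page: str = "", with_p: bool = False) -> str:
    """Build e.g. https://finance.yahoo.com/quote/FE/key-statistics[?p=FE].

    page   -- sub-page name ("" for the main quote page), e.g. "financials",
              "analysis" or "key-statistics"
    with_p -- append the ?p={symbol} query string that browser URLs carry
    """
    url = f"{BASE_URL}/{quote(symbol)}"
    if page:
        url += f"/{page}"
    if with_p:
        url += "?" + urlencode({"p": symbol})
    return url


if __name__ == "__main__":
    # The pages yfinance already fetches, plus the proposed key-statistics page
    for page in ("financials", "analysis", "key-statistics"):
        print(quote_page_url("FE", page))
    print(quote_page_url("FE", "key-statistics", with_p=True))
```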
@eabase unlike your nice curl above, the Python code doesn't "see" the trailingPegRatio, which is strange. I added the relevant code, as can be seen, and I'm trying to figure out what is missing. Maybe it's the ?p={FE} suffix, so I'll try that now.
@eabase I still cannot see the trailingPegRatio element, super strange! Maybe you can help; perhaps I'm missing something here.
As you noticed, what a browser gets back is completely different from what some other tool gets back. Most APIs use the Referer and User-Agent headers to decide how to respond to your query. Since I was just running curl from the command line, it used the default User-Agent and probably no Referer. What is happening under the hood of bs4, urllib3, requests or what have you, I don't know. So if you have a way to specify those headers, always do that.
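In Python the headers can be set explicitly on each request; yfinance's get_json already passes headers=user_agent_headers to session.get, so that dict is the place to change them. The stdlib sketch below only builds the request object without sending it. The header values are the ones suggested in this thread, and whether Yahoo actually responds differently to them is an assumption; build_request is a hypothetical name.

```python
import urllib.request

# Header values suggested in this thread; any non-browser User-Agent may work.
# How Yahoo reacts to them is an assumption, not documented behavior.
HEADERS = {
    "User-Agent": "curl/7.55.1",
    "Referer": "https://finance.yahoo.com/",
    "Accept": "application/json",
}


def build_request(url: str, headers: dict = HEADERS) -> urllib.request.Request:
    """Create (but do not send) a GET request that carries explicit headers."""
    return urllib.request.Request(url, headers=dict(headers), method="GET")


if __name__ == "__main__":
    req = build_request("https://finance.yahoo.com/quote/FE/key-statistics?p=FE")
    print(req.get_method(), req.full_url)
    print(req.header_items())
    # Sending it would be: urllib.request.urlopen(req).read().decode("utf-8")
```

urllib normalizes header names with str.capitalize(), so the User-Agent header is looked up as "User-agent" on the Request object.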
I wanted to see the headers, but oddly, some curl commands (--head) return a 404.
However, this one works:
# curl -s -D - -o /dev/null https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}
HTTP/1.1 200 OK
referrer-policy: no-referrer-when-downgrade
strict-transport-security: max-age=15552000
x-frame-options: SAMEORIGIN
content-security-policy: sandbox allow-downloads allow-forms allow-modals allow-same-origin allow-scripts allow-popups allow-popups-to-escape-sandbox allow-top-navigation-by-user-activation allow-presentation;
content-type: text/html; charset=utf-8
vary: Accept-Encoding
set-cookie: B=du2uouhgroqnl&b=3&s=qo; expires=Fri, 17-Dec-2022 10:48:21 GMT; path=/; domain=.yahoo.com
date: Fri, 17 Dec 2021 10:48:21 GMT
x-envoy-upstream-service-time: 343
server: ATS
x-envoy-decorator-operation: finance-nodejs--mtls-production-ir2.finance-k8s.svc.yahoo.local:4080/*
Age: 0
Transfer-Encoding: chunked
Connection: keep-alive
Expect-CT: max-age=31536000, report-uri="http://csp.yahoo.com/beacon/csp?src=yahoocom-expect-ct-report-only"
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
# or better:
# curl -s -v -o /dev/null https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}
...
Is there any documentation on how to use the Yahoo API?
@eabase For 2., please see https://www.yahoofinanceapi.com/
@eabase these are the headers used by yfinance
user_agent_headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
What should I replace them with?
Hi guys! Sorry for the late reply; I'm in GMT+2. Sadly, the Yahoo API documentation doesn't really tell much (and probably hasn't been updated for a long time).
The curl command that extracts the JSON part:
curl -s -A "curl/7.55.1" -H "Accept: application/json" -H "Content-Type: application/json" https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//' >zzz.json
Here I had to separate the JSON part from the rest of the HTML/JS page code. I was hoping the API would have a way to return only the JSON, without all the other crap. Using grep and sed is ugly and unreliable, but it works. To view the JSON, open zzz.json in Firefox.
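In Python the grep/sed steps can be replaced by a regular expression plus json.loads, which is also what the base.py snippet later in this issue does. This sketch assumes, like the curl pipeline, that the page has a single line of the form root.App.main = {...}; and that the JSON ends at the last ";" on that line; extract_app_main is a hypothetical name.

```python
import json
import re

# Python counterpart of:
#   grep "root.App.main = " | sed -e "s/root.App.main = //" | sed 's/.$//'
# Greedy (.*) runs to the last ";" on that line; "." does not cross newlines.
APP_MAIN_RE = re.compile(r"root\.App\.main = (.*);")


def extract_app_main(html: str) -> dict:
    """Return the JSON object assigned to root.App.main in a quote page's HTML."""
    match = APP_MAIN_RE.search(html)
    if match is None:
        raise ValueError("root.App.main assignment not found in page")
    return json.loads(match.group(1))


if __name__ == "__main__":
    # Tiny stand-in for a downloaded page (real pages are far larger)
    page = '<script>\nroot.App.main = {"context": {"note": "a;b"}};\n}(this));\n</script>'
    print(extract_app_main(page))
```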
I tried pretty-printing the JSON using json_pp, but it fails: the options are mostly undocumented, and many fields get corrupted. But if you want to try, do:
cat zzz.json | json_pp -f json -t dumper -json_opt pretty,utf8,allow_bignum,relaxed
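If json_pp misbehaves, Python's own json module can pretty-print the extracted file instead (python -m json.tool zzz.json does the same from the shell). A minimal sketch; pretty is a hypothetical helper name.

```python
import json


def pretty(obj, indent: int = 2) -> str:
    """Pretty-print parsed JSON with stable key order; keep non-ASCII text as-is."""
    return json.dumps(obj, indent=indent, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    # Usage on the file produced by the curl pipeline:
    #     with open("zzz.json", encoding="utf-8") as f:
    #         print(pretty(json.load(f)))
    sample = {"trailingPegRatio": [{"asOfDate": "2021-12-10", "periodType": "TTM",
                                    "reportedValue": {"raw": 3.0477, "fmt": "3.05"}}]}
    print(pretty(sample))
```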
The 3 pegRatio variants are found under QuoteSummaryStore and QuoteTimeSeriesStore.
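Since the variants live in different stores, one way to list every key whose name contains pegRatio, together with its path, is a recursive walk over the parsed JSON. The SAMPLE dict is a trimmed illustration built from values quoted in this issue; placing pegRatio under a defaultKeyStatistics sub-key is an assumption, and find_keys is hypothetical.

```python
from typing import Any, Iterator, Tuple

# Trimmed illustration using values quoted in this thread; the real page has far
# more keys, and "defaultKeyStatistics" as the home of pegRatio is an assumption.
SAMPLE = {"context": {"dispatcher": {"stores": {
    "QuoteSummaryStore": {
        "defaultKeyStatistics": {"pegRatio": {"raw": -2.33, "fmt": "-2.33"}},
    },
    "QuoteTimeSeriesStore": {"timeSeries": {
        "trailingPegRatio": [{"dataId": 14021, "asOfDate": "2021-12-10", "periodType": "TTM",
                              "reportedValue": {"raw": 3.0477, "fmt": "3.05"}}],
        "quarterlyPegRatio": [{"dataId": 14021, "asOfDate": "2020-09-30", "periodType": "3M",
                               "reportedValue": {"raw": 2.8618, "fmt": "2.86"}}],
    }},
}}}}


def find_keys(obj: Any, needle: str, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every dict key containing `needle`, case-insensitively."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else str(key)
            if needle.lower() in str(key).lower():
                yield child, value
            yield from find_keys(value, needle, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_keys(item, needle, f"{path}[{i}]")


if __name__ == "__main__":
    for found_path, _ in find_keys(SAMPLE, "pegratio"):
        print(found_path)
```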
Impressive catch @eabase and great job @asafravid. This might be a stupid question, but is there a relation between the two ratios? If not, could this be caused by a typo in the API's source code, given how similar the two names are?
The quarterlyPegRatio consists of 4 values. But I don't know how to access the QuoteTimeSeriesStore in yfinance; I guess that's what needs to be added? It would also be very useful for everyone to see exactly which values are divided to obtain these. I also noticed that the get_json code runs 3 times, which lets you get "additional" pegRatios when you just dump it out.
I also looked at the regexes in some other scrapers, and they look a bit different, so it's possible that the OP's demo snippet just catches the 1st one. (The other 2 come later.)
I was expecting to see something like:
data2 = _json.loads(json_str)['context']['dispatcher']['stores']['QuoteTimeSeriesStore']['timeSeries']
in there. No idea if that is correct, but another issue in this repo is also related.
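Chained indexing like the data2 line above raises KeyError as soon as one level is missing, for instance if a page without that store is served. A small sketch of a defensive lookup; dig and TIME_SERIES_PATH are hypothetical names, and the path is the one proposed above, not a documented Yahoo schema.

```python
from typing import Any, Sequence

# Path proposed above; Yahoo does not document it, so treat it as an assumption.
TIME_SERIES_PATH = ("context", "dispatcher", "stores", "QuoteTimeSeriesStore", "timeSeries")


def dig(obj: Any, path: Sequence[str], default: Any = None) -> Any:
    """Follow nested dict keys along `path`; return `default` if any level is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


if __name__ == "__main__":
    page_json = {"context": {"dispatcher": {"stores": {
        "QuoteTimeSeriesStore": {"timeSeries": {"trailingPegRatio": []}}}}}}
    print(dig(page_json, TIME_SERIES_PATH))        # the timeSeries dict
    print(dig({"context": {}}, TIME_SERIES_PATH))  # None instead of a KeyError
```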
@asafravid Replace the user agent with curl/7.55.1, but I think anything not found in a desktop or mobile browser would work, like my_own_agent. You may also try something dirty for the referrer, such as https://yfapi.net, https://finance.yahoo.com or 127.0.0.1. 🙈
Yet another minor curl improvement:
curl -s -A "curl/7.55.1" -H "Accept: application/json" -H "Content-Type: application/json" https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//' |grep -ioE "[a-zA-Z_\"]*pegRatio.{120}"
Here:
- grep "root.App.main = ": finds the relevant line with the JSON
- sed -e "s/root.App.main = //": removes the non-JSON text
- sed 's/.$//': removes the ; at the end of the line
- grep -ioE "[a-zA-Z_\"]*pegRatio.{120}": selects only matches containing pegRatio (plus the next 120 characters)
@eabase @rickturner2001 the only missing piece is that I fail to understand why we can't see those values in the HTML that yfinance fetches via utils.get_json(ticker_url + '/key-statistics', proxy, self.session). Without this data there, I don't know how to get it.
Do you suggest running the curl from the Python code? I suppose it's possible, but I wish we understood how to get it within the HTML itself.
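Shelling out to curl isn't strictly necessary for the filtering step: once the page text is in hand (e.g. the HTML yfinance already downloads), the final grep -ioE filter translates directly to Python's re module. A sketch; peg_snippets is a hypothetical helper, and like grep -o it returns only non-overlapping matches that have a full 120 characters after the key on the same line.

```python
import re
from typing import List

# Python counterpart of:  grep -ioE "[a-zA-Z_\"]*pegRatio.{120}"
#   -i -> re.IGNORECASE, -o -> return only the matched text, -E -> extended regex
# As with grep, "." does not match a newline, so a match needs 120 more
# characters on the same line.
PEG_RE = re.compile(r'[a-zA-Z_"]*pegRatio.{120}', re.IGNORECASE)


def peg_snippets(text: str) -> List[str]:
    """Return each pegRatio-like key plus the 120 characters that follow it."""
    return PEG_RE.findall(text)


if __name__ == "__main__":
    # html would be the page text yfinance already downloads in utils.get_json
    html = '{"trailingPegRatio":[{"asOfDate":"2021-12-10"}]' + " " * 120
    for snippet in peg_snippets(html):
        print(snippet)
```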
@asafravid I see the problem now. I have no idea what's going on. I could swear that the first time I found this, I was only adding a print statement to your code in utils.py, like this:
If you put the following line:
print('\n------------------------------\n', new_data,'\n')
just after the line:
new_data = _json.dumps(data).replace('{}', 'null')
You will see 3 iterations with different output when using your original/default user_agent_headers.
However, I no longer see the 3 different versions; instead, I now only get:
# cat z1.txt |grep -ioE "[a-zA-Z_\"]*pegRatio.{120}"
"pegRatio": {"raw": -2.33, "fmt": "-2.33"}, "ytdReturn": null, "forwardPE": {"raw": 15.968128, "fmt": "15.97"}, "maxAge": 1, "las
"pegRatio": {"raw": 2.6497, "fmt": "2.65"}, "estimates": [{"period": "0q", "growth": {"raw": 0.248, "fmt": "0.25"}}, {"period": "
"pegRatio": null, "estimates": []}, "summaryDetail": {"previousClose": {"raw": 40.31, "fmt": "40.31"}, "regularMarketOpen": {"raw
@eabase cool I’ll give it a try and update results
@eabase I’ll also try your suggestion of new user agent header, thanks
See my update above.
@eabase meaning it’s either a user-agent matter or a matter of using curl?
@asafravid
Not sure at this point. 😢
Is yfinance using a POST or a GET request? The curl above is a GET.
@eabase @asafravid Not sure how helpful this is, but I actually downloaded a packet sniffer to run on the console, so that when I run a Python script directly from the same console I can see all the HTTP requests. Here's a screen capture.
By the way, the Python script is as simple as this:
import yfinance as yf
yf.Ticker('AAPL').info['pegRatio']
Only the third request has anything to do with the PEG ratio (I converted the HTML to JSON to take a good look at the values).
For some reason I cannot properly parse the HTML and JSON for the third request.
So yfinance is in fact using a GET request.
@asafravid @rickturner2001
Nice catch! That is what I actually wanted to see in my very first post; I just wasn't using the right tool to see it, as all my terminals were interpreting the results correctly. Because / (\u002f, or 0x2f) has special meaning in HTML, you need to replace all of those.
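When the page JSON is parsed with json.loads, the \u002F escapes are decoded automatically, so a manual replacement is only needed when searching the raw text, as the grep pipeline does. A sketch using a fragment shaped like the SCREENER_FIELD entries in the page; unescape_slashes is a hypothetical helper.

```python
import json
import re

# Fragment as it appears in the page source, with "/" written as \u002F
RAW = r'{"SCREENER_FIELD_pegratio_5y": "Price \u002F Earnings to Growth (P\u002FE\u002FG)"}'


def unescape_slashes(text: str) -> str:
    r"""Turn \u002F / \u002f escapes back into "/" for grep-style searches on raw text."""
    return re.sub(r"\\u002[fF]", "/", text)


if __name__ == "__main__":
    # json.loads already decodes \uXXXX escapes, so parsed values need no fix-up
    print(json.loads(RAW)["SCREENER_FIELD_pegratio_5y"])
    # Only the raw, unparsed text still needs the replacement
    print(unescape_slashes(RAW))
```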
Can you scroll down to also show the quarterlyPegRatio fields?
Can you run curl from the console with the sniffer as well? I'd like to also see the result from the line I used above.
https://github.com/httptoolkit/httptoolkit https://httptoolkit.tech/ https://amiusing.httptoolkit.tech/ - test if proxied by http toolkit
@eabase you mean the nice catch for @rickturner2001 :-)
I actually didn't see the quarterlyPegRatio, but I used the yfinance code as-is.
I'm searching for a way to add Python code which will also get the quarterlyPegRatio; then I'll open a pull request, etc.
Very useful & interesting info about caching: https://httptoolkit.tech/blog/status-targeted-caching-headers/
TTM = Trailing Twelve Months.
There is actually no quarterly PegRatio. If you'd like to see for yourselves, here is the output: index.txt
As for the curl request, yes, I can make one, but unfortunately we don't have the luxury of filtering the request: even if we grep a specific value, we would still see the whole thing in the packet sniffer.
@eabase I assume you wanted me to curl the following URL: https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} (taken from your curl request above). If so, wouldn't it be wrong to do a GET request on this page, given that /key-statistics is nowhere to be found in the requests made by the API?
@rickturner2001 Then you're not doing something right, probably not looking in the right structure.
You can then see this in your browser:
or run this:
curl -s https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//' |grep -ioE "[a-zA-Z_\"]*pegRatio.{120}"
# OUTPUT:
#"pegRatio":{"raw":-2.33,"fmt":"-2.33"},"ytdReturn":{},"forwardPE":{"raw":15.968128,"fmt":"15.97"},"maxAge":1,"lastCapGain":{},"sh
#"SCREENER_FIELD_pegratio_5y":"Price \u002F Earnings to Growth (P\u002FE\u002FG)","SCREENER_FIELD_peratio.annual":"Trailing P\u002FE (ANNUAL)","S
#"trailingPegRatio":[{"dataId":14021,"asOfDate":"2021-12-10","periodType":"TTM","reportedValue":{"raw":3.0477,"fmt":"3.05"}}],"quarterlyEn
#"quarterlyPegRatio":[{"dataId":14021,"asOfDate":"2020-09-30","periodType":"3M","reportedValue":{"raw":2.8618,"fmt":"2.86"}},{"dataId":1402
If you want all of it in a file, use this:
curl -s https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//'> ok.json
Using that, I don't even need to specify a header or UA.
This is probably not helping either:
https://github.com/ranaroussi/yfinance/blob/main/yfinance/utils.py#L101-L104 # Only looking for summary doesn't allow different urls
https://github.com/ranaroussi/yfinance/blob/main/yfinance/utils.py#L108-L109 # Wrong path for other pegRatios
@eabase But that's exactly what the problem is! We know for a fact that the data we're looking for is located here: https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}
But when I analyzed the requests made by the yfinance API directly from the Python script (if you recall, I used a packet analyzer), the only requests registered were these:
finance.yahoo.com/quote/{SYMBOL}
finance.yahoo.com/quote/{SYMBOL}/holders
finance.yahoo.com/quote/{SYMBOL}/financials
finance.yahoo.com/quote/{SYMBOL}/analysis
We certainly cannot disregard the fact that this URL, https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}, is not among the requests made by the API, can we?
TBH, I have no idea what the problem is. I just know someone needs to rewrite (or modify) the yfinance API to allow for all the additional data structures discussed here. And if you look at the "click on this" URL I mentioned above and remove the periods, or the query1 URL prefix, you'll immediately get redirected to the website, without any JSON and with that limited content. I think if we want to use the API parts of Yahoo, we should actually try to use their API to the best extent possible for free, even if it's not documented.
So you just need to decide whether you want yfinance to only scrape the limited (and partially corrupt) HTML, or to use the Yahoo API to get the full JSON structure. Given the quantity of crap (ad code etc.) in the HTML, my choice is simple.
For a company like Yahoo, they are doing an incredibly poor job of documenting their data, and in particular their API. Adding the free API key is pretty much useless in this case, as 100 queries/day is ridiculously low.
@ranaroussi any ideas here?
@rickturner2001 even when I added it (https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}), I didn't get the trailingPegRatio there either.
Cool, thanks @eabase i’ll try to run it and update
@eabase Looking good!
I'm preparing a pull request for yfinance.
Here's the code to add at line 532 of base.py
try:
    # Fetch the key-statistics page with curl-like headers so the embedded JSON is served
    my_headers = {'user-agent': 'curl/7.55.1', 'accept': 'application/json', 'content-type': 'application/json', 'referer': 'https://finance.yahoo.com/', 'cache-control': 'no-cache', 'connection': 'close'}
    p = _re.compile(r'root\.App\.main = (.*);')
    r = _requests.session().get('https://finance.yahoo.com/quote/{}/key-statistics?p={}'.format(self.ticker, self.ticker), headers=my_headers)
    q_results = {}
    my_qs_keys = ['pegRatio']  # QuoteSummaryStore (not used yet)
    my_ts_keys = ['trailingPegRatio']  # , 'quarterlyPegRatio']  # QuoteTimeSeriesStore

    # Complementary key-statistics
    data = _json.loads(p.findall(r.text)[0])
    key_stats = data['context']['dispatcher']['stores']['QuoteTimeSeriesStore']
    q_results.setdefault(self.ticker, [])
    for i in my_ts_keys:
        try:
            # A time series may hold several entries (0, 1, 2, ...); keep every non-empty one
            for entry in key_stats['timeSeries'][i]:
                if entry:
                    q_results[self.ticker].append({i: entry['reportedValue']['raw']})
        except (KeyError, TypeError):
            # Key missing (or null) for this symbol: record NaN instead
            q_results[self.ticker].append({i: float('nan')})

    q_results[self.ticker].append({'Company': self.ticker})
    # if isinstance(data.get('trailingPegRatio'), dict):
    #     try:
    #         trailingPegRatio = _pd.DataFrame(data['trailingPegRatio'])
    #
    #         self._trailingPegRatio = trailingPegRatio
except Exception:
    pass
This is the pull request that provides the fix and thus exposes 'trailingPegRatio' in the info of yfinance's Ticker object: https://github.com/ranaroussi/yfinance/pull/911
Nice work @eabase and @rickturner2001 - much appreciated!
Submitted to my fork @ https://github.com/asafravid/yfinance
@asafravid Great! Glad I could be of help in the end.
The code I linked is quite dirty, so I'm sure it could be simplified in a more pythonic way. It would also be nice to be able to access the date and the period type.
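Each timeSeries entry already carries asOfDate and periodType next to reportedValue, so the date and period type can be kept alongside the value. A sketch assuming entries look like the trailingPegRatio sample quoted at the top of this issue; time_series_rows is hypothetical.

```python
from typing import Any, Dict, List, Optional


def time_series_rows(entries: List[Optional[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Flatten timeSeries entries into rows of asOfDate, periodType and raw value.

    Empty entries are skipped, mirroring the empty-entry check in the PR snippet.
    """
    rows = []
    for entry in entries:
        if not entry:
            continue
        reported = entry.get("reportedValue") or {}
        rows.append({
            "asOfDate": entry.get("asOfDate"),
            "periodType": entry.get("periodType"),
            "value": reported.get("raw"),
        })
    return rows


if __name__ == "__main__":
    # Entry copied from the trailingPegRatio structure quoted in this issue
    trailing = [{"dataId": 14021, "asOfDate": "2021-12-10", "periodType": "TTM",
                 "reportedValue": {"raw": 3.0477, "fmt": "3.05"}}, None]
    print(time_series_rows(trailing))
```

If pandas is available, pd.DataFrame(rows) would turn the result into a table, much like the commented-out _pd.DataFrame idea in the PR snippet.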
@eabase yes, you were of tremendous help! You helped fix this issue. As for the quarterlyPegRatio and the dates, I added a TODO for myself to add this important information as well, in time. The acute ad-hoc bug was the pegRatio showing an unexplained value, which trailingPegRatio fixed completely. Once the pull request is merged, I shall close this ticket and open a new one for quarterlyPegRatio, which shall be based on your code as well, with full credit of course. Much appreciated!
Just a short follow-up. Did anyone figure out which values were used to calculate (i.e. the this / that of) the 3 different pegRatios?
I didn’t dive into that, @eabase; all I needed was the consensus correct value (as other websites calculate it) to use in my stock scanner's core equation. And for that, thanks for your help 🙏🏼
Maybe you're confusing trailingPegRatio and pegRatio? It's way off though, so it might not be that. Quote: The "trailingPegRatio" uses the trailing P/E ratio, which is based on the stock's current price divided by its earnings per share (EPS) over the past 12 months. In contrast, the "pegRatio" uses the forward P/E ratio, which is based on the stock's current price divided by its estimated EPS over the next 12 months.
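For reference, the usual textbook definition is PEG = (P/E) / (annual EPS growth rate, in percent). Which P/E (trailing or forward) and which growth estimate Yahoo plugs into each of its three variants is not documented, so the sketch below only illustrates the arithmetic with made-up numbers; eps_growth_pct and peg_ratio are hypothetical helpers.

```python
def eps_growth_pct(eps_now: float, eps_next: float) -> float:
    """Annual EPS growth in percent, e.g. 2.50 -> 2.70 gives about 8.0."""
    return (eps_next / eps_now - 1.0) * 100.0


def peg_ratio(pe: float, growth_pct: float) -> float:
    """Textbook PEG: P/E divided by EPS growth expressed in percent (not as 0.08)."""
    return pe / growth_pct


if __name__ == "__main__":
    price, eps_ttm, eps_next = 40.0, 2.50, 2.70   # made-up numbers
    trailing_pe = price / eps_ttm                  # 16.0
    forward_pe = price / eps_next                  # about 14.8
    growth = eps_growth_pct(eps_ttm, eps_next)     # about 8.0
    print(round(peg_ratio(trailing_pe, growth), 2))
    print(round(peg_ratio(forward_pe, growth), 2))
```

Swapping the trailing P/E for the forward one, or using a different growth estimate, changes the result noticeably, and a negative growth estimate yields a negative PEG.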
The PEG ratio value for FE (FirstEnergy) seems to be wrong: according to Google Finance the value is 2.99, but yfinance returns 43.37 in my script. This is how I got this value:
Output: 43.37
It works just fine with other stocks, which is why I don't really get what the problem is.