ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

Peg Ratio value way off #903

Closed rickturner2001 closed 2 years ago

rickturner2001 commented 2 years ago

The Peg Ratio value for "FE" (First Energy) seems to be wrong: according to Google Finance the value is 2.99, but yfinance returns 43.37 in my script. This is how I got this value:

import yfinance as yf

symbol = yf.Ticker("FE")

infos = (symbol.info)

print(infos['pegRatio'])

Output: 43.37

It works just fine with other stocks, which is why I don't really get what the problem is.

asafravid commented 2 years ago

@eabase wow looks like you nailed it! I’ll have a look! Thanks!

asafravid commented 2 years ago

@eabase for the FE stock, what trailingPegRatio value do you see? I'll also check later today.

asafravid commented 2 years ago

@eabase I am not sure that trailingPegRatio is available for every symbol. I need to check more symbols, though.

asafravid commented 2 years ago

It looks like the data structure(s) of

"trailingPegRatio":[{"dataId":14021,"asOfDate":"2021-12-10","periodType":"TTM","reportedValue":{"raw":3.0477,"fmt":"3.05"}}],
"quarterlyPegRatio":[{"dataId":14021,"asOfDate":"2020-09-30","periodType":"3M","reportedValue":{"raw":2.8618,"fmt":"2.86"}}, 

are not picked up (i.e. not collected and exposed) by the yfinance infrastructure. If that is the case, this is a yfinance issue, so I'll debug it and open a pull request if I succeed in incorporating them. It looks solvable and we'll get this sorted out soon. I'm on it now and will post updates.
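As a minimal sketch of what "collecting" these structures could look like, the entries can be flattened into simple records. The `PegEntry` class and `parse_entries` helper are hypothetical names, not part of yfinance, and the sample is the fragment quoted above wrapped in braces so that it is valid JSON.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PegEntry:
    as_of_date: str
    period_type: str
    raw: Optional[float]


def parse_entries(entries: list) -> List[PegEntry]:
    """Convert a list of time-series entries into PegEntry records."""
    records = []
    for entry in entries:
        if not entry:  # skip null or empty placeholders
            continue
        reported = entry.get("reportedValue") or {}
        records.append(
            PegEntry(entry["asOfDate"], entry["periodType"], reported.get("raw"))
        )
    return records


# Sample copied from the fragment quoted in this thread.
sample = json.loads("""
{
  "trailingPegRatio": [{"dataId": 14021, "asOfDate": "2021-12-10", "periodType": "TTM",
                        "reportedValue": {"raw": 3.0477, "fmt": "3.05"}}],
  "quarterlyPegRatio": [{"dataId": 14021, "asOfDate": "2020-09-30", "periodType": "3M",
                         "reportedValue": {"raw": 2.8618, "fmt": "2.86"}}]
}
""")

trailing = parse_entries(sample["trailingPegRatio"])
quarterly = parse_entries(sample["quarterlyPegRatio"])
print(trailing[0])
```

Keeping the date and period type next to the raw value makes it possible to tell the TTM figure apart from the quarterly ones later on.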

asafravid commented 2 years ago

@eabase, for the FE symbol, the HTML fetched inside get_json(url, proxy=None, session=None) in utils.py:

html = session.get(url=url, proxies=proxy, headers=user_agent_headers).text

has no trailingPegRatio in it, so I assume you got it from another stage in the yfinance run flow, as you reported: key-statistics.

I'll try to find it later on in the yfinance flow

asafravid commented 2 years ago

Indeed, as

    # get fundamentals
    data = utils.get_json(ticker_url + '/financials', proxy, self.session)

And for instance

    # Analysis
    data = utils.get_json(ticker_url + '/analysis', proxy, self.session)

We should add a new section with

    # get key-statistics
    data = utils.get_json(ticker_url + '/key-statistics', proxy, self.session)

I'll try to do that

asafravid commented 2 years ago

@eabase unlike your nice curl above, the python code doesn't "see" the trailingPegRatio, which is strange:

I added the relevant code, as can be seen, and am trying to figure out what is missing. Maybe it is the ?p={FE} suffix, so I'll try that now.

image

asafravid commented 2 years ago

@eabase I still cannot see the trailingPegRatio element, super strange! Maybe you can help, perhaps I'm missing something here

image

eabase commented 2 years ago

As you noticed, the stuff returned to a browser is completely different from that returned to some other tool. Most APIs use the referrer and user_agent headers to redirect your query. Since I was just running curl from the command line, it used the default user_agent and probably no referrer. What is happening under the hood of bs4, urllib3, requests, or what have you, I don't know. So if you have a way to specify those headers, always do that.

I wanted to see the headers, but it's a bit funny that some curl commands (--head) return a 404.
However, this one works:

# curl -s -D - -o /dev/null https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}

HTTP/1.1 200 OK
referrer-policy: no-referrer-when-downgrade
strict-transport-security: max-age=15552000
x-frame-options: SAMEORIGIN
content-security-policy: sandbox allow-downloads allow-forms allow-modals allow-same-origin allow-scripts allow-popups allow-popups-to-escape-sandbox allow-top-navigation-by-user-activation allow-presentation;
content-type: text/html; charset=utf-8
vary: Accept-Encoding
set-cookie: B=du2uouhgroqnl&b=3&s=qo; expires=Fri, 17-Dec-2022 10:48:21 GMT; path=/; domain=.yahoo.com
date: Fri, 17 Dec 2021 10:48:21 GMT
x-envoy-upstream-service-time: 343
server: ATS
x-envoy-decorator-operation: finance-nodejs--mtls-production-ir2.finance-k8s.svc.yahoo.local:4080/*
Age: 0
Transfer-Encoding: chunked
Connection: keep-alive
Expect-CT: max-age=31536000, report-uri="http://csp.yahoo.com/beacon/csp?src=yahoocom-expect-ct-report-only"
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff

# or better:
# curl -s -v -o /dev/null https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}

...

Is there any documentation on how to use the yahoo api?

asafravid commented 2 years ago
  1. I’ll try with your headers - thanks
  2. I’ll send you a link to the yahoo API it should be somewhere on the yahoo website and/or internet
asafravid commented 2 years ago

@eabase

For 2. Please see https://www.yahoofinanceapi.com/

asafravid commented 2 years ago

@eabase these are the headers used by yfinance

user_agent_headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

What should I replace them with?

eabase commented 2 years ago

Hi guys! Sorry for the late reply, but I'm in a GMT+2 time zone. Sadly the Yahoo API docs don't really tell much... (and probably haven't been updated for a long time.)

The curl that gives the JSON part:

curl -s -A "curl/7.55.1" -H "Accept: application/json" -H "Content-Type: application/json" https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//' >zzz.json 

Here I had to separate the JSON part from the rest of the HTML/JS page code. I was hoping that the API would have a way to return only the JSON, and not all the other crap. Using grep and sed is ugly and unreliable, but it works. To see the JSON, open zzz.json in Firefox.
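The same separation can be sketched in Python without grep and sed. This is only an illustration: `extract_root_app_main` is a hypothetical helper, the sample HTML is a tiny stand-in for the real page, and it assumes the page still embeds its data as a single-line `root.App.main = {...};` assignment.

```python
import json
import re

# Capture whatever follows "root.App.main = " up to the last ";" on that line.
ROOT_APP_MAIN = re.compile(r"root\.App\.main = (.*);")


def extract_root_app_main(html: str) -> dict:
    """Return the JSON object assigned to root.App.main in the page source."""
    match = ROOT_APP_MAIN.search(html)
    if match is None:
        raise ValueError("root.App.main assignment not found in page")
    return json.loads(match.group(1))


# Tiny stand-in for the real page (hypothetical content).
sample_html = """<html><script>
root.App.main = {"context": {"dispatcher": {"stores": {}}}};
</script></html>"""

page_data = extract_root_app_main(sample_html)
print(sorted(page_data))
```

Parsing with `json.loads` instead of a pretty-printer also avoids the field corruption that lossy formatting tools can introduce.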

I tried pretty-printing the JSON using json_pp, but it fails because the options are mostly undocumented, and thus many fields get corrupted. But if you wanna try, do:

cat zzz.json | json_pp -f json -t dumper -json_opt pretty,utf8,allow_bignum,relaxed

zzz.zip

The 3 pegRatio variants are found under QuoteSummaryStore and QuoteTimeSeriesStore:

Click to see picture ![image](https://user-images.githubusercontent.com/52289379/146651528-263af984-de4b-4bae-9a54-595df6f4156a.png) --- ![image](https://user-images.githubusercontent.com/52289379/146676509-ec8b96a1-3972-4fe1-9268-89a0e9da4dfa.png)
rickturner2001 commented 2 years ago

Impressive catch @eabase and great job @asafravid. This might be a stupid question, but is there a relation between the two ratios? If not, could this be caused by a typo in the source code of the API, given how similar the two names are?

eabase commented 2 years ago

The quarterlyPegRatio consists of 4 values, but I don't know how to access the QuoteTimeSeriesStore in yfinance. I guess that's what needs to be added? It would also be very useful for everyone to see exactly which values are being divided to obtain these. I also noticed that the get_json code is run 3 times, which lets you get "additional" pegRatios when you just dump it out.

I also looked at the regex in some other scrapers, and they look a bit different, so it's possible that the OP's demo snippet just catches the 1st one. (The other 2 come later.)

I was expecting to see something like:
data2 = _json.loads(json_str)['context']['dispatcher']['stores']['QuoteTimeSeriesStore']['timeSeries'] in there. No idea if that is correct, but some other issue in this repo is also related.
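A defensive way to walk that path could look like the sketch below. The `dig` helper is hypothetical (it is not in yfinance), and the sample dictionary only mimics the nesting named above; whether a given page response actually contains QuoteTimeSeriesStore is exactly what is in question in this thread.

```python
from typing import Any, Iterable


def dig(data: Any, path: Iterable[str], default: Any = None) -> Any:
    """Follow `path` through nested dicts; return `default` if a key is missing."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


TIME_SERIES_PATH = (
    "context", "dispatcher", "stores", "QuoteTimeSeriesStore", "timeSeries",
)

# Hand-built sample that mimics the nesting; values are from this thread.
sample = {"context": {"dispatcher": {"stores": {"QuoteTimeSeriesStore": {"timeSeries": {
    "trailingPegRatio": [{"reportedValue": {"raw": 3.0477, "fmt": "3.05"}}],
}}}}}}

time_series = dig(sample, TIME_SERIES_PATH, default={})
print(time_series["trailingPegRatio"][0]["reportedValue"]["raw"])

# A response without the store yields the default instead of raising KeyError.
missing = dig({"context": {}}, TIME_SERIES_PATH, default={})
```

Returning a default rather than raising makes it easy to see which of the fetched pages carry the store and which do not.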

eabase commented 2 years ago

@asafravid Replace the user agent with curl/7.55.1, but I think anything not used by a desktop or mobile browser would work, like my_own_agent. You may also try something dirty for the referrer, such as https://yfapi.net, https://finance.yahoo.com or 127.0.0.1. 🙈
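For illustration, this is roughly how such headers can be attached to a request in Python using only the standard library (yfinance itself goes through requests, so this is a sketch of the idea rather than a patch). Note that the HTTP header name is spelled `Referer`. Whether these particular values change what Yahoo returns is an assumption to be tested, not something established here.

```python
from urllib.request import Request

url = "https://finance.yahoo.com/quote/FE/key-statistics?p=FE"
headers = {
    "User-Agent": "curl/7.55.1",          # non-browser agent suggested above
    "Accept": "application/json",
    "Referer": "https://finance.yahoo.com/",
}

# Building the Request does not send anything over the network.
request = Request(url, headers=headers)
print(request.header_items())

# To actually fetch the page:
#   from urllib.request import urlopen
#   html = urlopen(request).read().decode("utf-8")
```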

Yet another minor curl improvement:

curl -s -A "curl/7.55.1" -H "Accept: application/json" -H "Content-Type: application/json" https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//' |grep -ioE "[a-zA-Z_\"]*pegRatio.{120}"

Here:

asafravid commented 2 years ago

@eabase @rickturner2001 the only missing thing is that I fail to understand why we can't see those values in the HTML that yfinance fetches from utils.get_json(ticker_url + '/key-statistics', proxy, self.session)

Without this data there, I don't know how to get it. Do you suggest running curl from the Python code? I suppose it's possible, but I wish we understood how to get it from the HTML itself.

eabase commented 2 years ago

@asafravid

I see the problem now, but I have no idea what's going on. I could swear that the first time I found this, I was only adding a print statement to your code in utils.py.

If you put the following line:

print('\n------------------------------\n', new_data,'\n')

just after the line:

new_data = _json.dumps(data).replace('{}', 'null')

You will see 3 iterations with different output when using your original/default user_agent_headers.

However, I no longer see the 3 different versions, instead now I only get:

# cat z1.txt  |grep -ioE "[a-zA-Z_\"]*pegRatio.{120}"

"pegRatio": {"raw": -2.33, "fmt": "-2.33"}, "ytdReturn": null, "forwardPE": {"raw": 15.968128, "fmt": "15.97"}, "maxAge": 1, "las
"pegRatio": {"raw": 2.6497, "fmt": "2.65"}, "estimates": [{"period": "0q", "growth": {"raw": 0.248, "fmt": "0.25"}}, {"period": "
"pegRatio": null, "estimates": []}, "summaryDetail": {"previousClose": {"raw": 40.31, "fmt": "40.31"}, "regularMarketOpen": {"raw
asafravid commented 2 years ago

@eabase cool I’ll give it a try and update results

asafravid commented 2 years ago

@eabase I’ll also try your suggestion of new user agent header, thanks

eabase commented 2 years ago

See my update above.

asafravid commented 2 years ago

@eabase meaning that it’s now either a user agent matter or using curl?

eabase commented 2 years ago

@asafravid
Not sure at this point. 😢 Is yfinance using POST or GET request? The curl above is a GET.

rickturner2001 commented 2 years ago

@eabase @asafravid Not sure how helpful this is, but I actually downloaded a packet sniffer to run in the console, so that when I run a Python script from the same console I can see all the HTTP requests. Here's a screen capture.

By the way the python script is as easy as this:

import yfinance as yf
yf.Ticker('AAPL').info['pegRatio']

Screenshot from 2021-12-18 20-31-38

Only the third request has anything to do with the PEG ratio (I converted the HTML to JSON to take a good look at the values).

For some reason I cannot properly parse the HTML and JSON for the third request.

Screenshot from 2021-12-18 20-50-21

So yfinance is in fact using a GET request.

eabase commented 2 years ago

@asafravid @rickturner2001 Nice catch! That is what I actually wanted to see in my very first post. Only I was not using the right tool to see it, as all my terminals were interpreting the results correctly. Because / (\u002f, or 0x2f) has special meaning in HTML, you need to replace all of those.

Can you scroll down to also show the quarterlyPegRatio fields?

Can you run curl from the console with the sniffer as well? I'd like to also see the result from the line I used above.

https://github.com/httptoolkit/httptoolkit https://httptoolkit.tech/ https://amiusing.httptoolkit.tech/ - test if proxied by http toolkit
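On the `\u002f` escapes mentioned above: when the embedded data is parsed as JSON, the parser decodes them, so no manual replacement is needed at that stage. A small sketch using a label that appears in the key-statistics page data:

```python
import json

# In the raw page source the slash appears as the escape sequence \u002F.
raw_fragment = (
    r'{"SCREENER_FIELD_pegratio_5y": '
    r'"Price \u002F Earnings to Growth (P\u002FE\u002FG)"}'
)

decoded = json.loads(raw_fragment)
label = decoded["SCREENER_FIELD_pegratio_5y"]
print(label)  # prints: Price / Earnings to Growth (P/E/G)
```

Manual replacement only matters when the raw text is inspected with tools that do not interpret JSON escapes, such as a packet sniffer's plain-text view.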

asafravid commented 2 years ago

@eabase you mean the nice catch for @rickturner2001 :-)

I actually didn't see the quarterlyPegRatio, but I used the yfinance code as-is. I'm looking for a way to add Python code that will also get the quarterlyPegRatio; then I'll open a pull request, etc.

eabase commented 2 years ago

Very useful & interesting info about caching: https://httptoolkit.tech/blog/status-targeted-caching-headers/

eabase commented 2 years ago

TTM = Trailing Twelve Months.

rickturner2001 commented 2 years ago

There is actually no Quarterly PegRatio. If you'd like to see for yourselves, here is the output: index.txt

As for the curl request, yes, I can make one, but unfortunately we don't have the luxury of filtering the request. Basically, even if we grep a specific value, we would still see the whole thing in the packet sniffer.

@eabase I assume you wanted me to curl the following URL: https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} (taken from your curl request above). If so, wouldn't it be wrong to do a GET request on this page, given that /key-statistics is nowhere to be found in the requests made by the API?

eabase commented 2 years ago

@rickturner2001 Then you're not doing something right, probably not looking in the right structure.

Click on this: https://query1.finance.yahoo.com/ws/fundamentals-timeseries/v1/finance/timeseries/FE?symbol=FE&type=quarterlyPegRatio&period1=493590046&period2=1913180947

You can then see this in your browser:

image


or run this:

curl -s https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//' |grep -ioE "[a-zA-Z_\"]*pegRatio.{120}"

# OUTPUT: 

#"pegRatio":{"raw":-2.33,"fmt":"-2.33"},"ytdReturn":{},"forwardPE":{"raw":15.968128,"fmt":"15.97"},"maxAge":1,"lastCapGain":{},"sh
#"SCREENER_FIELD_pegratio_5y":"Price \u002F Earnings to Growth (P\u002FE\u002FG)","SCREENER_FIELD_peratio.annual":"Trailing P\u002FE (ANNUAL)","S
#"trailingPegRatio":[{"dataId":14021,"asOfDate":"2021-12-10","periodType":"TTM","reportedValue":{"raw":3.0477,"fmt":"3.05"}}],"quarterlyEn
#"quarterlyPegRatio":[{"dataId":14021,"asOfDate":"2020-09-30","periodType":"3M","reportedValue":{"raw":2.8618,"fmt":"2.86"}},{"dataId":1402

If you wanna dump all of it into a file, use this:

curl -s https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE} | grep "root.App.main = " | sed -e "s/root.App.main = //" |sed 's/.$//'> ok.json

Using that, I don't even need to specify a header or UA.

This is probably not helping either:

https://github.com/ranaroussi/yfinance/blob/main/yfinance/utils.py#L101-L104 # Only looking for summary doesn't allow different urls

https://github.com/ranaroussi/yfinance/blob/main/yfinance/utils.py#L108-L109 # Wrong path for other pegRatio's
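The query1 timeseries URL linked above can also be assembled programmatically. This is a sketch based only on that one URL: `timeseries_url` is a hypothetical helper, the endpoint is undocumented, and `period1`/`period2` merely look like Unix timestamps.

```python
from urllib.parse import urlencode

BASE = (
    "https://query1.finance.yahoo.com/ws/fundamentals-timeseries"
    "/v1/finance/timeseries"
)


def timeseries_url(symbol: str, series_type: str, period1: int, period2: int) -> str:
    """Build a fundamentals-timeseries URL in the shape used in this thread."""
    query = urlencode({
        "symbol": symbol,
        "type": series_type,
        "period1": period1,  # start of range (appears to be epoch seconds)
        "period2": period2,  # end of range (appears to be epoch seconds)
    })
    return f"{BASE}/{symbol}?{query}"


url = timeseries_url("FE", "quarterlyPegRatio", 493590046, 1913180947)
print(url)
```

Swapping the `type` value for trailingPegRatio would presumably request the other series, but that has not been verified here.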

rickturner2001 commented 2 years ago

@eabase But that's exactly what the problem is! We know for a fact that the data we're looking for is located here: https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}

But when I analyzed the requests made by the yfinance API directly from the Python script (if you recall, I used a packet analyzer), the only requests registered were these:

finance.yahoo.com/quote/{SYMBOL}
finance.yahoo.com/quote/{SYMBOL}/holders
finance.yahoo.com/quote/{SYMBOL}/financials
finance.yahoo.com/quote/{SYMBOL}/analysis

We certainly cannot disregard the fact that this URL, https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}, is not among the requests made by the API, or can we?

eabase commented 2 years ago

TBH, I have no idea what the problem is. I just know someone needs to rewrite (or modify) the yfinance API to allow for all the additional data structures, as discussed here. And if you look at the "click on this" URL I mentioned above and remove the periods, or the query1 URL prefix, you'll get immediately redirected to the website, with no JSON and only that limited content. I think if we wanna use the API parts of Yahoo, we should actually try to use their API to the best extent possible for free, even if not documented.

So you just need to decide whether you wanna use yfinance only to scrape the limited (and partially corrupt) HTML, or use the Yahoo API to get the full JSON structure. Given the quantity of crap (ad code etc.) in the HTML, my choice is simple.

For being Yahoo, they are doing an incredibly poor job of documenting their data, and in particular their API. Adding the free API key is pretty much useless in this case, as 100 queries/day is ridiculously low.

asafravid commented 2 years ago

@ranaroussi any ideas here?

asafravid commented 2 years ago

@rickturner2001 even when I added it (https://finance.yahoo.com/quote/{FE}/key-statistics?p={FE}) I didn't get the trailingPegRatio there either

eabase commented 2 years ago

It's working just fine! Check out my gist HERE.

asafravid commented 2 years ago

Cool, thanks @eabase i’ll try to run it and update

asafravid commented 2 years ago

@eabase Looking good! I'm preparing a pull request for yfinance.

Here's the code to add at line 532 of base.py

    try:
        # Assumes base.py's module-level aliases _re, _requests, _json and _np (numpy).
        my_headers = {'user-agent': 'curl/7.55.1', 'accept': 'application/json', 'content-type': 'application/json', 'referer': 'https://finance.yahoo.com/', 'cache-control': 'no-cache', 'connection': 'close'}
        p = _re.compile(r'root\.App\.main = (.*);')
        r = _requests.session().get('https://finance.yahoo.com/quote/{}/key-statistics?p={}'.format(self.ticker, self.ticker), headers=my_headers)
        q_results = {}
        my_qs_keys = ['pegRatio']  # QuoteSummaryStore
        my_ts_keys = ['trailingPegRatio']  # , 'quarterlyPegRatio']  # QuoteTimeSeriesStore

        # Complementary key-statistics
        data = _json.loads(p.findall(r.text)[0])
        key_stats = data['context']['dispatcher']['stores']['QuoteTimeSeriesStore']
        q_results.setdefault(self.ticker, [])
        for i in my_ts_keys:
            try:
                # Loop over every entry (0, 1, 2, ...) and skip empty ones.
                for entry in key_stats['timeSeries'][i]:
                    if entry:
                        res = {i: entry['reportedValue']['raw']}
                        q_results[self.ticker].append(res)
            except Exception:
                # Key missing or malformed entry: record NaN for this field.
                q_results[self.ticker].append({i: _np.nan})

        # Use self.ticker throughout; a bare `ticker` is undefined in this scope.
        res = {'Company': self.ticker}
        q_results[self.ticker].append(res)
    except Exception:
        pass
asafravid commented 2 years ago

This is the pull request that provides the fix and thus provides 'trailingPegRatio' in the info of the Ticker object of yfinance https://github.com/ranaroussi/yfinance/pull/911

Nice work @eabase and @rickturner2001 - much appreciated!
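Once 'trailingPegRatio' is present in the info dictionary, calling code can prefer it over 'pegRatio'. The helper below is a hypothetical sketch that works on any plain dict shaped like `Ticker.info`; it does not call yfinance, and the sample values are the ones reported for FE in this thread.

```python
import math
from typing import Mapping, Optional


def preferred_peg(info: Mapping[str, object]) -> Optional[float]:
    """Return trailingPegRatio when present and numeric, else fall back to pegRatio."""
    for key in ("trailingPegRatio", "pegRatio"):
        value = info.get(key)
        if isinstance(value, (int, float)) and not math.isnan(value):
            return float(value)
    return None


# Values reported for FE in this thread.
print(preferred_peg({"pegRatio": 43.37, "trailingPegRatio": 3.0477}))
print(preferred_peg({"pegRatio": 43.37}))
```

With real data this would be called as `preferred_peg(yf.Ticker("FE").info)`, assuming the installed yfinance version includes the change from the pull request.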

asafravid commented 2 years ago

Submitted to my fork @ https://github.com/asafravid/yfinance

eabase commented 2 years ago

@asafravid Great! Glad I could be of help in the end.

The code I linked is quite dirty, so I'm sure it could be simplified in a more pythonic way. It would also be nice to be able to access the date and the period type.

image

asafravid commented 2 years ago

@eabase yes, you were of tremendous help! You helped fix this issue. As for the quarterlyPegRatio and the dates, I added myself a TODO to add this important information as well, in time. The acute ad-hoc bug was the pegRatio showing an unexplained value, which trailingPegRatio fixed completely. Once the pull request is merged, I shall close this ticket and open a new one for quarterlyPegRatio, which shall be based on your code as well, with full credit of course. Much appreciated!

eabase commented 2 years ago

Just a short follow-up. Did anyone figure out what values were used to calculate (the this / that) the 3 different pegRatios?

asafravid commented 2 years ago

I didn't dive into that @eabase. All I needed was the consensus correct value (as other websites calculate it) so as to use it in my stock scanner's core equation. And for that, thanks for your help 🙏🏼

FinestMaximus commented 3 months ago

Maybe you're confusing trailingPegRatio and pegRatio? It's way off though, so that might not be it. Quote: The "trailingPegRatio" uses the trailing P/E ratio, which is based on the stock's current price divided by its earnings per share (EPS) over the past 12 months. In contrast, the "pegRatio" uses the forward P/E ratio, which is based on the stock's current price divided by its estimated EPS over the next 12 months.
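To make the distinction concrete, here is a small worked example using the usual textbook definition, PEG = (P/E) / expected annual EPS growth in percent. All input numbers are hypothetical, and the thread never established which inputs Yahoo actually uses for its three variants.

```python
def pe_ratio(price: float, eps: float) -> float:
    """Price-to-earnings ratio: share price divided by earnings per share."""
    return price / eps


def peg_ratio(pe: float, growth_pct: float) -> float:
    """PEG: P/E divided by the expected annual EPS growth rate, in percent."""
    return pe / growth_pct


# Hypothetical inputs, for illustration only.
price = 40.0
trailing_eps = 2.0   # EPS over the past 12 months
forward_eps = 2.5    # estimated EPS over the next 12 months
growth_pct = 8.0     # expected annual EPS growth, in percent

trailing_peg = peg_ratio(pe_ratio(price, trailing_eps), growth_pct)
forward_peg = peg_ratio(pe_ratio(price, forward_eps), growth_pct)
print(trailing_peg, forward_peg)  # prints: 2.5 2.0
```

The two figures differ only through the EPS used in the P/E step, so a gap as large as 43.37 versus roughly 3 points to different or stale inputs rather than to the trailing/forward distinction alone.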