ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

Exception: yfinance failed to decrypt Yahoo data response #1407

Closed robertluisw closed 1 year ago

robertluisw commented 1 year ago

Looks like more encryption issues from yahoo.com

import yfinance as yf
ticker = 'PENN'
stock_info = yf.Ticker(ticker).balance_sheet

Exception: yfinance failed to decrypt Yahoo data response

[ Basically affects everything except price history @ValueRaider ]

Using Python version 3.11.0 yf version 0.2.9

@ValueRaider hijacking top post

[2023-06-23] Update! Latest release fixes financials tables (and removes decryption code).

What is happening? In December 2022 Yahoo began encrypting webpage data, maybe to block scraping. Now, Yahoo is regularly changing their encryption key, we think every day (and maybe multiple times a day). Without an automated system to extract key from their webpage (work in progress), fixing decryption requires a volunteer to manually extract the new key and provide to developers to upload to yfinance.

~Help needed~

~Need a Javascript dev to write a script that extract AES decryption key from obfuscated JS that Yahoo uses to en/decrypt. The key is there plaintext, just need to automate extraction. The JS changes every day so limited scope to hardcode (use Git branch hotfix/decryption to print today's JS url). Don't worry about sandboxing etc, end users won't execute this.~

~Script should be separate to yfinance codebase. I expect your only interaction with yfinance is testing the extracted key works by putting in yfinance/data.py~

~Useful comments:~

Progress updates

2023-06-21

Update your yfinance! Latest release fixes financials tables and removes decryption code.

2023-06-04

Obvious that the decryption won't be fixed. See last message for plan.

2023-03-25

Ticker.info fixed by fetching from API. Financials still broken.

2023-02-17

Yahoo finally started using a new encryption key not in yfinance backup list of keys, so decryption failing. Inevitable. Surprised it took 4 days.

2023-02-13

What is the "backup decryption method"? This is simply yfinance fetching decryption keys from this GitHub project website instead of extracting from Yahoo.com. Was broken in 0.2.9 but fixed in 0.2.10. Today worked for many thanks to a key uploaded yesterday. Discussion continues on a decent system for extracting & sharing decryption key.
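
A minimal sketch of the fallback idea described above: try each candidate key in turn and stop at the first one that yields parseable JSON. The `decrypt` callable and the `first_working_key` helper are hypothetical stand-ins for illustration, not yfinance's actual functions.

```python
import json
from typing import Any, Callable, Iterable, Optional, Tuple


def first_working_key(
    ciphertext: bytes,
    candidate_keys: Iterable[str],
    decrypt: Callable[[bytes, str], bytes],
) -> Optional[Tuple[str, Any]]:
    """Return (key, decoded_json) for the first key that decrypts to valid JSON.

    `decrypt` is a hypothetical stand-in for the real AES routine; with a
    wrong key it is expected to raise or to return bytes that do not parse.
    """
    for key in candidate_keys:
        try:
            plaintext = decrypt(ciphertext, key)
            return key, json.loads(plaintext.decode("utf-8"))
        except ValueError:
            # Covers UnicodeDecodeError and json.JSONDecodeError (both are
            # ValueError subclasses): wrong key, try the next candidate.
            continue
    return None
```

If every candidate fails, the function returns None, which corresponds to the "failed to decrypt" situation reported in this thread.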

workaround - yahooquery

The Python module yahooquery is a functional alternative to yfinance. Instead of scraping webpages, it accesses Yahoo's undocumented API. It is not encrypted and is faster, but it lacks earnings_dates.

ValueRaider commented 1 year ago

Agreed. I noticed 12 hours ago Yahoo was more sensitive to spam, but only now a total block.

FYI I've just released 0.2.10 which fixes the backup decrypt methods but doesn't help (I hoped it would), so don't feel pressured to upgrade 0.2.9. Unless you want to debug and fix, then definitely upgrade.

ValueRaider commented 1 year ago

If you came to report same issue, just upvote the top comment. Keep this thread clean and constructive.

ValueRaider commented 1 year ago

Don't see any obvious change to dict structure - still 10004 extra items just like before. Maybe they've upgraded their obfuscation from simply changing key to changing other encryption parameters.

This is the Javascript we think they use to encrypt: https://s.yimg.com/uc/finance/dd-site/js/main.e0c853d8cea2b75a5208.min.js Reading compressed Javascript not my expertise, maybe someone can extract the encryption parameters and cross-check against yfinance/data.py::decrypt_cryptojs_aes_stores()
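
Since the script's filename embeds a hash, one quick way to notice that Yahoo has shipped a different script is to pull that hash out of the page source. The sketch below assumes the filename shape seen in the URLs quoted in this thread (`main.<hex>.min.js` or `main.<hex>.modern.js`); `main_js_hash` is a hypothetical helper, not part of yfinance.

```python
import re
from typing import Optional

# Assumed filename shape, based only on the URLs quoted in this thread.
MAIN_JS_RE = re.compile(r"main\.([0-9a-f]+)\.(?:min|modern)\.js")


def main_js_hash(page_source: str) -> Optional[str]:
    """Return the hex hash of the first main.*.js reference, or None."""
    match = MAIN_JS_RE.search(page_source)
    return match.group(1) if match else None


html = '<script src="https://s.yimg.com/uc/finance/dd-site/js/main.e0c853d8cea2b75a5208.min.js"></script>'
print(main_js_hash(html))  # e0c853d8cea2b75a5208
```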

jasmohan-narula commented 1 year ago

Why are the comments being deleted?

ValueRaider commented 1 year ago

@jasmohan-narula Because all they essentially say is "I have issue too", contributing nothing. Thread quickly gets messy, some of us want to discuss problem.

aetmezgu commented 1 year ago

Hello, I have investigated and noticed several things during my testing with python test_yfinance.py (script tested in a Python 3.11.2 Docker container).

_get_decryption_keys_from_yahoo_js(self, soup) always returns an empty array of keys for me, and I get the error: WARNING: No decryption keys could be extracted from JS file. Falling back to backup decrypt methods.

In function _get_decryption_keys_from_yahoo_js in data.py, line 218, the check if len(sub_keys) == key_count: is always False for me, because key_count == 4 while len(sub_keys) is always 10004. So the script never executes the code inside the if since the last Yahoo changes?

So I tried to make this if work by replacing the preceding statement: sub_keys = key_list[ind+1:]

with: sub_keys = key_list[len(key_list)-4:], to really take the last 4 keys, as explained in the comment of the first attempt.

And the method now returns the concatenated result of the last 4 keys:

# Gather decryption keys:
soup = BeautifulSoup(response.content, "html.parser")
keys = self._get_decryption_keys_from_yahoo_js(soup)
print(keys)
# => ['2ecbf885a68605aaf0ee8a8b9529fc80c6458ff25278cb981aa69b8103c18471c9219387b538643252eea3e8938c99b078e05ff7589994b974efc3fa8fcf505b']

I guess the code can now try to decrypt the store with the non-empty keys: stores = decrypt_cryptojs_aes_stores(data, keys). But I'm still getting the exception: Exception: yfinance failed to decrypt Yahoo data response

when decrypt_cryptojs_aes_stores(data, keys) is called.

It seems that the keys contained in the plugin object don't work anymore?

I hope this helps. I'll try to go deeper into the code to see what makes the decryption fail.
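
The slicing change described in this comment can be sketched in isolation as follows. The helper name `last_four_joined` and the toy values are assumptions for illustration; note that, as reported above, the key built this way did not decrypt the store.

```python
from typing import List, Sequence


def last_four_joined(key_list: Sequence[str], key_count: int = 4) -> List[str]:
    """Take the last `key_count` values and return them joined as one key."""
    # key_list[-4:] is equivalent to key_list[len(key_list)-4:]
    sub_keys = list(key_list[-key_count:])
    if len(sub_keys) != key_count:
        return []  # mirrors the "no keys extracted" case
    return ["".join(sub_keys)]


# Toy stand-in for the 10004 values seen in the page data.
values = [f"v{i}" for i in range(10004)]
print(last_four_joined(values))  # ['v10000v10001v10002v10003']
```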

ValueRaider commented 1 year ago

I've created branch hotfix/decryption for people to collab on. You'll still need to Pull Request but I'll merge with minimal review - proper review can happen later. Just make sure your fork is on that branch not main.

annis-souames commented 1 year ago

Since the encryption method changed from Yahoo Finance's backend side, does this mean that all of yfinance package is not usable, not even the previous versions ?

jessysu commented 1 year ago

@snowgato Off topic but it works. Just pip upgrade your requests and urllib3. https://github.com/dpguthrie/yahooquery/issues/143

Meborl commented 1 year ago

The json loaded from root.App.main always comprises 10004 key/value pairs, but simply joining the last 4 values is no longer working.

The password needed to disentangle "stores" is generated by a javascript function supplied in "main.xxxxxxxxxxxxxxxxxx.modern.js". The version of this file is indicated by the hash "xxxxxxxxxxxxxxxxxx". The javascript code in this file changes with every version and seems to be heavily obfuscated. I got the same version of "main.xxxxxxxxxxxxxxxxxx.modern.js" for all pages I called on the same day, and another version on the next day. All pages delivered with a certain version of "main.xxxxxxxxxxxxxxxxxx.modern.js" are including the same 10004 key/value pairs in root.App.main, but the order of these 10004 key/values is changed with each page call.

I loaded a stock page in a web browser and then opened the inspection console (F12). After setting a breakpoint in "main.xxxxxxxxxxxxxxxxxx.modern.js" I could scrape the password from an internal variable. The password is still a concatenation of 4 of the values in root.App.main and it is 128 bytes long. After manually copying the password into Python code, I could read the "stores" dict.

The JavaScript code in "main.xxxxxxxxxxxxxxxxxx.modern.js" is obfuscated. Variable and function names seem to change between versions. The decryption of the JSON string is done in this function call:

return s.context.dispatcher.stores=JSON.parse(function(e,t){return c().decrypt(e,t).toString(...

In this case, a variable named "e" holds the entangled content of "stores" and a variable named "t" holds the 128-byte password. This password can be used to decrypt the "stores" in all pages delivered with that particular version of "main.xxxxxxxxxxxxxxxxxx.modern.js".

I have no idea how to automate the generation of the password with "main.xxxxxxxxxxxxxxxxxx.modern.js". Maybe someone experienced in JavaScript will find a solution.

The way Yahoo is wrapping their data is by no means proper encryption. It is just a kind of obfuscation by misusing standard functions from cryptography.
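
For readers wondering how a text password becomes an AES key, here is a rough sketch of the first half of the process. It assumes Yahoo's script uses CryptoJS's default passphrase mode, which follows OpenSSL's "Salted__" layout and an MD5-based key derivation; that is an inference from the function name decrypt_cryptojs_aes_stores, not something confirmed in this thread. The AES step itself is left out because it needs a third-party library, and both helper names are hypothetical.

```python
import hashlib
from base64 import b64decode
from typing import Tuple


def split_salted(blob_b64: str) -> Tuple[bytes, bytes]:
    """Split a base64 OpenSSL-style 'Salted__' blob into (salt, ciphertext)."""
    raw = b64decode(blob_b64)
    if raw[:8] != b"Salted__":
        raise ValueError("missing 'Salted__' header")
    return raw[8:16], raw[16:]


def evp_bytes_to_key(password: bytes, salt: bytes,
                     key_len: int = 32, iv_len: int = 16) -> Tuple[bytes, bytes]:
    """Derive (key, iv) by chaining MD5 digests, as OpenSSL's EVP_BytesToKey does.

    Each block is MD5(previous_block + password + salt), one iteration each.
    The sizes assume AES-256-CBC, which is an assumption here.
    """
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]
```

With the derived key and IV, the ciphertext would then go through an AES-CBC decrypt and PKCS#7 unpadding before being parsed as JSON.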

SymbReprUnlim commented 1 year ago

ValueRaider - Treating human beings like filtered out list elements by deleting those attempts at being helpful is not a good long-term policy, even if it helps better focus on some technical issue at hand. What's needed is a way for you to add tags to certain posts that YOU consider most relevant, so that you and others can view the list of posts you consider most relevant to solving the technical issue(s) at hand. As right as you may be in deleting those posts, it is an infringement on free speech, human cognition and a total teamwork approach. GITHUB apparently needs a software modification to allow you the capability to tag and filter while still allowing people to contribute, without being deleted; except for rudeness, crassness, deliberate attempts at software sabotage, etc. deletions still being helpful or some sort of auto-rudeness filtering as allowable. 3-5 days now and still no clear-cut solution, maybe Yahoo is doing what it is doing purposefully, for a reason. There's always the SEC and direct access to its database, XBRL's, financial statements etc. Google Finance has closed its previously open doors to web scraping. There are other potential scraping alternatives, Zack's, MarketWatch, ForExFactory... Feel free to delete this post after reading it and giving pause for thought. And I'm not suggesting giving up on a technical solution to the current apparent encryption inability to access data with YahooFinance.

JECSand commented 1 year ago

@Meborl @ValueRaider I've come to the conclusion that the only way to do this in a worthwhile manner is by executing the JS code itself. I'm looking at js2py and PyMiniRacer.

Of the two, my preference would be to find a solution using js2py as in this guide: https://devpress.csdn.net/python/630502f87e6682346619d3dc.html

PyMiniRacer has a lot more overhead and doesn't seem as stable.

There's just no point spending hours rewriting their smoke and mirrors logic in Python, only for them to change a few mirrors around and break it.

valankar commented 1 year ago

Did someone look into how steampipe is doing it before deleting my comment? Is that also using the "hidden" API (also mentioned in a deleted comment). How is that worse than what yfinance is doing?

ValueRaider commented 1 year ago

@SymbReprUnlim This is not a platform for free speech. This is a platform for constructive collaboration, and that requires moderation. We don't need dozens of "I also have this error" replies - imo this is software sabotage. The few that want to contribute shouldn't have to sift through many useless comments.

3-5 days now and still no clear-cut solution

I missed the part where you paid for yfinance and we are paid to fix this. Some people have already volunteered time and effort with very useful debugging, and now a solution appears visible.

@valankar Maybe I should have explained. The steampipe example is less useful and harder to install than Python yahooquery, already a great example of using the "hidden" API.

Rogach commented 1 year ago

@JECSand Are you sure that js2py is securely sandboxed (cannot access any sensitive functionality on the host system)? Because otherwise we will be basically deploying RCE vulnerability to all the users of the library, which is quite suboptimal.

And if it is securely sandboxed, then we have a whole new can of worms - such execution environment will be trivially detectable. The code then will be able to do various shenanigans - from semi-harmless endless loops to randomizing the data if interpreter is detected.

Unfortunately executing JS is not a final solution either :(

ValueRaider commented 1 year ago

@Rogach Good points. Given key doesn't change within a single day, maybe some volunteer can setup & run a separate service that regularly runs this JS to extract decryption key then post somewhere public e.g. a separate GitHub project? yfinance already capable of fetching keys from GitHub HTML (the "backup decrypt" method), can easily redirect.

galashour commented 1 year ago

Just a clarification: Even if we find such a volunteer, would it imply yfinance users will need to 'pip update yfinance' every day (/few days) in order to have the updated keys?

domsde commented 1 year ago

No, with one of the updates in the last couple of weeks a new way was introduced that he was referring to as backup decrypt. It is basically a textfile with keys. That file is already loaded through the regular yfinance code and can therefore be modified online without the need of updating yfinance.
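
The idea of a key file that is fetched at run time, so that it can change without a package release, might look roughly like this. The URL, the one-key-per-line format, and both function names are assumptions for illustration; the thread does not give the real location or format.

```python
from typing import List
from urllib.request import urlopen

# Hypothetical location; the real URL used by yfinance is not given in this thread.
KEYS_URL = "https://example.com/yahoo-keys.txt"


def parse_keys(text: str) -> List[str]:
    """Parse one key per line, skipping blank lines and '#' comments (assumed format)."""
    keys = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            keys.append(line)
    return keys


def fetch_keys(url: str = KEYS_URL, timeout: float = 10.0) -> List[str]:
    """Download the key file at call time, so no package upgrade is needed."""
    with urlopen(url, timeout=timeout) as response:
        return parse_keys(response.read().decode("utf-8"))
```

Because the list is downloaded on each call, whoever maintains the file can publish a new key and every installed copy picks it up on the next request.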

ValueRaider commented 1 year ago

@domsde Correct. Currently yfinance can ping GitHub for new keys, but uploading new keys is manual process - not good when key changes daily. Just need one PIP update to change where yfinance pings.

ChristianKuehnel commented 1 year ago

If you're looking for a place to execute potentially unsafe code: GitHub actions is a nice place for this. You can also directly store the output of your pipeline in Github again...

ValueRaider commented 1 year ago

@ChristianKuehnel Thanks for info, I'll try to speak with @ranaroussi.

Seems the backup decrypt is working today. Anyone disagree? Because I'm curious if Yahoo uses different key for different regions.

galashour commented 1 year ago

yes - now it works (backup decrypt). [middle-east (geography)].

khalidcruz commented 1 year ago

Works for me here in Sweden.

0zd3m1r commented 1 year ago

Works in Germany.

ValueRaider commented 1 year ago

So Europe good. America? East Asia?

Luiscas97 commented 1 year ago

Works in south America.

kryp33 commented 1 year ago

US looks good at 9:15EST (I had a problem before with "info" field. It's populated now)

ValueRaider commented 1 year ago

@pchedas Please no more confirmations from Europe.

divityodaplus commented 1 year ago

works for me here in India

aliahmed2001 commented 1 year ago

doesn't work in Canada, (.info() that is)

cmjordan42 commented 1 year ago

Works on US East Coast

It prints "WARNING: No decryption keys could be extracted from JS file. Falling back to backup decrypt methods." and then retrieves real-looking data from info.

cmjordan42 commented 1 year ago

Hopefully Yahoo fired whichever product manager was spending developer time on this instead of actually improving the quality of their product, and then the keys were freed.

ValueRaider commented 1 year ago

@cmjordan42 That's unfair. This encryption only affects webpage scraping. Direct GET requests work fine, as yf.download and yahooquery do. I can understand why Yahoo wants to stop webpage scraping (expensive) in favour of GETs.

giantroadracer commented 1 year ago

doesn't work in Japan (.info) I tried Ver.0.2.10 and 0.2.11

cmjordan42 commented 1 year ago

Yahoo removed their API and created this problem for themselves, while also fragmenting programmatic users into the various scraping tools that now exist. The best way to avoid people scraping webpages is to provide an API to the underlying data, then they can throttle and control load to their heart's content.

So I think it's pretty fair to say that they are making a mistake in focusing their efforts on repeatedly obfuscating their client-side code in an attempt to mitigate a problem which they themselves created.

aliahmed2001 commented 1 year ago

I found that version 0.2.11 doesn't work in Jupyter notebooks, but it will work with a warning in a regular .py file (.info() that is)

ValueRaider commented 1 year ago

@cmjordan42 If API gone then what is yahooquery using? Because that doesn't scrape.

cmjordan42 commented 1 year ago

@ValueRaider It does scrape - it's built around Selenium which is a scraper. And it's created and maintained by someone who had to reverse engineer Yahoo's internal APIs, not by Yahoo. When Yahoo does some internal reorganization to release a new version of their webpage and internal APIs to serve that webpage, YahooQuery is liable to break just as YFinance is.

When I say that the right thing to do is for Yahoo to provide an API, I mean an actual API where they publish and maintain an endpoint and/or language-specific libraries. Then consumers could rely on it and their servers would be putting their cycles towards serving up the financial data points as efficiently as possible, not marshalling them into JSONs or applying cryptography for the sole purpose of obfuscating that JSON. No web driver needed.

ntk42 commented 1 year ago

Just tested this (below). A Canadian stock works in Canada, which wasn't working prior to the update. Now at 0.2.11.

r = yf.Ticker('DFN.TO')
print('r.fast_info: ', r.fast_info)
print('r.info:', r.info)

ValueRaider commented 1 year ago

@cmjordan42 "Selenium is only utilized to login to Yahoo, to retrieve data only accessible to premium subscribers." I've looked at how yahooquery works, it just sends GET requests to internal API. But I accept your broader point about officially supporting an API - maybe that's what these Yahoo changes are working towards, by the new owner Apollo Global.

@giantroadracer Thanks for report. We think key changes daily, and your report suggests Yahoo uses your local date to decide - that key I added worked 13-Feb and you're in 14-Feb. So 'key derivation service' needs to run in Far East or Australia.
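
To make the local-date hypothesis concrete, the sketch below shows how one instant can fall on 13-Feb in Europe and 14-Feb in Japan. The chosen instant is illustrative, fixed UTC offsets are used to keep the example self-contained (assuming no daylight saving in February), and whether Yahoo really rotates keys by local date is unconfirmed.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the example self-contained (assumed: no DST in February).
STOCKHOLM = timezone(timedelta(hours=1), "CET")
TOKYO = timezone(timedelta(hours=9), "JST")


def local_date(moment_utc: datetime, tz: timezone) -> str:
    """Return the ISO calendar date of a UTC instant as seen in the given zone."""
    return moment_utc.astimezone(tz).date().isoformat()


# Illustrative instant: 16:00 UTC on 13 February 2023.
moment = datetime(2023, 2, 13, 16, 0, tzinfo=timezone.utc)
print(local_date(moment, STOCKHOLM))  # 2023-02-13
print(local_date(moment, TOKYO))      # 2023-02-14
```

If the hypothesis holds, a key extracted by a service running in the earliest time zones would be the first to see each day's key.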

JotaSe commented 1 year ago

I found that version 0.2.11 doesn't work in Jupyter notebooks, but it will work with a warning in a regular .py file (.info() that is)

The same with 0.2.10, it doesn't work with notebooks

khalidcruz commented 1 year ago

It doesn't work anymore here in Sweden and it's still 13-Feb

Rogach commented 1 year ago

I looked at the obfuscation, it's done using the popular javascript-obfuscator tool, easily reversible with some manual effort.

Right now there is not much code in the unobfuscated version - four array keys are basically hardcoded:

var decryptionKey = ["key1", "key2", "key3", "key4"].reduce((a, b) => "" + a + App.main[b], "");

Each main.js version contains a hash in the filename (the format is "main.a0b1c2d3.modern.js" at the moment), so maybe it will make sense to make yahoo-keys.txt into a json dictionary:

{
  "main.a0b1c2d3.modern.js": "6ae2523aeafa283dad7...",
  "main.a1b2c3d4.modern.js": "3365117c2a368ffa5df..."
}

And if filename is not found in the dictionary then yfinance can throw an error instructing the user to report a new filename.
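
A lookup along these lines could be sketched as below. The entries are the truncated placeholders from the example in this comment, and `key_for` is a hypothetical helper rather than existing yfinance code.

```python
import json

# Placeholder entries copied from the example in this comment; the values
# are truncated there and are left truncated here.
KEYS_JSON = """
{
  "main.a0b1c2d3.modern.js": "6ae2523aeafa283dad7...",
  "main.a1b2c3d4.modern.js": "3365117c2a368ffa5df..."
}
"""


def key_for(filename: str, keys_json: str = KEYS_JSON) -> str:
    """Return the key recorded for a main.js filename, or raise with a hint."""
    keys = json.loads(keys_json)
    try:
        return keys[filename]
    except KeyError:
        raise LookupError(
            f"No decryption key known for {filename!r}; please report this filename."
        ) from None
```

Keying by filename means a stale key is never tried against a newer script, and the error message tells users exactly what to report.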

BTW, @khalidcruz, what's the full name of the main.js file you are seeing? (you can search for it in the page source, or in devtools, either filter by "main." in Network tab or look in the Sources tab in s.yimg.com/uc/finance/dd-site/js/ folder).

khalidcruz commented 1 year ago

@Rogach it's main.9c2e056368902a7b446e.modern

Rogach commented 1 year ago

@khalidcruz I forgot I also need the contents of App.main (from the page source) to actually extract the key :(

But here's a piece of code that you can run in devtools to extract the key corresponding to your main.js version:

["87b62ee5fe65", "08a3ee23291a", "25d6a4526abc", "e50551b7d7ab"].reduce((a, b) => "" + a + App.main[b], "")
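
For anyone more comfortable in Python, the devtools snippet above amounts to looking up four named entries and joining their values in that fixed order. The sketch uses toy values; real App.main entries are long hex strings, and `join_key_parts` is a hypothetical helper.

```python
from typing import Mapping, Sequence


def join_key_parts(app_main: Mapping[str, str], names: Sequence[str]) -> str:
    """Concatenate the values of the named entries, in the order given."""
    return "".join(app_main[name] for name in names)


# Toy data: the four names come from the snippet above, the values are made up.
app_main = {
    "decoy": "zz",
    "e50551b7d7ab": "dd",
    "87b62ee5fe65": "aa",
    "25d6a4526abc": "cc",
    "08a3ee23291a": "bb",
}
names = ["87b62ee5fe65", "08a3ee23291a", "25d6a4526abc", "e50551b7d7ab"]
print(join_key_parts(app_main, names))  # aabbccdd
```

Because the lookup is by name, the result does not depend on the order in which the entries appear in the page.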

doobery47 commented 1 year ago

Still getting the issue with version yfinance-0.2.11: "No decryption keys could be extracted from JS file. Falling back to backup decrypt method". After all the conversation I'm a bit unclear whether there is a workaround or whether it is still being worked on.

Rogach commented 1 year ago

@doobery47 We have bunch of workarounds, but they are unstable and so the work continues.

cmjordan42 commented 1 year ago

Note that this warning message does not mean failure, just that it's falling back to the backup decrypt method which seems to work for a significant number of people right now. Check to see if you do get a value back after receiving that message.

Yazzito commented 1 year ago

For me, I'm seeing the warning you mentioned followed by:

line 162, in decrypt_cryptojs_aes_stores
    raise Exception("yfinance failed to decrypt Yahoo data response")
Exception: yfinance failed to decrypt Yahoo data response

It worked for me this morning with no code changes. (I also saw the warning this morning, but no errors at that time.) PS. I am not rate limited because I only tested 2 ticker runs (both failed, same msg)

@Rogach I looked at your instructions in the post above. My main.js naming is also: main.9c2e056368902a7b446e.modern (same as @khalidcruz). When using the command in devtools:

["87b62ee5fe65", "08a3ee23291a", "25d6a4526abc", "e50551b7d7ab"].reduce((a, b) => "" + a + App.main[b], "")

I get:

'3c895fb5ddcc37d20d3073ed74ee3efad59bcb147c8e80fd279f83701b74b092d503dcd399604c6d8be8f3013429d3c2c76ed5b31b80c9df92d5eab6d3339fce'

I added this key to the yfinance yahoo_keys.txt locally but I'm still seeing the same decrypt error above. I'm not sure where you got the 4 key numbers in the reduce command from? So, not sure if I'm using the correct keys.

ValueRaider commented 1 year ago

@Yazzito yfinance doesn't read your local key file, it fetches from GitHub where I can update instantly without PIP. When I designed that I didn't expect key to change daily.