Closed: robertluisw closed this issue 1 year ago.
Agreed. I noticed 12 hours ago that Yahoo was more sensitive to spam, but only now is it a total block.
FYI I've just released 0.2.10, which fixes the backup decrypt methods but doesn't help here (I hoped it would), so don't feel pressured to upgrade from 0.2.9. If you want to debug and fix, though, definitely upgrade.
If you came to report the same issue, just upvote the top comment. Keep this thread clean and constructive.
Don't see any obvious change to the dict structure: still 10004 extra items, just like before. Maybe they've upgraded their obfuscation from simply changing the key to changing other encryption parameters.
This is the JavaScript we think they use to encrypt: https://s.yimg.com/uc/finance/dd-site/js/main.e0c853d8cea2b75a5208.min.js
Reading minified JavaScript is not my expertise; maybe someone can extract the encryption parameters and cross-check them against yfinance/data.py::decrypt_cryptojs_aes_stores()
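For anyone cross-checking parameters: as far as I know, CryptoJS's passphrase mode expects a Base64 payload that starts with the OpenSSL "Salted__" header followed by an 8-byte salt, and derives the AES-256 key and IV with OpenSSL's EVP_BytesToKey (MD5, one iteration). The sketch below covers only that parameter layer, using the standard library. It assumes Yahoo still relies on these CryptoJS defaults (which is exactly what needs checking), and the function names are hypothetical, not yfinance's own.

```python
import base64
import hashlib


def evp_bytes_to_key(password: bytes, salt: bytes,
                     key_len: int = 32, iv_len: int = 16) -> tuple[bytes, bytes]:
    """OpenSSL EVP_BytesToKey with MD5 and one iteration (assumed CryptoJS default)."""
    derived = b""
    block = b""
    # Each round hashes previous digest + password + salt until enough bytes exist.
    while len(derived) < key_len + iv_len:
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


def split_salted_payload(encrypted_b64: str) -> tuple[bytes, bytes]:
    """Split a Base64 'Salted__' payload into (8-byte salt, ciphertext)."""
    raw = base64.b64decode(encrypted_b64)
    if raw[:8] != b"Salted__":
        raise ValueError("payload does not start with the OpenSSL 'Salted__' header")
    return raw[8:16], raw[16:]
```

The derived key and IV would then feed an AES-256-CBC decryption (for example with the cryptography package's Cipher(algorithms.AES(key), modes.CBC(iv))) followed by PKCS#7 unpadding. If Yahoo changed the KDF, hash or cipher mode, values derived this way simply stop working, which is one thing worth ruling out.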
Why are the comments being deleted?
@jasmohan-narula Because all they essentially say is "I have issue too", contributing nothing. Thread quickly gets messy, some of us want to discuss problem.
Hello, I have investigated and noticed several things while testing with: python test_yfinance.py (script run in a Python 3.11.2 Docker container)
_get_decryption_keys_from_yahoo_js(self, soup) always returns an empty list of keys for me, and I get the message: WARNING: No decryption keys could be extracted from JS file. Falling back to backup decrypt methods.
In function _get_decryption_keys_from_yahoo_js in data.py, line 218:
if len(sub_keys) == key_count:
=> This always evaluates to False for me, because key_count == 4 while len(sub_keys) is always 10004, so since Yahoo's latest changes the code inside the if never runs?
So I tried to make this if work by replacing the preceding statement:
sub_keys = key_list[ind+1:]
with:
sub_keys = key_list[len(key_list)-4:]
=> to really take the last 4 keys, as described in the comment for the first attempt.
The method now returns the concatenation of the last 4 keys:
# Gather decryption keys:
soup = BeautifulSoup(response.content, "html.parser")
keys = self._get_decryption_keys_from_yahoo_js(soup)
print(keys) => ['2ecbf885a68605aaf0ee8a8b9529fc80c6458ff25278cb981aa69b8103c18471c9219387b538643252eea3e8938c99b078e05ff7589994b974efc3fa8fcf505b']
I guess the code can now try to decrypt the stores with the non-empty keys:
stores = decrypt_cryptojs_aes_stores(data, keys)
But I'm still getting the exception:
Exception: yfinance failed to decrypt Yahoo data response
when decrypt_cryptojs_aes_stores(data, keys) is called.
It seems the keys contained in the plugin object don't work anymore?
I hope this helps. I'll try to dig deeper into the code to see what makes the decryption fail.
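To make the slicing change above concrete, here is a minimal sketch on a plain dict. It is not the actual _get_decryption_keys_from_yahoo_js code, only the idea behind it: key_list[len(key_list)-4:] is equivalent to key_list[-4:], and the "password" is the matching values joined together. app_main is a made-up stand-in for the parsed root.App.main JSON, and last_four_password is a hypothetical helper name.

```python
def last_four_password(app_main: dict[str, str], key_count: int = 4) -> str:
    """Join the values of the last `key_count` keys (position-based heuristic)."""
    key_list = list(app_main.keys())
    sub_keys = key_list[-key_count:]  # same as key_list[len(key_list) - key_count:]
    if len(sub_keys) != key_count:
        raise ValueError(f"expected {key_count} keys, got {len(sub_keys)}")
    return "".join(app_main[k] for k in sub_keys)


# Hypothetical data: filler entries followed by four key fragments.
app_main = {f"filler{i}": "x" for i in range(10)}
app_main.update({"a": "11", "b": "22", "c": "33", "d": "44"})
print(last_four_password(app_main))  # 11223344
```

A successful concatenation only proves the heuristic picked four values; as reported here, the resulting key still fails to decrypt, which points at which values are chosen rather than at the slicing itself.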
I've created branch hotfix/decryption for people to collaborate on. You'll still need to open a Pull Request, but I'll merge with minimal review; proper review can happen later. Just make sure your fork is on that branch, not main.
Since the encryption method changed on Yahoo Finance's backend, does this mean the whole yfinance package is unusable, even previous versions?
@snowgato Off topic but it works. Just pip upgrade your requests and urllib3. https://github.com/dpguthrie/yahooquery/issues/143
The json loaded from root.App.main always comprises 10004 key/value pairs, but simply joining the last 4 values is no longer working.
The password needed to disentangle "stores" is generated by a JavaScript function supplied in "main.xxxxxxxxxxxxxxxxxx.modern.js". The version of this file is indicated by the hash "xxxxxxxxxxxxxxxxxx". The JavaScript code in this file changes with every version and seems to be heavily obfuscated. I got the same version of "main.xxxxxxxxxxxxxxxxxx.modern.js" for all pages I called on the same day, and another version on the next day. All pages delivered with a given version of "main.xxxxxxxxxxxxxxxxxx.modern.js" include the same 10004 key/value pairs in root.App.main, but the order of these 10004 key/value pairs changes with each page call.
I loaded a stock page in a web browser and then opened the developer tools (F12). After setting a breakpoint in "main.xxxxxxxxxxxxxxxxxx.modern.js" I could scrape the password from an internal variable. The password is still a concatenation of 4 of the values in root.App.main, and it is 128 bytes long. After manually copying the password into Python code, I could read the "stores" dict.
The JavaScript code in "main.xxxxxxxxxxxxxxxxxx.modern.js" is obfuscated. Variable and function names seem to change between versions. The decryption of the JSON string is done in this function call:
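If the order really changes on every page call, any position-based rule such as "take the last 4 values" returns different strings from page to page even though the key/value pairs themselves are identical. A small illustration with made-up data, where reversing the pairs stands in for Yahoo's reordering (join_last is a hypothetical helper):

```python
def join_last(app_main: dict[str, str], n: int = 4) -> str:
    """Position-based rule: join the values of the last n entries."""
    return "".join(list(app_main.values())[-n:])


# Made-up data: the same five pairs, delivered in two different orders.
page1 = {"k1": "aa", "k2": "bb", "k3": "cc", "k4": "dd", "k5": "ee"}
page2 = dict(reversed(list(page1.items())))

print(join_last(page1))  # bbccddee
print(join_last(page2))  # ddccbbaa
print(page1 == page2)    # True: dict equality ignores insertion order
```

So a stable password can only come from knowing which key names to pick, not where they sit in the dict.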
return s.context.dispatcher.stores=JSON.parse(function(e,t){return c().decrypt(e,t).toString(...
In this case, the variable "e" holds the encrypted content of "stores" and the variable "t" holds the 128-byte password. This password can be used to decrypt the "stores" in all pages delivered with that particular version of "main.xxxxxxxxxxxxxxxxxx.modern.js".
I have no idea how to automate generating the password from "main.xxxxxxxxxxxxxxxxxx.modern.js". Maybe someone experienced in JavaScript will find a solution.
The way Yahoo is wrapping their data is by no means proper encryption. It is just a kind of obfuscation by misusing standard functions from cryptography.
ValueRaider - Treating human beings like filtered-out list elements by deleting their attempts at being helpful is not a good long-term policy, even if it helps focus on the technical issue at hand. What's needed is a way for you to tag the posts that YOU consider most relevant, so that you and others can view just the posts most relevant to solving the technical issue(s) at hand. As right as you may be in deleting those posts, it is an infringement on free speech, human cognition and a total teamwork approach. GitHub apparently needs a software modification that lets you tag and filter while still allowing people to contribute without being deleted, with deletion (or some sort of automatic rudeness filtering) still allowed for rudeness, crassness, deliberate attempts at software sabotage, etc. 3-5 days now and still no clear-cut solution; maybe Yahoo is doing what it is doing on purpose, for a reason. There's always the SEC and direct access to its database, XBRLs, financial statements etc. Google Finance has closed its previously open doors to web scraping. There are other potential scraping alternatives: Zacks, MarketWatch, ForexFactory... Feel free to delete this post after reading it and giving pause for thought. And I'm not suggesting giving up on a technical solution to the current apparent inability to access Yahoo Finance data because of the encryption.
@Meborl @ValueRaider I've come to the conclusion that the only way to do this in a worthwhile manner is by executing the JS code itself. I'm looking at js2py and PyMiniRacer.
Of the two, my preference would be to find a solution using js2py as in this guide: https://devpress.csdn.net/python/630502f87e6682346619d3dc.html
PyMiniRacer has a lot more overhead and doesn't seem as stable.
There's just no point spending hours rewriting their smoke and mirrors logic in Python, only for them to change a few mirrors around and break it.
Did someone look into how steampipe is doing it before deleting my comment? Is that also using the "hidden" API (also mentioned in a deleted comment). How is that worse than what yfinance is doing?
@SymbReprUnlim This is not a platform for free speech. This is a platform for constructive collaboration, and that requires moderation. We don't need dozens of "I also have this error" replies - imo this is software sabotage. The few that want to contribute shouldn't have to sift through many useless comments.
3-5 days now and still no clear-cut solution
I missed the part where you paid for yfinance and we are paid to fix this. Some people have already volunteered time and effort with very useful debugging, and now a solution appears visible.
@valankar Maybe I should have explained. The steampipe example is less useful and harder to install than Python yahooquery, which is already a great example of using the "hidden" API.
@JECSand Are you sure that js2py is securely sandboxed (cannot access any sensitive functionality on the host system)? Because otherwise we will be basically deploying RCE vulnerability to all the users of the library, which is quite suboptimal.
And if it is securely sandboxed, then we have a whole new can of worms: such an execution environment would be trivially detectable. The code could then pull various shenanigans, from semi-harmless endless loops to randomizing the data if the interpreter is detected.
Unfortunately executing JS is not a final solution either :(
@Rogach Good points. Given the key doesn't change within a single day, maybe some volunteer can set up & run a separate service that regularly runs this JS to extract the decryption key, then posts it somewhere public, e.g. a separate GitHub project? yfinance is already capable of fetching keys from GitHub HTML (the "backup decrypt" method), so it can easily be redirected.
Just a clarification: Even if we find such a volunteer, would it imply yfinance users will need to 'pip update yfinance' every day (/few days) in order to have the updated keys?
No. One of the updates in the last couple of weeks introduced a new mechanism, which he was referring to as "backup decrypt". It is basically a text file with keys. That file is already loaded by the regular yfinance code, so it can be modified online without needing to update yfinance.
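The point is that the key list is data, not code. A minimal sketch of consuming such a file, assuming one key per line; I haven't verified the exact layout of yfinance's file, and the "#" comment handling and de-duplication are my own assumptions (parse_key_file is a hypothetical name). In practice the text would first be downloaded from GitHub, e.g. with urllib.request.urlopen, which is left out so the example runs offline.

```python
def parse_key_file(text: str) -> list[str]:
    """Parse a hypothetical one-key-per-line file, preserving order.

    Blank lines and lines starting with '#' are skipped; duplicates are dropped.
    """
    keys: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in keys:
            keys.append(line)
    return keys


sample = """
# keys seen 2023-02-12 (placeholder values)
aaaa1111
bbbb2222

aaaa1111
"""
print(parse_key_file(sample))  # ['aaaa1111', 'bbbb2222']
```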
@domsde Correct. Currently yfinance can ping GitHub for new keys, but uploading new keys is a manual process, which is not good when the key changes daily. It just needs one pip update to change where yfinance pings.
If you're looking for a place to execute potentially unsafe code: GitHub Actions is a nice place for this. You can also store the output of your pipeline directly in GitHub again...
@ChristianKuehnel Thanks for info, I'll try to speak with @ranaroussi.
Seems the backup decrypt is working today. Anyone disagree? Because I'm curious if Yahoo uses different key for different regions.
yes - now it works (backup decrypt). [middle-east (geography)].
Works for me here in Sweden.
Works in Germany.
So Europe good. America? East Asia?
Works in south America.
US looks good at 9:15EST (I had a problem before with "info" field. It's populated now)
@pchedas Please no more confirmations from Europe.
works for me here in India
Doesn't work in Canada (.info(), that is).
Works on US East Coast
WARNING: No decryption keys could be extracted from JS file. Falling back to backup decrypt methods.
then retrieves real-looking data from info
Hopefully Yahoo fired whichever product manager was spending developer time on this instead of actually improving the quality of their product, and then the keys were freed.
@cmjordan42 That'd be unfair. This encryption only affects webpage scraping. Direct GET requests, as used by yf.download and yahooquery, work fine. I can understand why Yahoo wants to stop webpage scraping (expensive) in favour of GETs.
Doesn't work in Japan (.info). I tried versions 0.2.10 and 0.2.11.
Yahoo removed their API and created this problem for themselves, while also fragmenting programmatic users into the various scraping tools that now exist. The best way to avoid people scraping webpages is to provide an API to the underlying data, then they can throttle and control load to their heart's content.
So I think it's pretty fair to say that they are making a mistake in focusing their efforts on repeatedly obfuscating their client-side code in an attempt to mitigate a problem which they themselves created.
I found that version 0.2.11 doesn't work in Jupyter notebooks, but it does work with a warning in a regular .py file (.info(), that is).
@cmjordan42 If the API is gone, then what is yahooquery using? Because that doesn't scrape.
@ValueRaider It does scrape - it's built around Selenium which is a scraper. And it's created and maintained by someone who had to reverse engineer Yahoo's internal APIs, not by Yahoo. When Yahoo does some internal reorganization to release a new version of their webpage and internal APIs to serve that webpage, YahooQuery is liable to break just as YFinance is.
When I say that the right thing to do is for Yahoo to provide an API, I mean an actual API where they publish and maintain an endpoint and/or language-specific libraries. Then consumers could rely on it and their servers would be putting their cycles towards serving up the financial data points as efficiently as possible, not marshalling them into JSONs or applying cryptography for the sole purpose of obfuscating that JSON. No web driver needed.
Just tested this (below). A Canadian stock, queried from Canada, now works on 0.2.11; it wasn't working before the update. `r = yf.Ticker('DFN.TO'); print('r.fastinfo: ', r.fast_info)`
@cmjordan42 "Selenium is only utilized to login to Yahoo, to retrieve data only accessible to premium subscribers." I've looked at how yahooquery works; it just sends GET requests to an internal API. But I accept your broader point about officially supporting an API; maybe that's what these Yahoo changes are working towards under the new owner, Apollo Global.
@giantroadracer Thanks for report. We think key changes daily, and your report suggests Yahoo uses your local date to decide - that key I added worked 13-Feb and you're in 14-Feb. So 'key derivation service' needs to run in Far East or Australia.
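If the key really is chosen by the visitor's local date (a hypothesis from a single report), then at any instant two or even three calendar dates are "today" somewhere, so one daily key cannot serve everyone at once. This sketch uses only fixed UTC offsets and assumes civil time zones span UTC-12 to UTC+14; local_dates_in_use is a hypothetical helper. A key-extraction job would want to see the newest of these dates first, which is why running it early in the world's day (Far East or Australia) helps.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: civil time zones span UTC-12 (last to change date) to UTC+14 (first).
LATEST_ZONE = timezone(timedelta(hours=-12))
EARLIEST_ZONE = timezone(timedelta(hours=14))


def local_dates_in_use(utc_now: datetime) -> set[date]:
    """Every calendar date that is 'today' somewhere at the instant utc_now."""
    oldest = utc_now.astimezone(LATEST_ZONE).date()
    newest = utc_now.astimezone(EARLIEST_ZONE).date()
    return {oldest + timedelta(days=i) for i in range((newest - oldest).days + 1)}


now = datetime(2023, 2, 13, 11, 30, tzinfo=timezone.utc)
print(sorted(local_dates_in_use(now)))  # 12, 13 and 14 February 2023
```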
The same with 0.2.10, it doesn't work with notebooks
It doesn't work anymore here in Sweden and it's still 13-Feb
I looked at the obfuscation; it's done with the popular javascript-obfuscator tool and is easily reversible with some manual effort.
Right now there is not much code in the unobfuscated version - four array keys are basically hardcoded:
var decryptionKey = ["key1", "key2", "key3", "key4"].reduce((a, b) => "" + a + App.main[b], "");
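Given deobfuscated source in which the four key names appear as a literal array feeding .reduce(, as in the line above, pulling them out is a small regex job. This is a sketch under that assumption only: the raw obfuscated file will not look like this, and the pattern and extract_key_names are my own hypothetical helpers, not anything yfinance ships.

```python
import re

# Assumes the deobfuscated form: an array of quoted names followed by ".reduce(".
KEY_ARRAY_RE = re.compile(r"\[([^\[\]]*)\]\s*\.reduce\(")
QUOTED_RE = re.compile(r"""["']([^"']*)["']""")


def extract_key_names(js_source: str) -> list[str]:
    """Return the quoted names in the first '[...].reduce(' array literal."""
    match = KEY_ARRAY_RE.search(js_source)
    if match is None:
        raise ValueError("no '[...].reduce(' array literal found")
    return QUOTED_RE.findall(match.group(1))


line = 'var decryptionKey = ["key1", "key2", "key3", "key4"].reduce((a, b) => "" + a + App.main[b], "");'
print(extract_key_names(line))  # ['key1', 'key2', 'key3', 'key4']
```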
Each main.js version contains a hash in the filename (currently in the format "main.a0b1c2d3.modern.js"), so maybe it makes sense to turn yahoo-keys.txt into a JSON dictionary:
{
"main.a0b1c2d3.modern.js": "6ae2523aeafa283dad7...",
"main.a1b2c3d4.modern.js": "3365117c2a368ffa5df..."
}
And if filename is not found in the dictionary then yfinance can throw an error instructing the user to report a new filename.
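A minimal sketch of that lookup, assuming the page HTML references the script by its "main.<hex hash>.modern.js" filename as in the examples in this thread. key_for_page, the regex and the placeholder dictionary contents are hypothetical; the real page and the proposed file may differ.

```python
import json
import re

# Assumed filename pattern, based on the examples in this thread.
MAIN_JS_RE = re.compile(r"\bmain\.[0-9a-f]+\.modern\.js")


def key_for_page(html: str, keys_by_file: dict[str, str]) -> str:
    """Look up the decryption key for the main.js version a page references."""
    match = MAIN_JS_RE.search(html)
    if match is None:
        raise ValueError("page does not reference a main.<hash>.modern.js file")
    filename = match.group(0)
    try:
        return keys_by_file[filename]
    except KeyError:
        raise LookupError(
            f"no key known for {filename}; please report this filename"
        ) from None


# Hypothetical contents of the proposed JSON file (placeholder value).
keys_by_file = json.loads('{"main.a0b1c2d3.modern.js": "key-for-a0b1c2d3"}')
page = '<script src="https://s.yimg.com/uc/finance/dd-site/js/main.a0b1c2d3.modern.js"></script>'
print(key_for_page(page, keys_by_file))  # key-for-a0b1c2d3
```

Raising LookupError with the filename in the message gives users exactly the string to report.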
BTW, @khalidcruz, what's the full name of the main.js file you are seeing? (You can search for it in the page source, or in devtools: either filter by "main." in the Network tab, or look in the Sources tab under the s.yimg.com/uc/finance/dd-site/js/ folder.)
@Rogach it's main.9c2e056368902a7b446e.modern
@khalidcruz I forgot I also need the contents of App.main (from the page source) to actually extract the key :(
But here's a piece of code that you can run in devtools to extract the key corresponding to your main.js version:
["87b62ee5fe65", "08a3ee23291a", "25d6a4526abc", "e50551b7d7ab"].reduce((a, b) => "" + a + App.main[b], "")
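For anyone doing this outside the browser, the devtools expression above has a direct Python equivalent once App.main has been parsed into a dict. This is a sketch; the key names are the ones from the snippet above and reportedly apply only to main.9c2e056368902a7b446e.modern.js, and password_from_app_main is a hypothetical name.

```python
from functools import reduce

# Key names from the devtools snippet above; reportedly valid only for
# main.9c2e056368902a7b446e.modern.js.
KEY_NAMES = ["87b62ee5fe65", "08a3ee23291a", "25d6a4526abc", "e50551b7d7ab"]


def password_from_app_main(app_main: dict[str, str], names: list[str]) -> str:
    """Python equivalent of names.reduce((a, b) => "" + a + App.main[b], "")."""
    # Unlike JS (which would splice in the string "undefined"), a missing name raises KeyError.
    return reduce(lambda acc, name: acc + app_main[name], names, "")


# Made-up App.main contents using the real key names.
fake_main = {name: f"<{i}>" for i, name in enumerate(KEY_NAMES)}
print(password_from_app_main(fake_main, KEY_NAMES))  # <0><1><2><3>
```

"".join(app_main[n] for n in names) does the same thing more idiomatically; reduce is used here only to mirror the JS one-liner.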
Still getting the issue with yfinance 0.2.11: "No decryption keys could be extracted from JS file. Falling back to backup decrypt method". After all the conversation I'm a bit unclear whether there is a workaround or whether it's still being worked on.
@doobery47 We have a bunch of workarounds, but they are unstable, so the work continues.
Note that this warning message does not mean failure, just that it's falling back to the backup decrypt method which seems to work for a significant number of people right now. Check to see if you do get a value back after receiving that message.
For me, I'm seeing the warning you mentioned followed by:
line 162, in decrypt_cryptojs_aes_stores
    raise Exception("yfinance failed to decrypt Yahoo data response")
Exception: yfinance failed to decrypt Yahoo data response
It worked for me this morning with no code changes. (I also saw the warning this morning, but no errors at that time.) PS. I am not rate limited because I only tested 2 ticker runs (both failed, same msg)
@Rogach I looked at your instructions in the post above. My main.js naming is also: main.9c2e056368902a7b446e.modern (same as @khalidcruz) When using the command in devtools I get: ["87b62ee5fe65", "08a3ee23291a", "25d6a4526abc", "e50551b7d7ab"].reduce((a, b) => "" + a + App.main[b], "")
'3c895fb5ddcc37d20d3073ed74ee3efad59bcb147c8e80fd279f83701b74b092d503dcd399604c6d8be8f3013429d3c2c76ed5b31b80c9df92d5eab6d3339fce'
I added this key to the yfinance yahoo_keys.txt locally, but I'm still seeing the same decrypt error above. I'm not sure where you got the 4 key names in the reduce command from, so I'm not sure I'm using the correct keys.
@Yazzito yfinance doesn't read your local key file; it fetches from GitHub, where I can update it instantly without a pip release. When I designed that, I didn't expect the key to change daily.
Looks like more encryption issues from yahoo.com
Exception: yfinance failed to decrypt Yahoo data response
[ Basically affects everything except price history @ValueRaider ]
Using Python version 3.11.0 yf version 0.2.9
@ValueRaider hijacking top post
[2023-06-23] Update! Latest release fixes financials tables (and removes decryption code).
What is happening? In December 2022 Yahoo began encrypting webpage data, maybe to block scraping. Now Yahoo is regularly changing their encryption key, we think every day (and maybe multiple times a day). Without an automated system to extract the key from their webpage (work in progress), fixing decryption requires a volunteer to manually extract the new key and give it to the developers to upload to yfinance.
~~Help needed~~
~~Need a JavaScript dev to write a script that extracts the AES decryption key from the obfuscated JS that Yahoo uses to en/decrypt. The key is there in plaintext; the extraction just needs automating. The JS changes every day, so there is limited scope to hardcode (use Git branch hotfix/decryption to print today's JS URL). Don't worry about sandboxing etc.; end users won't execute this.~~
~~Script should be separate from the yfinance codebase. I expect your only interaction with yfinance is testing that the extracted key works by putting it in yfinance/data.py.~~
~~Useful comments:~~
Progress updates
2023-06-21
Update your yfinance! Latest release fixes financials tables and removes decryption code.
2023-06-04
Obvious that the decryption won't be fixed. See last message for plan.
2023-03-25
Ticker.info fixed by fetching from API. Financials still broken.
2023-02-17
Yahoo finally started using a new encryption key not in yfinance's backup list of keys, so decryption is failing. Inevitable. Surprised it took 4 days.
2023-02-13
What is the "backup decryption method"? This is simply yfinance fetching decryption keys from this GitHub project website instead of extracting them from Yahoo.com. It was broken in 0.2.9 but fixed in 0.2.10. Today it worked for many, thanks to a key uploaded yesterday. Discussion continues on a decent system for extracting & sharing the decryption key.
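In code terms, the fallback boils down to "try every known key until one works". A minimal sketch with an injected decrypt callable; decrypt_with_fallback and fake_decrypt are hypothetical, and this does not reproduce the signature of yfinance's decrypt_cryptojs_aes_stores(data, keys). It also assumes a wrong key shows up as a ValueError (bad padding, invalid UTF-8 or invalid JSON all surface as ValueError or a subclass in Python).

```python
from typing import Callable, Iterable


def decrypt_with_fallback(
    payload: str,
    candidate_keys: Iterable[str],
    decrypt: Callable[[str, str], str],
) -> str:
    """Return the result of the first key that decrypts `payload` without error."""
    failures = 0
    for key in candidate_keys:
        try:
            return decrypt(payload, key)
        except ValueError:
            # Assumed failure mode for a wrong key (padding, UTF-8 or JSON errors).
            failures += 1
    raise RuntimeError(f"none of the {failures} candidate keys worked")


def fake_decrypt(payload: str, key: str) -> str:
    """Stand-in for real AES decryption: only 'right-key' succeeds."""
    if key != "right-key":
        raise ValueError("wrong key")
    return payload[::-1]


print(decrypt_with_fallback("olleh", ["old-key", "right-key"], fake_decrypt))  # hello
```

Once every key in the list is stale, the loop ends in the RuntimeError branch, which matches what happened on 2023-02-17.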
Workaround: yahooquery
The Python module yahooquery is a functional alternative to yfinance. Instead of scraping webpages, it accesses Yahoo's undocumented API. It is not encrypted and is faster, but lacks earnings_dates. See its GitHub repository and documentation.