Thanks for the post, very useful. One thing I have noticed is that when using a proxy, the JSON is sometimes truncated and does not contain the information part (j['context'] etc.). I guess maybe Yahoo notices robot activity, or maybe it is due to the poor quality of the pool of proxies I used... Best
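One way to catch a truncated response like this before indexing into it is to check for the `context` key right after parsing. Below is a minimal sketch. It assumes the page state is the single-line, semicolon-terminated `root.App.main = {...};` assignment that the standalone code later in this thread relies on, and Yahoo's page layout may have changed since then. `extract_app_state` is a hypothetical helper name.

```python
import json

PARSE_TAG = 'root.App.main = '

def extract_app_state(html):
    """Return the root.App.main JSON as a dict, or None if it is missing or truncated."""
    idx = html.find(PARSE_TAG)
    if idx == -1:
        return None
    # Assumption (from the thread's code): the assignment sits on one line ending in ';'
    line = html[idx + len(PARSE_TAG):].split('\n', 1)[0].rstrip()
    if line.endswith(';'):
        line = line[:-1]
    try:
        state = json.loads(line)
    except json.JSONDecodeError:
        return None
    # Truncated pages were reported to lack the 'context' part
    if not isinstance(state, dict) or 'context' not in state:
        return None
    return state
```

A `None` result signals that the request should be retried, for example with browser-like request headers.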
This is already fully implemented in #104.
Your problem is probably not the proxy: Yahoo is sensitive to the browser headers, since it wants to serve a compatible site. So when using requests.get, try passing headers=my_headers, where my_headers = { 'User-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362' }
For yfinance, we should probably choose the most popular User-Agent header of a very modern browser, e.g. Chrome.
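Below is a minimal sketch of sending browser-like headers using only the standard library. The thread itself uses `requests.get(url, headers=my_headers)`. The Chrome version in the User-Agent is an illustrative assumption, not a value from this thread, so substitute a current browser's string. `BROWSER_HEADERS` and `build_request` are hypothetical names.

```python
import urllib.request

# Illustrative modern-Chrome User-Agent (assumption: update the version as browsers move on)
BROWSER_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0.0.0 Safari/537.36'),
}

def build_request(url, headers=BROWSER_HEADERS):
    """Return an unsent urllib Request that identifies itself as a desktop browser."""
    # Request copies each header via add_header, which stores the name as 'User-agent'
    return urllib.request.Request(url, headers=dict(headers))
```

`urllib.request.urlopen(build_request(url))` then fetches the page. With requests, `requests.get(url, headers=BROWSER_HEADERS)` does the same.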
Great help, thanks; it looks like that solved the problem. Best
(Email reply to Gregory Morse's comment of Tue, 24 Sep 2019, 11:27.)
Hi, I saw your code proposal for this app, but until it is accepted I am not sure how I can use it. So far I use the above code with loops (which doesn't look optimal...). Any chance you could share it here as standalone code? All the Best
I do not see anything sub-optimal about the loops: if you look at the JSON data structures returned, you will see that a lot of enumeration is required to adapt them. Here is the standalone code:
import json
import re

import pandas as pd
import requests

# Browser-like User-Agent: Yahoo may return incomplete pages to unrecognised clients
my_headers = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362'}

# demjson.decode() and jsonnet.evaluate_snippet() can parse raw JavaScript strings

def get_financial_translator(vendor):
    r"""Return (sections, ft) describing how to lay out Yahoo's financial statements.

    ft maps each statement type to a list of [json_key, caption_id, is_date] rows.
    It was scraped once with the code below and hard-coded; rerun it if Yahoo changes the layout:

    #vendor = 'https://s.yimg.com/uc/finance/dd-site/js/vendor.d859a2b02e2b0845735f.min.js'
    req = requests.get(url=vendor)
    r = req.text
    res = re.search(r't\.p\+\"\"\+\((.*?)\[e\]\|\|e\)\+\"\.\"\+(.*?)\[e\]\+\"\.min\.js\";', r)
    ks = json.loads(re.sub(r'([\{\s,])(\w+)(:)', r'\1"\2"\3', res[1]))
    vs = json.loads(re.sub(r'([\{\s,])(\w+)(:)', r'\1"\2"\3', res[2]))
    fncneUrl = vs[list(ks.keys())[list(ks.values()).index('Quote.financials')]]
    url = 'https://s.yimg.com/uc/finance/dd-site/js/Quote.financials.' + fncneUrl + '.min.js'
    req = requests.get(url=url)
    strs = req.text.split('e.exports=')
    objs = [json.loads(re.sub(':!0', ':true', re.sub(r'([\{\s,])(\w+)(:)', r'\1"\2"\3', re.sub(r'}(}\);|,\d+:function\(e,t\){)', '', strs[n])))) for n in range(6, 9)]
    ft = {n['item']: [[x['items'] if 'items' in x else '', x['title'], x['isDate'] if 'isDate' in x else False] for x in n['config']] for n in objs}
    print(ft)
    """
    # valid financial translator as of 2019/9/24
    ft = {'incomeStatement': [['endDate', 'REVENUE', True], ['totalRevenue', 'TOTAL_REVENUE', False], ['costOfRevenue', 'COST_OF_REVENUE', False], ['grossProfit', 'GROSS_PROFIT', False], ['', 'OPERATING_EXPENSES', False], ['researchDevelopment', 'RESEARCH_DEVELOPMENT', False], ['sellingGeneralAdministrative', 'SELLING_GEN_ADMIN', False], ['nonRecurring', 'NON_RECURRING', False], ['otherOperatingExpenses', 'OTHERS', False], ['totalOperatingExpenses', 'TOTAL_OPERATING_EX', False], ['operatingIncome', 'OPERATING_INCOME_LOSS', False], ['', 'INCOME_FROM_CONTINUING_OPS', False], ['totalOtherIncomeExpenseNet', 'TOTAL_OTHER_INCOME_EXPENSES_NET', False], ['ebit', 'EARNINGS_BEFORE_INTEREST_TAX', False], ['interestExpense', 'INTEREST_EXPENSE', False], ['incomeBeforeTax', 'INCOME_BEFORE_TAX', False], ['incomeTaxExpense', 'INCOME_TAX_EXPENSE', False], ['minorityInterest', 'MINORITY_INTEREST', False], ['netIncomeFromContinuingOps', 'NET_INCOME_FROM_CONTINUING_OPS', False], ['', 'NON_RECURRING_EVENTS', False], ['discontinuedOperations', 'DISCONTINUED_OPS', False], ['extraordinaryItems', 'EXTRAORDINARY_ITEMS', False], ['effectOfAccountingCharges', 'EFFECT_OF_ACCOUNTING_CHANGES', False], ['otherItems', 'OTHER_ITEMS', False], ['', 'NET_INCOME_TITLE', False], ['netIncome', 'NET_INCOME', False], ['preferredStock', 'PREFERRED_STOCK_OTHER_ADJ', False], ['netIncomeApplicableToCommonShares', 'NET_INCOME_APPLICABLE_TO_COMMON_SHARES', False]], 'balanceSheet': [['endDate', 'PERIOD_ENDING', True], ['', 'CURRENT_ASSETS', False], ['cash', 'CASH_AND_CASH_EQUIVALENTS', False], ['shortTermInvestments', 'SHORT_TERM_INVESTMENTS', False], ['netReceivables', 'NET_RECEIVABLES', False], ['inventory', 'INVENTORY', False], ['otherCurrentAssets', 'OTHER_CURRENT_ASSETS', False], ['totalCurrentAssets', 'TOTAL_CURRENT_ASSETS', False], ['longTermInvestments', 'LONG_TERM_INVESTMENTS', False], ['propertyPlantEquipment', 'PROPERTY_PLANT_AND_EQUIPMENT', False], ['goodWill', 'GOODWILL', False], ['intangibleAssets',
        'INTANGIBLE_ASSETS', False], ['accumulatedAmortization', 'ACCUMULATED_AMORTIZATION', False], ['otherAssets', 'OTHER_ASSETS', False], ['deferredLongTermAssetCharges', 'DEFERRED_LONG_TERM_ASSET_CHARGES', False], ['totalAssets', 'TOTAL_ASSETS', False], ['', 'CURRENT_LIABILITIES', False], ['accountsPayable', 'ACCOUNTS_PAYABLE', False], ['shortLongTermDebt', 'SHORT_CURRENT_LONG_TERM_DEBT', False], ['otherCurrentLiab', 'OTHER_CURRENT_LIABILITIES', False], ['totalCurrentLiabilities', 'TOTAL_CURRENT_LIABILITIES', False], ['longTermDebt', 'LONG_TERM_DEBT', False], ['otherLiab', 'OTHER_LIABILITIES', False], ['deferredLongTermLiab', 'DEFERRED_LONG_TERM_LIABILITY_CHARGES', False], ['minorityInterest', 'MINORITY_INTEREST', False], ['negativeGoodWill', 'NEGATIVE_GOODWILL', False], ['totalLiab', 'TOTAL_LIABILITIES', False], ['', 'STOCKHOLDERS_EQUITY', False], ['stockOptionWarrants', 'MISC_STOCKS_OPTIONS_WARRANTS', False], ['redeemablePreferredStock', 'REDEEMABLE_PREFERRED_STOCK', False], ['redeemablePreferredStock', 'PREFERRED_STOCK', False], ['commonStock', 'COMMON_STOCK', False], ['retainedEarnings', 'RETAINED_EARNINGS', False], ['treasuryStock', 'TREASURY_STOCK', False], ['capitalSurplus', 'CAPITAL_SURPLUS', False], ['otherStockholderEquity', 'OTHER_STOCKHOLDER_EQUITY', False], ['totalStockholderEquity', 'TOTAL_STOCKHOLDER_EQUITY', False], ['netTangibleAssets', 'NET_TANGIBLE_ASSETS', False]], 'cashflowStatement': [['endDate', 'PERIOD_ENDING', True], ['netIncome', 'NET_INCOME', False], ['', 'OPERATING_ACTIVITIES_CASHFLOWS_PROVIDED', False], ['depreciation', 'DEPRECIATION', False], ['changeToNetincome', 'ADJUSTMENT_TO_NET_INCOME', False], ['changeToAccountReceivables', 'CHANGES_IN_ACCOUNTS_RECEIVABLES', False], ['changeToLiabilities', 'CHANGES_IN_LIABILITIES', False], ['changeToInventory', 'CHANGES_IN_INVENTORIES', False], ['changeToOperatingActivities', 'CHANGES_IN_OTHER_OPERATING_ACT', False], ['totalCashFromOperatingActivities', 'TOTAL_CASH_FLOW_FROM_OP_ACT', False], ['',
        'INVESTING_ACTIVITIES_CASHFLOWS_PROVIDED', False], ['capitalExpenditures', 'CAPITAL_EX', False], ['investments', 'INVESTMENTS', False], ['otherCashflowsFromInvestingActivities', 'OTHER_CASHFLOWS_FROM_INVESTING_ACT', False], ['totalCashflowsFromInvestingActivities', 'TOTAL_CASH_FLOW_FROM_INVEST_ACT', False], ['', 'FINANCING_ACTIVITIES_CASHFLOWS_PROVIDED', False], ['dividendsPaid', 'DIVIDENDS_PAID', False], ['salePurchaseOfStock', 'SALE_PURCHASE_OF_STOCK', False], ['netBorrowings', 'NET_BORROWINGS', False], ['otherCashflowsFromFinancingActivities', 'OTHER_CASHFLOWS_FROM_FINANCING_ACT', False], ['totalCashFromFinancingActivities', 'TOTAL_CASH_FLOW_FROM_FIN_ACT', False], ['effectOfExchangeRate', 'EFFECT_OF_EXCHANGE_RATE_CHANGES', False], ['changeInCash', 'CHANGE_IN_CASH_AND_EQ', False]]}
    # Each tuple: (QuoteSummaryStore module, list key inside it, ft key); annual first, then quarterly
    return ([('incomeStatementHistory', 'incomeStatementHistory', 'incomeStatement'),
             ('cashflowStatementHistory', 'cashflowStatements', 'cashflowStatement'),
             ('balanceSheetHistory', 'balanceSheetStatements', 'balanceSheet'),
             ('incomeStatementHistoryQuarterly', 'incomeStatementHistory', 'incomeStatement'),
             ('cashflowStatementHistoryQuarterly', 'cashflowStatements', 'cashflowStatement'),
             ('balanceSheetHistoryQuarterly', 'balanceSheetStatements', 'balanceSheet')],
            ft)
# Other pages can be parsed the same way, e.g. sustainability (ESG) data:
#url = '%s/%s/%s' % (scrape_url, StockName, 'sustainability')
#q['esgScores'].keys()
# Useful for exploring the JSON structure: https://jsonformatter.org/

def get_quarterly_financials(StockName, tz=None):
    """Return six DataFrames for StockName: annual income statement, cash flow and
    balance sheet, followed by the same three statements quarterly."""
    scrape_url = 'https://finance.yahoo.com/quote'
    url = '%s/%s/%s' % (scrape_url, StockName, 'financials')
    req = requests.get(url=url, headers=my_headers)
    # Assumes the page state sits on one line: root.App.main = {...};
    parseTag = 'root.App.main = '
    r = req.text
    idx = r.find(parseTag)
    j = json.loads(r[idx:].split('\n')[0][len(parseTag):][:-1])  # [:-1] drops the trailing ';'
    #dct = j['context']['dispatcher']['stores']['StreamDataStore']['quoteData'][StockName] #contains Ticker.info but less complete and with a UUID
    #inf = {x:(dct[x]['raw'] if type(dct[x]) is dict else dct[x]) for x in dct}
    q = j['context']['dispatcher']['stores']['QuoteSummaryStore']
    #{x:((q['price'][x]['raw'] if 'raw' in q['price'][x] else '') if type(q['price'][x]) is dict else q['price'][x]) for x in q['price']}
    (fncls, ft) = get_financial_translator(re.search(r'https://s\.yimg\.com/uc/finance/dd-site/js/vendor\..*?\.min\.js', r)[0])
    # Display captions for the caption_ids used in ft
    strings = j['context']['dispatcher']['stores']['LangStore']['baseLangs']['td-app-finance']
    dfs = []
    for (nm, sbnm, knm) in fncls:
        if nm not in q or sbnm not in q[nm]:
            df = pd.DataFrame()
        else:
            periods = q[nm][sbnm]
            # One row per ft entry: caption, then the raw value for each period
            # ('' if the field has no raw value, '-' if the field is absent)
            df = [[strings[x[1]] if not x[2] else ''] +
                  [(p[x[0]]['raw'] if 'raw' in p[x[0]] else '') if x[0] in p else '-' for p in periods]
                  for x in ft[knm]]
            # The first (endDate) row holds the period timestamps and becomes the header
            df = pd.DataFrame(df[1:], columns=df[0])
            df.set_index('', inplace=True)
            df.columns = pd.to_datetime(df.columns, unit='s')
            if tz is not None:
                df.columns = df.columns.tz_localize(tz)
            df = df.where((df != '-') & (df != '')).astype(float)
        dfs.append(df)
    return dfs
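The nested comprehension in `get_quarterly_financials` transposes Yahoo's per-period records into per-line-item rows. Below is a stripped-down, pandas-free sketch of that step. The sample records and caption texts are made up and only mimic the `{'raw': ..., 'fmt': ...}` shape the code above expects. `statement_rows` is a hypothetical name, and unlike the code above it marks missing values as `None` instead of `''`/`'-'`.

```python
def statement_rows(records, layout, captions):
    """Turn per-period records into (caption, [value per period]) rows.

    records:  list of dicts, one per period; each field looks like {'raw': ..., 'fmt': ...}
    layout:   list of [json_key, caption_id, is_date] entries (same shape as ft[...])
    captions: mapping from caption_id to display text (like the LangStore strings)
    Missing fields become None; section headers (json_key == '') are all None.
    """
    rows = []
    for key, caption_id, is_date in layout:
        label = '' if is_date else captions.get(caption_id, caption_id)
        values = []
        for rec in records:
            field = rec.get(key) if key else None
            values.append(field.get('raw') if isinstance(field, dict) else None)
        rows.append((label, values))
    return rows

# Hypothetical sample data, shaped like QuoteSummaryStore statement entries
records = [
    {'endDate': {'raw': 1561852800, 'fmt': '2019-06-30'}, 'totalRevenue': {'raw': 100.0, 'fmt': '100'}},
    {'endDate': {'raw': 1553990400, 'fmt': '2019-03-31'}},
]
layout = [['endDate', 'REVENUE', True],
          ['totalRevenue', 'TOTAL_REVENUE', False],
          ['', 'OPERATING_EXPENSES', False]]
captions = {'TOTAL_REVENUE': 'Total Revenue', 'OPERATING_EXPENSES': 'Operating Expenses'}

for label, values in statement_rows(records, layout, captions):
    print(repr(label), values)
```

The first row carries the period timestamps, which is why the code above uses it as the column header and converts it with `pd.to_datetime(..., unit='s')`.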
The app may want to stop relying on Pandas tables for the financials and info, since all of this can be fetched with a single request, along with a great deal of additional information.
I provide the very straightforward parsing code to achieve this:
The rest of the puzzle, parsing it into table data with the correct row ordering and caption strings, can be found in this JavaScript file: https://s.yimg.com/uc/finance/dd-site/js/Quote.financials.938c6b86ad7b69ba7927.min.js
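The commented-out scraper in `get_financial_translator` turns the minified JavaScript object literals from that file into JSON using regex substitutions: it quotes bare keys and maps `!0` to `true`. Below is a small sketch of that conversion. Mapping `!1` to `false` is an added assumption, since the thread's code only handles `!0`. The approach only works for simple literals whose string values never contain a `{`, `,` or whitespace character followed by `word:`. For anything more complex, a real parser such as demjson or jsonnet (mentioned in the code comments) is safer. The sample input is hypothetical and reuses the field names (`item`, `config`, `items`, `title`, `isDate`) that the scraper reads.

```python
import json
import re

# Quote bare identifier keys: {title:"X",isDate:!0} -> {"title":"X","isDate":!0}
_BARE_KEY = re.compile(r'([{\s,])(\w+)(:)')

def js_literal_to_json(js):
    """Convert a simple minified JS object literal into a Python object (sketch only)."""
    text = _BARE_KEY.sub(r'\1"\2"\3', js)
    # Minifiers commonly write true/false as !0/!1
    text = re.sub(r':!0\b', ':true', text)
    text = re.sub(r':!1\b', ':false', text)
    return json.loads(text)

# Hypothetical input in the shape the scraper expects
cfg = js_literal_to_json('{item:"incomeStatement",config:[{items:"endDate",title:"REVENUE",isDate:!0},{title:"OPERATING_EXPENSES"}]}')
print(cfg['config'][0])
```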