Add a scraper for Yahoo Finance

mrhappyasthma commented 5 years ago

The analysis page is particularly useful: https://finance.yahoo.com/quote/<symbol>/analysis

Specifically, the "Next 5 Years (per annum)" row.

mrhappyasthma commented 4 years ago

We can use an XPath query to scrape it:

//table[last()]//tr[last()-1]//td[2]

https://finance.yahoo.com/quote/AMZN/analysis?p=AMZN
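
To sanity-check the query offline, here is a minimal sketch that runs it against a made-up, well-formed table using only the standard library. The sample markup and the function name `next_five_years_growth` are assumptions for illustration. `xml.etree.ElementTree` needs well-formed input and a relative path (hence the leading `.`), so against the real page the same query would more likely go through `lxml.html` and its `xpath()` method.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the analysis page; the real markup may differ.
SAMPLE = """<html><body>
<table><tr><td>Other table</td><td>0%</td></tr></table>
<table>
  <tr><td>Current Year</td><td>10.00%</td></tr>
  <tr><td>Next 5 Years (per annum)</td><td>25.50%</td></tr>
  <tr><td>Past 5 Years (per annum)</td><td>30.00%</td></tr>
</table>
</body></html>"""

# Same query as above, made relative for ElementTree.
QUERY = ".//table[last()]//tr[last()-1]//td[2]"


def next_five_years_growth(markup):
    """Return the text of the first matched cell, or None if nothing matches."""
    root = ET.fromstring(markup)
    cell = root.find(QUERY)
    return cell.text if cell is not None else None


print(next_five_years_growth(SAMPLE))
```

The query depends on the row being second to last in the last table, so it would silently pick a different cell if the page layout changes.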

mrhappyasthma commented 3 years ago

Some info on the yahoo finance API: https://observablehq.com/@stroked/yahoofinance

mrhappyasthma commented 3 years ago

regularMarketPrice and marketCap are available from query1.finance.yahoo.com/v7/finance/quote?fields=regularMarketPrice,marketCap&symbols= (append the ticker symbol).

The same endpoint also provides trailingAnnualDividendRate and dividendDate.
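
A small sketch of assembling that quote URL with the standard library. The helper name `build_quote_url` is hypothetical, and passing several comma-separated symbols in one request is an assumption, not something confirmed in this thread.

```python
from urllib.parse import urlencode

QUOTE_ENDPOINT = "https://query1.finance.yahoo.com/v7/finance/quote"


def build_quote_url(symbols, fields):
    """Build a quote URL for the given ticker symbols and field names."""
    query = urlencode(
        {"fields": ",".join(fields), "symbols": ",".join(symbols)},
        safe=",",  # keep commas literal, matching the URL in the comment above
    )
    return f"{QUOTE_ENDPOINT}?{query}"


url = build_quote_url(
    ["MSFT"],
    ["regularMarketPrice", "marketCap", "trailingAnnualDividendRate", "dividendDate"],
)
print(url)
```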

For company info and SEC filings (sector, website, industry, longBusinessSummary, companyOfficers):

https://query1.finance.yahoo.com/v10/finance/quoteSummary/MSFT?modules=assetProfile,secFilings

Other modules are:

modules = Array(26) [
  0: "assetProfile"
  1: "incomeStatementHistory"
  2: "incomeStatementHistoryQuarterly"
  3: "balanceSheetHistory"
  4: "balanceSheetHistoryQuarterly"
  5: "cashFlowStatementHistory"
  6: "cashFlowStatementHistoryQuarterly"
  7: "defaultKeyStatistics"
  8: "financialData"
  9: "calendarEvents"
  10: "secFilings"
  11: "recommendationTrend"
  12: "upgradeDowngradeHistory"
  13: "institutionOwnership"
  14: "fundOwnership"
  15: "majorDirectHolders"
  16: "majorHoldersBreakdown"
  17: "insiderTransactions"
  18: "insiderHolders"
  19: "netSharePurchaseActivity"
  20: "earnings"
  21: "earningsHistory"
  22: "earningsTrend"
  23: "industryTrend"
  24: "indexTrend"
  25: "sectorTrend"
]

Ex-dividend date comes from calendarEvents.

Cash on hand comes from balanceSheetHistory (see https://github.com/mrhappyasthma/IsThisStockGood/issues/22).
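
One possible way to request only the modules a given data point needs, sketched under assumptions: the keys of `MODULE_FOR` and the helper `build_quote_summary_url` are hypothetical names, and the mapping contains only the pairs mentioned in this thread.

```python
from urllib.parse import quote

QUOTE_SUMMARY_ENDPOINT = "https://query1.finance.yahoo.com/v10/finance/quoteSummary"

# Data point -> module, limited to what this thread states.
# The keys are hypothetical labels chosen for this sketch.
MODULE_FOR = {
    "company_info": "assetProfile",
    "sec_filings": "secFilings",
    "ex_dividend_date": "calendarEvents",
    "cash_on_hand": "balanceSheetHistory",
}


def build_quote_summary_url(symbol, wanted):
    """Build a quoteSummary URL requesting only the modules behind `wanted`."""
    modules = []
    for name in wanted:
        module = MODULE_FOR[name]  # raises KeyError for an unmapped data point
        if module not in modules:
            modules.append(module)
    return f"{QUOTE_SUMMARY_ENDPOINT}/{quote(symbol)}?modules={','.join(modules)}"


print(build_quote_summary_url("MSFT", ["ex_dividend_date", "cash_on_hand"]))
```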

mrhappyasthma commented 3 years ago

The one thing I need but can't yet figure out how to get is Next 5 Years (per annum), which feeds into the calculations that determine pricing.

mrhappyasthma commented 3 years ago

This value arrives in the main page response, so we can simply fetch the analysis page by URL.

The JSON is embedded in the page source, assigned to root.App.main in a script.

https://stackoverflow.com/a/39635322/1366973
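
A minimal sketch of pulling that JSON out of the page source, assuming the assignment looks like `root.App.main = {...};`. The sample page and the keys inside it are made up, and `extract_app_main` is a hypothetical helper name. It uses `json.JSONDecoder.raw_decode` so that the trailing `;` and the rest of the script do not need to be trimmed by hand.

```python
import json

MARKER = "root.App.main"


def extract_app_main(page_source):
    """Return the object assigned to root.App.main, or None if it is absent."""
    marker_at = page_source.find(MARKER)
    if marker_at == -1:
        return None
    start = page_source.find("{", marker_at)
    if start == -1:
        return None
    # raw_decode parses one JSON value starting at `start` and ignores the rest.
    data, _end = json.JSONDecoder().raw_decode(page_source, start)
    return data


# Hypothetical page fragment; the real payload is far larger.
SAMPLE_PAGE = (
    "<html><script>(function (root) {\n"
    'root.App.main = {"context": {"symbol": "AMZN"}};\n'
    "}(this));</script></html>"
)
print(extract_app_main(SAMPLE_PAGE))
```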

mrhappyasthma commented 3 years ago

I'm not entirely sure why, but a local test works fine, while the same code ported to the server does not find the string in the output.

import re

import lxml.html as html
import requests

def isPercentage(text):
  # Elements without text have text=None, which re.match cannot handle.
  if text is None:
    return False
  return re.match(r'\d+(\.\d+)?%', text) is not None

def parseNextPercentage(iterator):
  # Advance the shared iterator until an element holds a percentage.
  for node in iterator:
    if isPercentage(node.text):
      return node.text
  return None

r = requests.get('https://finance.yahoo.com/quote/FB/analysis?p=FB')
tree = html.fromstring(bytes(r.text, encoding='utf8'))
tree_iterator = tree.iter()
for element in tree_iterator:
  # The percentage follows the label element in document order.
  if element.text == 'Next 5 Years (per annum)':
    print(parseNextPercentage(tree_iterator))
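
When the same code behaves differently locally and on the server, it can help to run the scanning logic against a fixed in-memory sample, so that fetch problems and parsing problems can be told apart. The following is only a sketch: the sample markup is made up, `find_growth_estimate` is a hypothetical name, and the standard library's `ElementTree` stands in for lxml here because both expose `iter()` and `.text`.

```python
import re
import xml.etree.ElementTree as ET

PERCENTAGE = re.compile(r"\d+(\.\d+)?%")


def is_percentage(text):
    # Elements without text have text=None, so guard before matching.
    return text is not None and PERCENTAGE.match(text.strip()) is not None


def find_growth_estimate(root, label="Next 5 Years (per annum)"):
    """Return the first percentage that follows `label` in document order."""
    elements = root.iter()
    for element in elements:
        if element.text == label:
            # Keep consuming the same iterator from just past the label.
            for node in elements:
                if is_percentage(node.text):
                    return node.text
    return None


# Hypothetical row; the real table has more columns and rows.
SAMPLE = (
    "<table><tr><td>Next 5 Years (per annum)</td>"
    "<td>N/A</td><td>18.3%</td></tr></table>"
)
print(find_growth_estimate(ET.fromstring(SAMPLE)))
```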

mrhappyasthma commented 3 years ago

Oh, it was a copy-paste error. Of course :P

mrhappyasthma commented 3 years ago

Work mostly complete in https://github.com/mrhappyasthma/IsThisStockGood/commit/250485ac7923fb25a8614ea67ca77c48402f50d5.

mrhappyasthma commented 3 years ago

As of https://github.com/mrhappyasthma/IsThisStockGood/commit/afa18449c1cf2e54e6b83a12b9ff778e17ed656d, the code is being used to calculate margin of safety.

I still need to parse the current price from the quote and display that.
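
A rough sketch of that parsing step. The response layout (`quoteResponse` containing a `result` list with one entry per symbol) is an assumption about the quote endpoint, not something shown in this thread, and the sample values are invented; `parse_current_price` is a hypothetical helper name.

```python
import json


def parse_current_price(response_text, symbol):
    """Return regularMarketPrice for `symbol`, or None if it is missing."""
    data = json.loads(response_text)
    # Assumed layout: {"quoteResponse": {"result": [{...}, ...]}}
    results = data.get("quoteResponse", {}).get("result") or []
    for quote in results:
        if quote.get("symbol") == symbol:
            return quote.get("regularMarketPrice")
    return None


# Invented sample response for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "quoteResponse": {
        "result": [
            {"symbol": "MSFT", "regularMarketPrice": 123.45, "marketCap": 1000000}
        ],
        "error": None,
    }
})
print(parse_current_price(SAMPLE_RESPONSE, "MSFT"))
```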

mrhappyasthma commented 3 years ago

Plenty of good data here too: https://stackoverflow.com/questions/44030983/yahoo-finance-url-not-working

mrhappyasthma commented 3 years ago

Quote scraping added in https://github.com/mrhappyasthma/IsThisStockGood/commit/155a7660976295ee3159d2d4612d7fca7b9a4edd.

The only remaining piece (although I don't have an immediate use for it) is fetching quoteSummary modules: https://github.com/mrhappyasthma/IsThisStockGood/issues/3#issuecomment-762595891

mrhappyasthma commented 3 years ago

Started the implementation here: https://github.com/mrhappyasthma/IsThisStockGood/commit/5bf4a85fbc048346464b202dd1362f1e2a0fc406

mrhappyasthma commented 3 years ago

It seems like the parsing can be done along the lines of this:

results = data['quoteSummary']['result']
moduleData = {}
for module in self.modules:
  for result in results:
    if module in result:
      moduleData[module] = result[module]
      break

This should produce a dictionary keyed by module name, where each value is that module's data from the response.

mrhappyasthma commented 3 years ago

Reading data from a file, I confirmed this approach works:

import json

# temp.txt: a saved quoteSummary response (JSON).
with open("temp.txt", "r") as f:
  data = json.load(f)

# Each entry in 'result' is a dict keyed by module name.
results = data['quoteSummary']['result']
modules = ['assetProfile', 'secFilings', 'financialData']
moduleData = {}
for module in modules:
  for result in results:
    if module in result:
      moduleData[module] = result[module]
      break

# Print the names of the modules that were found.
for key in moduleData:
  print(key)
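
The same approach can be wrapped in a function and exercised without a file. This is a sketch with an invented sample response; `collect_modules` is a hypothetical name, and treating a null `result` as empty is an added assumption.

```python
import json


def collect_modules(data, modules):
    """Map each requested module name to its payload from a quoteSummary response."""
    results = data["quoteSummary"]["result"] or []
    module_data = {}
    for module in modules:
        for result in results:
            if module in result:
                module_data[module] = result[module]
                break
    return module_data


# Invented sample; real module payloads contain many more fields.
SAMPLE = json.loads("""
{"quoteSummary": {"result": [
  {"assetProfile": {"sector": "Technology"},
   "financialData": {"placeholder": 1}}
], "error": null}}
""")
found = collect_modules(SAMPLE, ["assetProfile", "secFilings", "financialData"])
print(sorted(found))
```

Modules missing from the response (secFilings in this sample) are simply left out of the dictionary, so callers should check for a key before using it.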
