Closed mrhappyasthma closed 3 years ago
We can use an XPath query to scrape.
//table[last()]//tr[last()-1]//td[2]
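To see what this query selects, here is a minimal sketch against an invented two-table snippet (the markup and values are made up and are not Yahoo's real page). It uses the standard library's ElementTree, which supports this subset of XPath when the path is written relative to the root (`.//`); with lxml the absolute form above should work through `tree.xpath(...)`.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup; the real page layout may differ.
SAMPLE_PAGE = """<html><body>
<table><tr><td>Unrelated</td><td>1.00%</td></tr></table>
<table>
  <tr><td>Next Year</td><td>10.00%</td></tr>
  <tr><td>Next 5 Years (per annum)</td><td>12.50%</td></tr>
  <tr><td>Past 5 Years (per annum)</td><td>8.00%</td></tr>
</table>
</body></html>"""

root = ET.fromstring(SAMPLE_PAGE)

# Last table under its parent, second-to-last row of it, second cell of that row.
cells = root.findall(".//table[last()]//tr[last()-1]//td[2]")
growth_text = cells[0].text
print(growth_text)
```

Note that the query depends entirely on row order, so it would silently pick a different value if the table gained or lost a row.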
Some info on the yahoo finance API: https://observablehq.com/@stroked/yahoofinance
regularMarketPrice and marketCap come from query1.finance.yahoo.com/v7/finance/quote?fields=regularMarketPrice,marketCap&symbols= (followed by the ticker).
Also trailingAnnualDividendRate and dividendDate.
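A small sketch of building that quote URL, in case it helps. The helper name `build_quote_url` is hypothetical, the `https://` scheme is assumed, and joining several symbols with commas is an assumption rather than something confirmed in this thread.

```python
from urllib.parse import urlencode

QUOTE_ENDPOINT = "https://query1.finance.yahoo.com/v7/finance/quote"


def build_quote_url(symbols, fields):
    """Return the quote URL for the given ticker symbols and field names."""
    query = {"fields": ",".join(fields), "symbols": ",".join(symbols)}
    # Keep commas literal so the URL matches the form quoted in this thread.
    return QUOTE_ENDPOINT + "?" + urlencode(query, safe=",")


url = build_quote_url(["MSFT"], ["regularMarketPrice", "marketCap"])
print(url)
```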
For company info and SEC filings (sector, website, industry, longBusinessSummary, companyOfficers):
https://query1.finance.yahoo.com/v10/finance/quoteSummary/MSFT?modules=assetProfile,secFilings
Other modules are:
modules = Array(26) [
0: "assetProfile"
1: "incomeStatementHistory"
2: "incomeStatementHistoryQuarterly"
3: "balanceSheetHistory"
4: "balanceSheetHistoryQuarterly"
5: "cashFlowStatementHistory"
6: "cashFlowStatementHistoryQuarterly"
7: "defaultKeyStatistics"
8: "financialData"
9: "calendarEvents"
10: "secFilings"
11: "recommendationTrend"
12: "upgradeDowngradeHistory"
13: "institutionOwnership"
14: "fundOwnership"
15: "majorDirectHolders"
16: "majorHoldersBreakdown"
17: "insiderTransactions"
18: "insiderHolders"
19: "netSharePurchaseActivity"
20: "earnings"
21: "earningsHistory"
22: "earningsTrend"
23: "industryTrend"
24: "indexTrend"
25: "sectorTrend"
]
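As a rough sketch, the module names listed above can be used to validate a request before building the quoteSummary URL. The helper `build_quote_summary_url` is a hypothetical name, and rejecting unknown names is a design choice of this sketch, not something the API is known to require.

```python
# Module names copied from the list above.
KNOWN_MODULES = (
    "assetProfile", "incomeStatementHistory", "incomeStatementHistoryQuarterly",
    "balanceSheetHistory", "balanceSheetHistoryQuarterly",
    "cashFlowStatementHistory", "cashFlowStatementHistoryQuarterly",
    "defaultKeyStatistics", "financialData", "calendarEvents", "secFilings",
    "recommendationTrend", "upgradeDowngradeHistory", "institutionOwnership",
    "fundOwnership", "majorDirectHolders", "majorHoldersBreakdown",
    "insiderTransactions", "insiderHolders", "netSharePurchaseActivity",
    "earnings", "earningsHistory", "earningsTrend", "industryTrend",
    "indexTrend", "sectorTrend",
)

QUOTE_SUMMARY_ENDPOINT = "https://query1.finance.yahoo.com/v10/finance/quoteSummary"


def build_quote_summary_url(symbol, modules):
    """Return the quoteSummary URL, refusing module names not in the list."""
    unknown = [m for m in modules if m not in KNOWN_MODULES]
    if unknown:
        raise ValueError("unknown modules: " + ", ".join(unknown))
    return f"{QUOTE_SUMMARY_ENDPOINT}/{symbol}?modules={','.join(modules)}"


summary_url = build_quote_summary_url("MSFT", ["assetProfile", "secFilings"])
print(summary_url)
```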
Ex-dividend date comes from calendarEvents.
Cash on hand comes from balanceSheetHistory. - https://github.com/mrhappyasthma/IsThisStockGood/issues/22
The only thing I can't figure out how to get yet (which I need) is Next 5 Years (per annum). This is used as part of the calculations to determine pricing.
This comes down in the main response, so we can just fetch the analysis page by URL.
The JSON is populated in the ReactJS root.App.main= assignment.
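One possible way to pull that JSON out of the page source, sketched with the standard library only. The sample page and its keys (`context`, `exampleKey`) are invented for illustration, `extract_app_main` is a hypothetical helper, and the real page may embed the assignment differently.

```python
import json

MARKER = "root.App.main"

# Invented stand-in for the page source; the real structure is not shown in this thread.
SAMPLE_PAGE = """<html><body><script>
(function (root) {
root.App.main = {"context": {"exampleKey": "12.50%"}};
}(this));
</script></body></html>"""


def extract_app_main(page_source):
    """Return the object assigned to root.App.main, or None if it is absent."""
    marker_pos = page_source.find(MARKER)
    if marker_pos == -1:
        return None
    start = page_source.find("{", marker_pos)
    if start == -1:
        return None
    # raw_decode parses one JSON value starting at `start` and ignores what follows it.
    obj, _end = json.JSONDecoder().raw_decode(page_source, start)
    return obj


app_main = extract_app_main(SAMPLE_PAGE)
print(app_main["context"]["exampleKey"])
```

Using `raw_decode` avoids having to write a regex that finds the end of the JSON object.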
I'm not entirely sure why, but a local test works fine, while the same code ported to run on the server does not find the string in the output.
import re

import lxml.html as html
import requests


def isPercentage(text):
    # True when the text starts with a percentage such as '12.5%'.
    # Elements without text have text == None, so guard against that.
    return text is not None and re.match(r'\d+(\.\d+)?%', text) is not None


def parseNextPercentage(iterator):
    # Advance the shared iterator to the next element whose text is a percentage.
    node = next(iterator)
    while not isPercentage(node.text):
        node = next(iterator)
    return node.text


r = requests.get('https://finance.yahoo.com/quote/FB/analysis?p=FB')
tree = html.fromstring(bytes(r.text, encoding='utf8'))
tree_iterator = tree.iter()
for element in tree_iterator:
    if element.text == 'Next 5 Years (per annum)':
        print(parseNextPercentage(tree_iterator))
Oh, it was a copy paste error. Of course :P
Work mostly complete in https://github.com/mrhappyasthma/IsThisStockGood/commit/250485ac7923fb25a8614ea67ca77c48402f50d5.
As of https://github.com/mrhappyasthma/IsThisStockGood/commit/afa18449c1cf2e54e6b83a12b9ff778e17ed656d, the code is being used to calculate margin of safety.
I still need to parse the current price from the quote and display that.
Plenty of good data here too: https://stackoverflow.com/questions/44030983/yahoo-finance-url-not-working
Quote scraping added in https://github.com/mrhappyasthma/IsThisStockGood/commit/155a7660976295ee3159d2d4612d7fca7b9a4edd.
The only thing still needed (although I don't have an immediate use for it) is fetching quoteSummary modules: https://github.com/mrhappyasthma/IsThisStockGood/issues/3#issuecomment-762595891
Started the implementation here: https://github.com/mrhappyasthma/IsThisStockGood/commit/5bf4a85fbc048346464b202dd1362f1e2a0fc406
It seems like the parsing can be done along the lines of this:
results = data['quoteSummary']['result']
moduleData = {}
for module in self.modules:
    for result in results:
        if module in result:
            moduleData[module] = result[module]
            break
This should produce a dictionary keyed by module name, with each value being that module's data from the result.
Reading data from a file, I confirmed this approach works:
import json

with open("temp.txt", "r") as f:
    data = json.load(f)

results = data['quoteSummary']['result']
modules = ['assetProfile', 'secFilings', 'financialData']
moduleData = {}
for module in modules:
    # Take the first result entry that contains this module.
    for result in results:
        if module in result:
            moduleData[module] = result[module]
            break

for key, value in moduleData.items():
    print(key)
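For trying the same approach without a saved temp.txt, here is a sketch that wraps the loop in a function and runs it on an inline payload. The payload values are invented, `collect_modules` is a hypothetical name, and only the `quoteSummary` / `result` nesting and the `sector` field are taken from this thread.

```python
# Invented sample shaped like the parsed response used above.
SAMPLE_RESPONSE = {
    "quoteSummary": {
        "result": [
            {
                "assetProfile": {"sector": "Technology"},
                "financialData": {},
            }
        ],
        "error": None,
    }
}


def collect_modules(data, modules):
    """Map each requested module name to its data, skipping absent modules."""
    results = data["quoteSummary"]["result"]
    module_data = {}
    for module in modules:
        for result in results:
            if module in result:
                module_data[module] = result[module]
                break
    return module_data


requested = ["assetProfile", "secFilings", "financialData"]
module_data = collect_modules(SAMPLE_RESPONSE, requested)
sector = module_data["assetProfile"]["sector"]
missing = [m for m in requested if m not in module_data]
print(sector, missing)
```

Modules that the response does not contain are simply left out of the dictionary, so callers should check for a key before reading it.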
Particularly useful for the analysis. URL = https://finance.yahoo.com/quote/<symbol>/analysis, looking at the Next 5 Years (per annum) value.