mrhappyasthma / IsThisStockGood

A tool for evaluating companies using the Rule #1 investing principles.
http://www.isthisstockgood.com

Add scraping for MSN Money #2

Closed: mrhappyasthma closed this issue 5 years ago

mrhappyasthma commented 5 years ago

This one is trickier. Haven't found a good way to format the URL. It seems to be:

https://www.msn.com/en-us/money/stockdetails/analysis/fi-126.1...

We'd also need to scrape the price ratios. Ideally this would be a CSV download of some sort, but I can't seem to find one.

mrhappyasthma commented 5 years ago

The exchange abbreviations seem to map as follows:

- NYSE -> NYS
- NASDAQ -> NAS
- AMEX -> ASE
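A minimal sketch of how that mapping could be encoded, assuming only the three exchanges observed above. The names `MSN_EXCHANGE_CODES` and `msn_exchange_code` are hypothetical and not part of the project.

```python
# Exchange abbreviations observed in MSN Money URLs (from this thread only).
MSN_EXCHANGE_CODES = {
  'NYSE': 'NYS',
  'NASDAQ': 'NAS',
  'AMEX': 'ASE',
}


def msn_exchange_code(exchange):
  """Return MSN's abbreviation for `exchange`, e.g. 'NASDAQ' -> 'NAS'.

  Input is case-insensitive. Raises KeyError for exchanges that are not
  in the observed mapping.
  """
  return MSN_EXCHANGE_CODES[exchange.strip().upper()]
```

Raising `KeyError` for unknown exchanges is deliberate here: silently guessing a code would produce a URL that quietly points at the wrong page.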

mrhappyasthma commented 5 years ago

Actually, an even better URL format seems to be:

https://www.msn.com/en-us/money/stockdetails/analysis?symbol=<symbol>
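A small sketch of building that URL for an arbitrary ticker, assuming the `?symbol=<symbol>` format above is all that is needed. `analysis_url` is a hypothetical helper name; `urllib.parse.urlencode` handles percent-escaping so symbols with unusual characters still produce a valid query string.

```python
from urllib.parse import urlencode

# Base URL as observed in this thread.
ANALYSIS_BASE_URL = 'https://www.msn.com/en-us/money/stockdetails/analysis'


def analysis_url(symbol):
  """Build the MSN Money analysis-page URL for a ticker symbol.

  urlencode escapes characters that are unsafe in a query string,
  e.g. '/' becomes '%2F' and a space becomes '+'.
  """
  return ANALYSIS_BASE_URL + '?' + urlencode({'symbol': symbol})
```

For example, `analysis_url('goog')` yields `https://www.msn.com/en-us/money/stockdetails/analysis?symbol=goog`.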

mrhappyasthma commented 5 years ago

Parsing the HTML with lxml seems to work well.

Example code:

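```python
from urllib.request import urlopen

from lxml import html

url = 'https://www.msn.com/en-us/money/stockdetails/analysis?symbol=goog'

# Download the page and parse it into an lxml element tree.
with urlopen(url) as response:
  tree = html.fromstring(response.read())


def isfloat(value):
  """Return True if `value` is a string that parses as a float."""
  if value is None:
    return False
  try:
    float(value)
    return True
  except ValueError:
    return False


def nextFloatFromIterator(iterator):
  """Advance `iterator` to the next element whose text is a number.

  Returns that element's text, or None if the document ends first
  (instead of letting StopIteration escape).
  """
  for node in iterator:
    if isfloat(node.text):
      return node.text.strip()
  return None


# Walk every element in document order. Each value appears after its label,
# so once a label is found we keep consuming the same iterator until the
# next numeric text node.
tree_iterator = tree.iter()
for element in tree_iterator:
  text = (element.text or '').strip()
  if text == 'P/E Ratio 5-Year High':
    print('5 year high:')
    print(nextFloatFromIterator(tree_iterator))
  elif text == 'P/E Ratio 5-Year Low':
    print('5 year low:')
    print(nextFloatFromIterator(tree_iterator))
```
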
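The same "find the label, then take the next number" idea can be tried offline. This sketch swaps in the standard library's `html.parser.HTMLParser` instead of lxml so it runs without extra dependencies; `LabelValueParser` and `extract_ratios` are hypothetical names, and the sample markup and numbers in the usage note are invented, not real MSN output.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
  """Records the first numeric text that follows each wanted label.

  Text is scanned in document order. When a label is seen it becomes
  'pending'; the next text that parses as a float is stored as its value.
  A new label replaces a pending one that never got a value.
  """

  def __init__(self, labels):
    super().__init__()
    self.labels = set(labels)
    self.pending = None  # label still waiting for its value
    self.values = {}

  def handle_data(self, data):
    text = data.strip()
    if not text:
      return
    if text in self.labels:
      self.pending = text
      return
    if self.pending is not None:
      try:
        self.values[self.pending] = float(text)
        self.pending = None
      except ValueError:
        pass  # not a number (e.g. '-'), keep looking


def extract_ratios(page, labels):
  """Return {label: float} for every label whose value was found."""
  parser = LabelValueParser(labels)
  parser.feed(page)
  parser.close()
  return parser.values
```

For example, feeding it the made-up snippet `<tr><td>P/E Ratio 5-Year High</td><td>38.2</td></tr>` with the label `'P/E Ratio 5-Year High'` returns `{'P/E Ratio 5-Year High': 38.2}`. Returning a dict (rather than printing) makes it easy to tell which labels were missing from the page.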
mrhappyasthma commented 5 years ago

Done as of 84cd63734b9a98162a332790fd09a9e97e96271a