Problems Parsing Titles

I'm seeing extraction errors on certain websites, even though the pages have titles.

  File "/usr/local/lib/python2.7/site-packages/ContentAnalysis-0.1.1-py2.7.egg/ContentAnalysis/document.py", line 53, in parse
    ginfo = g.extract(url=self.link)
  File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 56, in extract
    return self.crawl(cc)
  File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 66, in crawl
    article = crawler.crawl(crawl_candiate)
  File "/usr/local/lib/python2.7/site-packages/goose/crawler.py", line 154, in crawl
    self.article.title = self.title_extractor.extract()
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 99, in extract
    return self.get_title()
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 78, in get_title
    return self.clean_title(title)
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 56, in clean_title
    if title_words[0] in TITLE_SPLITTERS:
IndexError: list index out of range

You can replicate this by running goose extract on a site like http://daydreamingfoodie.com/
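
The failing line indexes title_words[0], so the IndexError would be raised whenever the word list is empty, for example if the title is empty or only whitespace by the time it reaches clean_title. I have not confirmed that this is what happens on the site above. As a minimal sketch of a possible guard: the helper name clean_title_words and the splitter values below are assumptions for illustration, not the actual code in goose/extractors/title.py.

```python
# Assumed splitter set for illustration; the real TITLE_SPLITTERS in
# goose/extractors/title.py may contain different values.
TITLE_SPLITTERS = ["|", "-", ":"]


def clean_title_words(title):
    """Hypothetical helper: drop a leading/trailing splitter token
    without indexing into an empty list."""
    title_words = title.split()

    # Guard: an empty or whitespace-only title produces no words,
    # so title_words[0] would raise IndexError.
    if not title_words:
        return ""

    # Drop a leading splitter token.
    if title_words[0] in TITLE_SPLITTERS:
        title_words.pop(0)

    # Re-check for emptiness before looking at the last word, since the
    # title may have consisted of a single splitter.
    if title_words and title_words[-1] in TITLE_SPLITTERS:
        title_words.pop(-1)

    return " ".join(title_words).strip()
```

With a guard like this, an empty title would come back as an empty string instead of aborting the whole extraction.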

grangier / python-goose

Problems Parsing Titles #262