haystack / feedme-rss

Automatically exported from code.google.com/p/feedme-rss

Handle unclean HTML more elegantly #96

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Recommendation currently fails if we can't parse a post's HTML.  Instead, we
should fall back to using just the post and feed titles for recommendation
and ignore the post contents.

Fix this stack trace:

 File "/var/virtualhost/sites/feedme/prod/server/../server/feedme/recommend.py", line 18, in recommend
   return HttpResponse(get_recommendation_json(request), \

 File "/var/virtualhost/sites/feedme/prod/server/../server/feedme/recommend.py", line 58, in get_recommendation_json
   recommendations, sorted_friends = n_best_friends(post, sharer)

 File "/var/virtualhost/sites/feedme/prod/server/../server/feedme/recommend.py", line 199, in n_best_friends
   freq_dist_counts = post.tokenize()

 File "/var/virtualhost/sites/feedme/prod/server/../server/feedme/models.py", line 80, in tokenize
   nltk.clean_html(self.contents) + \

 File "/usr/lib/python2.5/site-packages/nltk/util.py", line 302, in clean_html
   cleaner.feed(html)

 File "/usr/lib/python2.5/HTMLParser.py", line 108, in feed
   self.goahead(0)

 File "/usr/lib/python2.5/HTMLParser.py", line 148, in goahead
   k = self.parse_starttag(i)

 File "/usr/lib/python2.5/HTMLParser.py", line 226, in parse_starttag
   endpos = self.check_for_whole_start_tag(i)

 File "/usr/lib/python2.5/HTMLParser.py", line 301, in check_for_whole_start_tag
   self.error("malformed start tag")

 File "/usr/lib/python2.5/HTMLParser.py", line 115, in error
   raise HTMLParseError(message, self.getpos())

HTMLParseError: malformed start tag, at line 130, column 69
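
A minimal sketch of the fallback proposed above, not the change that was actually committed. It is written for current Python, where the standard-library parser is lenient, so the failure is simulated by passing in a cleaner that raises. The names text_for_recommendation and strip_tags are hypothetical; strip_tags only stands in for the nltk.clean_html call seen in the trace.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of a document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Hypothetical stand-in for nltk.clean_html: return the text of `html`."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join(" ".join(extractor.chunks).split())


def text_for_recommendation(post_title, feed_title, contents, cleaner=strip_tags):
    """Build the text to tokenize, ignoring contents that cannot be parsed."""
    try:
        body = cleaner(contents)
    except Exception:
        # Assumption: any cleaning failure is treated like the
        # HTMLParseError in the trace, and only the titles are used.
        body = ""
    return " ".join(part for part in (post_title, feed_title, body) if part)
```

In the Python 2.5 code from the trace, the except clause could presumably be narrowed to HTMLParser.HTMLParseError so that unrelated bugs still surface.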

Original issue reported on code.google.com by marcua@gmail.com on 17 Aug 2009 at 2:59

GoogleCodeExporter commented 9 years ago
Fixed in R231-R232

Original comment by marcua@gmail.com on 17 Aug 2009 at 3:38

GoogleCodeExporter commented 9 years ago

Original comment by marcua@gmail.com on 17 Aug 2009 at 3:38