Calling analyze_url on a URL that contains &pPK= causes the Calais service to
return an exception.
For example:
http://www.libdems.org.uk/news_detail.aspx?title=Government_smothering_dissent_on_Heathrow_says_Kramer__&pPK=acc86d46-11a7-4688-bd4a-82eea4cf99f3
I believe this is caused by a conflict with the external ID field. I changed my
analyze_url to the following, and the problem no longer occurs.
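    def analyze_url(self, url):
        f = urllib.urlopen(url)
        html = self.preprocess_html(f.read())
        # Percent-encode the URL so that characters such as & and = are
        # escaped before it is sent as the external ID.
        eid = urllib.quote(url)
        return self.analyze(html, content_type="TEXT/HTML", external_id=eid)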
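
To show what the quoting step does to the example URL, here is a small standalone sketch. It assumes Python 3, where urllib.quote lives at urllib.parse.quote; the helper name make_external_id is hypothetical and is not part of the Calais client. It only demonstrates the encoding and says nothing about how the Calais service itself parses the external ID.

```python
from urllib.parse import quote, unquote  # Python 3 home of Python 2's urllib.quote

# The example URL from this report, split only for readability.
url = ("http://www.libdems.org.uk/news_detail.aspx?"
       "title=Government_smothering_dissent_on_Heathrow_says_Kramer__"
       "&pPK=acc86d46-11a7-4688-bd4a-82eea4cf99f3")


def make_external_id(url):
    """Hypothetical helper: percent-encode a URL for use as an external ID."""
    # With the default safe="/" every reserved character except "/" is
    # escaped, so "&" becomes %26 and "=" becomes %3D.
    return quote(url)


eid = make_external_id(url)
print(eid)

# The encoding is reversible, so the original URL can still be recovered.
print(unquote(eid) == url)
```

After quoting, the ID no longer contains a literal & or =, so the &pPK= fragment can no longer be mistaken for a separate request parameter, which is what I suspect was happening.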
Original issue reported on code.google.com by olly.sm...@gmail.com on 10 Jan 2010 at 8:56