menorki / python-calais

Automatically exported from code.google.com/p/python-calais

analyse_url barfs on URLs containing &pPK= #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Calling analyze_url on a URL that contains &pPK= raises an exception returned by the
Calais service.

For example: http://www.libdems.org.uk/news_detail.aspx?title=Government_smothering_dissent_on_Heathrow_says_Kramer__&pPK=acc86d46-11a7-4688-bd4a-82eea4cf99f3

I believe this is caused by a conflict with the external ID field, which receives the
URL unescaped. I changed my analyze_url to the following, and the error no longer
occurs.

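    import urllib  # Python 2; in Python 3 use urllib.request.urlopen and urllib.parse.quote

    def analyze_url(self, url):
        f = urllib.urlopen(url)
        html = self.preprocess_html(f.read())
        # Percent-encode the URL before using it as the external ID,
        # so characters such as '&' are not passed to Calais verbatim
        eid = urllib.quote(url)
        return self.analyze(html, content_type="TEXT/HTML", external_id=eid)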
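To illustrate why the raw URL can break the request, here is a minimal Python 3 sketch. It assumes (the issue does not confirm this) that the wrapper writes external_id into an XML attribute of a parameters document sent to Calais; the element names params and externalMetadata and the helpers params_xml and is_well_formed are hypothetical. In XML, a bare & inside an attribute value must begin an entity reference, so &pPK= makes the document malformed. Percent-encoding with quote (the Python 3 counterpart of urllib.quote) removes the &, and XML escaping with quoteattr is shown as an alternative.

```python
from urllib.parse import quote
from xml.etree import ElementTree as ET
from xml.sax.saxutils import quoteattr

URL = ("http://www.libdems.org.uk/news_detail.aspx?"
       "title=Government_smothering_dissent_on_Heathrow_says_Kramer__"
       "&pPK=acc86d46-11a7-4688-bd4a-82eea4cf99f3")


def params_xml(external_id_attr):
    """Build a hypothetical params document with externalID as an attribute.

    external_id_attr must already include its surrounding quotes.
    """
    return "<params><externalMetadata externalID=%s/></params>" % external_id_attr


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Raw URL inside the attribute: '&pPK=' is read as a broken entity reference
raw = params_xml('"%s"' % URL)
# Percent-encoded, as in the patched analyze_url: '&' becomes '%26'
percent = params_xml('"%s"' % quote(URL))
# XML-escaped instead: '&' becomes '&amp;' and quoteattr adds the quotes
escaped = params_xml(quoteattr(URL))

print(is_well_formed(raw))      # False
print(is_well_formed(percent))  # True
print(is_well_formed(escaped))  # True
```

The two fixes differ in what ends up in the field: quote changes the ID value itself, while quoteattr only changes how it is written in the XML, so the parsed attribute is the original URL again.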

Original issue reported on code.google.com by olly.sm...@gmail.com on 10 Jan 2010 at 8:56