jjlee / mechanize

Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize.
http://wwwsearch.sourceforge.net/mechanize/

ParseError when accessing forms on .aspx page #63

Open annapowellsmith opened 12 years ago

annapowellsmith commented 12 years ago

When scraping one particular .aspx page, mechanize consistently raises "ParseError: unexpected '[' char in declaration" when accessing forms. Code in full:

import mechanize

url = 'http://corporate.marksandspencer.com/aboutus/where/international_stores'
browser = mechanize.Browser()
browser.open(url)
browser.select_form(nr=0)  # the ParseError is raised here

I have tried manually replacing the DTD, but it doesn't help:

import mechanize

url = 'http://corporate.marksandspencer.com/aboutus/where/international_stores'
browser = mechanize.Browser()
browser.open(url)
# Strip the DOCTYPE and the namespaced <html> tag from the raw page source.
doctype = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">'
html_tag = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
html = browser.response().get_data().replace(doctype, '').replace(html_tag, '<html>')
# Rebuild the response from the cleaned HTML and hand it back to the browser.
response = mechanize.make_response(html, [("Content-Type", "text/html")], url, 200, "OK")
browser.set_response(response)
browser.select_form(nr=0)
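
Since removing the DOCTYPE alone does not help, one further thing that might be worth trying is stripping every non-comment `<! ... >` declaration that contains a `[` before handing the HTML back to mechanize. This is only a sketch under an assumption: that the error comes from some such declaration elsewhere in the page, which has not been confirmed here. The helper name `strip_bracket_declarations` is hypothetical and not part of mechanize.

```python
import re

# Assumption: the parser is choking on a non-comment "<! ... >" declaration
# that contains "[". Ordinary comments ("<!-- ... -->") and declarations
# without "[" (such as a plain DOCTYPE) are left untouched.
_BRACKET_DECLARATION = re.compile(r"<!(?!--)[^>]*\[[^>]*>")


def strip_bracket_declarations(html):
    """Return html with non-comment declarations containing '[' removed.

    Hypothetical helper for illustration. It stops at the first '>' after
    the '[', so it will not cleanly remove sections whose body contains '>'.
    """
    return _BRACKET_DECLARATION.sub("", html)
```

The cleaned string could then be passed to `mechanize.make_response` and `browser.set_response` in the same way as in the snippet above. The function expects a text string, so the result of `get_data()` may need decoding first if it is returned as bytes.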