jjlee / mechanize

Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize.
http://wwwsearch.sourceforge.net/mechanize/

socket.error: [Errno 54] Connection reset by peer on very specific case #92

Open jorgecarleitao opened 10 years ago

jorgecarleitao commented 10 years ago

Using Python 2.7.6 and mechanize 0.2.5 in a virtualenv, try the following:

    import mechanize

    br = mechanize.Browser()
    br.addheaders = [('User-agent',
                      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) '
                      'AppleWebKit/537.36 (KHTML, like Gecko)')]
    response = br.open("http://www.parlamento.pt/DeputadoGP/Paginas/Biografia.aspx?BID=2665")
    print response.code
    data = response.read()

On my machine (OS X 10.9.1), the final `response.read()` call fails with:

 response.read()
 File ".../lib/python2.7/site-packages/mechanize/_response.py", line 190, in read
    self.__cache.write(self.wrapped.read())
 File ".../lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
 File ".../lib/python2.7/httplib.py", line 567, in read
    s = self.fp.read(amt)
 File ".../lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
 socket.error: [Errno 54] Connection reset by peer

This only happens with this particular URL (i.e. ?BID=2665) combined with those headers. Other URLs with the same headers work, and the same URL without the custom headers works too.

In my case I simply removed the custom headers, but it makes sense to keep this documented somewhere.
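
For anyone who wants to keep the custom headers where they work, the workaround could be automated roughly as below. This is only a sketch: `read_with_header_fallback` and the `fetch` callable are hypothetical names, not part of mechanize, and it assumes the reset surfaces as a `socket.error` carrying `ECONNRESET` (errno 54 on OS X, 104 on Linux), as in the traceback above.

```python
import errno
import socket


def read_with_header_fallback(fetch, header_sets):
    """Try each header list in turn until one read completes.

    `fetch` is a hypothetical callable: it takes a list of (name, value)
    header tuples and returns the response body. With mechanize it would
    set br.addheaders, call br.open(url) and return response.read().
    """
    if not header_sets:
        raise ValueError("header_sets must contain at least one header list")

    last_error = None
    for headers in header_sets:
        try:
            return fetch(headers)
        except socket.error as exc:
            # Only swallow "connection reset by peer". The numeric value
            # differs per platform, so compare with the symbolic constant.
            if exc.errno != errno.ECONNRESET:
                raise
            last_error = exc

    # Every header list was reset by the peer: surface the last error.
    raise last_error
```

Calling it with `[custom_headers, []]` would try the browser-like User-agent first and fall back to no extra headers only when the server resets the connection. Any other socket error is re-raised unchanged.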