HTTP GET requests should, where possible, be more fault-tolerant, so that a single error does not necessarily abort the scraper.
Here is an example of such an abort:
Getting attachment '3750_2010_Anlage_4_4a_-_Karte2-11-alternativ'
Traceback (most recent call last):
  File "main.py", line 150, in <module>
    scraper.work_from_queue()
  File "/home/ok/offeneskoeln2/scrape-a-ris/risscraper/scraper.py", line 79, in work_from_queue
    self.get_submission(submission_id=job['key'])
  File "/home/ok/offeneskoeln2/scrape-a-ris/risscraper/scraper.py", line 543, in get_submission
    attachment = self.get_attachment_file(attachment, mform)
  File "/home/ok/offeneskoeln2/scrape-a-ris/risscraper/scraper.py", line 569, in get_attachment_file
    attachment.content = mform_response.read()
  File "/usr/lib/python2.6/socket.py", line 348, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.6/httplib.py", line 542, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.6/socket.py", line 377, in read
    data = self._sock.recv(left)
socket.error: [Errno 104] Connection reset by peer
In this case the request was issued by mechanize.
Here is another example from a different place in the code:
Getting attachment '2013_2013_Anlage_2_Plan_'
Traceback (most recent call last):
  File "main.py", line 150, in <module>
    scraper.work_from_queue()
  File "/home/ok/offeneskoeln2/scrape-a-ris/risscraper/scraper.py", line 79, in work_from_queue
    self.get_submission(submission_id=job['key'])
  File "/home/ok/offeneskoeln2/scrape-a-ris/risscraper/scraper.py", line 543, in get_submission
    attachment = self.get_attachment_file(attachment, mform)
  File "/home/ok/offeneskoeln2/scrape-a-ris/risscraper/scraper.py", line 565, in get_attachment_file
    mform_response = mechanize.urlopen(mechanize_request)
  File "/home/ok/offeneskoeln2/scrape-a-ris/venv/lib/python2.6/site-packages/mechanize/_opener.py", line 426, in urlopen
    return _opener.open(url, data, timeout)
  File "/home/ok/offeneskoeln2/scrape-a-ris/venv/lib/python2.6/site-packages/mechanize/_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "/home/ok/offeneskoeln2/scrape-a-ris/venv/lib/python2.6/site-packages/mechanize/_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "/home/ok/offeneskoeln2/scrape-a-ris/venv/lib/python2.6/site-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "/home/ok/offeneskoeln2/scrape-a-ris/venv/lib/python2.6/site-packages/mechanize/_urllib2_fork.py", line 1142, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/home/ok/offeneskoeln2/scrape-a-ris/venv/lib/python2.6/site-packages/mechanize/_urllib2_fork.py", line 1118, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 104] Connection reset by peer>
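One way to make these requests more tolerant is a small retry wrapper around the opener. This is only a minimal sketch, not the scraper's actual code: `fetch_with_retries`, `open_fn`, and the parameter names are hypothetical, and in the real scraper `open_fn` would be something like `mechanize.urlopen`. It catches the two error types seen in the tracebacks above (`socket.error` during `read()` and `URLError` from `urlopen`) and backs off between attempts.

```python
import socket
import time
from urllib.error import URLError  # mechanize's Python 2 fork raises a compatible URLError


def fetch_with_retries(open_fn, request, retries=3, backoff=2.0):
    """Call open_fn(request), retrying on transient network errors.

    Retries on connection resets (socket.error, e.g. [Errno 104])
    and URLError, sleeping backoff * 2**attempt seconds between
    tries. After the last attempt the error is re-raised, so the
    caller can log it and skip the item instead of crashing.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return open_fn(request)
        except (socket.error, URLError) as err:
            last_error = err
            time.sleep(backoff * (2 ** attempt))
    raise last_error
```

With a wrapper like this, `get_attachment_file` could retry the download a few times and, if it still fails, log the attachment ID and continue with the next queue job rather than letting the exception propagate out of `work_from_queue`.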