I have a crawl running where, after 87000 seconds since the last refresh, the following error occurs when I try to refresh the log page:

Traceback (most recent call last):
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/scrapydweb/directory/log.py", line 94, in dispatch_request
    self.request_scrapy_log()
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/scrapydweb/directory/log.py", line 142, in request_scrapy_log
    self.status_code, self.text = self.make_request(self.url, api=False, auth=self.AUTH)
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/scrapydweb/myview.py", line 191, in make_request
    front = r.text[:min(100, len(r.text))].replace('\n', '')
  File "/root/anaconda3/envs/scrapyd_env/lib/python3.7/site-packages/requests/models.py", line 861, in text
    content = str(self.content, encoding, errors='replace')
MemoryError

The crawl itself seems to be running fine, though.
In my experience, this is due to insufficient memory. Could you tell me the size of the current log file and your spare/total RAM?
Also, if ScrapydWeb and Scrapyd run on the same host, you can set the SCRAPYD_LOGS_DIR
option so that ScrapydWeb reads the local log files directly; this works only when the Scrapyd server is added as '127.0.0.1' in the ScrapydWeb config file.
Note that parsing the log file with regular expressions may still raise a MemoryError when memory is insufficient.
https://github.com/my8100/scrapydweb/blob/master/scrapydweb/default_settings.py#L60
# Set to speed up loading scrapy logs.
# e.g., 'C:/Users/username/logs/' or '/home/username/logs/'
# The setting takes effect only when both ScrapydWeb and Scrapyd run on the same machine,
# and the Scrapyd server ip is added as '127.0.0.1'.
# Check out here to find out where the Scrapy logs are stored:
# https://scrapyd.readthedocs.io/en/stable/config.html#logs-dir
SCRAPYD_LOGS_DIR = ''
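For reference, a minimal sketch of the relevant entries in the generated ScrapydWeb settings file (named like scrapydweb_settings_vN.py, where N depends on your version; the log path below is hypothetical and must match the logs_dir option in your scrapyd.conf):

# scrapydweb_settings_vN.py -- a sketch only; adjust paths to your setup.
# Local log reading takes effect only for servers added as 127.0.0.1.
SCRAPYD_SERVERS = [
    '127.0.0.1:6800',
]
# Point this at the same directory as logs_dir in scrapyd.conf,
# e.g. '/home/username/logs' (hypothetical path).
SCRAPYD_LOGS_DIR = '/home/username/logs'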
Thanks for the fast reply. I don't want to interrupt the crawling, but it should finish within a few days. Then I'll test the above and give an update.
Actually, you only need to reconfigure and restart ScrapydWeb; that won't interrupt your crawling.
Note that you may not be able to reproduce the problem after the crawl is finished, since there would then be enough free memory for ScrapydWeb to parse the log. Alternatively, as a temporary workaround, you can run another ScrapydWeb instance on a different machine with enough memory.
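A sketch of that workaround, assuming the crawling host's Scrapyd is reachable from the second machine at 192.0.2.10:6800 (hypothetical address): install ScrapydWeb on the second machine and point it at the remote Scrapyd in the generated settings file:

# scrapydweb_settings_vN.py on the second machine -- a sketch only.
SCRAPYD_SERVERS = [
    '192.0.2.10:6800',  # the Scrapyd instance on the crawling host
]
# Leave SCRAPYD_LOGS_DIR empty here: the logs are not local to this machine,
# so ScrapydWeb fetches them over HTTP instead.
SCRAPYD_LOGS_DIR = ''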
OK, there's the issue: 600 MB of RAM left and an 800 MB log file.
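For anyone hitting the same traceback: the failure is in requests' Response.text, which decodes the entire response body into a single in-memory string, so fetching an 800 MB log needs roughly that much free RAM again on top of the raw bytes. A sketch of how to inspect such a log with the standard requests streaming API instead (an illustration only, not what ScrapydWeb does; the URL and the error pattern are assumptions):

import requests

# Hypothetical URL; substitute your Scrapyd host and the actual
# project/spider/job names.
url = 'http://127.0.0.1:6800/logs/myproject/myspider/jobid.log'

# Stream and process the log line by line instead of materializing the
# whole body via r.text, which is what raises MemoryError above.
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    error_lines = 0
    for line in r.iter_lines(decode_unicode=True):
        if line and '] ERROR' in line:  # example: count Scrapy error lines
            error_lines += 1
print('ERROR lines:', error_lines)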