kelson42 closed this issue 1 year ago.
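Actually, this `MagicDetect.__del__` error is not the cause of the failure but a side effect: the scraper had already stopped on a 503 error and was shutting down.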
[MainThread::2022-09-20 05:40:51,479] INFO:>> Article:Pay-for-Plastic-Surgery
[MainThread::2022-09-20 05:41:21,872] ERROR:Interrupting process due to error: 503 Server Error: first byte timeout for url: https://www.wikihow.com/Pay-for-Plastic-Surgery
[MainThread::2022-09-20 05:41:21,873] ERROR:503 Server Error: first byte timeout for url: https://www.wikihow.com/Pay-for-Plastic-Surgery
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.1-py3.8.egg/wikihow2zim/scraper.py", line 991, in run
    self.scrape_articles()
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.1-py3.8.egg/wikihow2zim/scraper.py", line 513, in scrape_articles
    if not self.scrape_article(article):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.1-py3.8.egg/wikihow2zim/scraper.py", line 623, in scrape_article
    raise exc
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.1-py3.8.egg/wikihow2zim/scraper.py", line 615, in scrape_article
    soup, _ = get_soup(f"/{article}")
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.1-py3.8.egg/wikihow2zim/utils.py", line 148, in get_soup
    content, paths = fetch(path, **params)
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.1-py3.8.egg/wikihow2zim/utils.py", line 77, in fetch
    resp.raise_for_status()
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: first byte timeout for url: https://www.wikihow.com/Pay-for-Plastic-Surgery
[MainThread::2022-09-20 05:41:21,876] DEBUG:shutting down executor IMG-T- with wait=False
[MainThread::2022-09-20 05:41:21,876] DEBUG:shutting down executor VID-T- with wait=False
[MainThread::2022-09-20 05:41:21,876] DEBUG:Removing /output/www.wikihow.com_dii94lnc
[MainThread::2022-09-20 05:41:21,961] DEBUG:Images 126417/126419
Exception ignored in: <function MagicDetect.__del__ at 0x7fe40facdb80>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/magic.py", line 308, in __del__
  File "/usr/local/lib/python3.8/site-packages/magic.py", line 135, in close
TypeError: 'NoneType' object is not callable
503 errors happen from time to time, especially during long runs. Reopening #122 to at least add a pause-and-retry on 503 errors instead of giving up immediately.
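A minimal sketch of what a pause-and-retry could look like, assuming it wraps the call that ends in `resp.raise_for_status()` (for example the `fetch(path, **params)` call in `get_soup`). The helper names `with_retry` and `is_503`, the attempt count, and the 10-second base delay are hypothetical choices, not existing wikihow2zim code. It relies only on `requests.HTTPError` exposing the failed response as `.response`, so the predicate uses duck typing and the sketch has no third-party imports.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_503(exc: BaseException) -> bool:
    """Return True if exc carries an HTTP response with status 503
    (requests.HTTPError stores it in its .response attribute)."""
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) == 503


def with_retry(
    func: Callable[[], T],
    *,
    attempts: int = 5,          # hypothetical default
    base_delay: float = 10.0,   # hypothetical default, in seconds
    should_retry: Callable[[BaseException], bool] = is_503,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), pausing and retrying while should_retry(exc) is true.

    The pause doubles after each failed attempt (exponential backoff).
    Non-retryable errors, or the last error once attempts run out,
    are re-raised unchanged so the scraper still stops as it does today.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts - 1 or not should_retry(exc):
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # loop always returns or raises
```

Hypothetical wiring: `content, paths = with_retry(lambda: fetch(path, **params))`. The `sleep` parameter only exists so the backoff can be tested without waiting. An alternative is to let `requests` do this through urllib3's `Retry` (with `status_forcelist=[503]` and a `backoff_factor`) on an `HTTPAdapter` mounted on the session.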
From https://farm.openzim.org/pipeline/7868c92ab725f3774a725236/debug