openzim / wikihow

WikiHow scraper
https://download.kiwix.org/zim/wikihow/
GNU General Public License v3.0

Too many redirects #147

Closed: kelson42 closed this issue 1 year ago

kelson42 commented 1 year ago

https://farm.openzim.org/pipeline/e277c53c8ba34a834171ef36/debug

Exception ignored in: <function MagicDetect.__del__ at 0x7f23f6c26040>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/magic.py", line 308, in __del__
  File "/usr/local/lib/python3.8/site-packages/magic.py", line 135, in close
TypeError: 'NoneType' object is not callable

rgaudin commented 1 year ago

The scraper gave up after 7 attempts, spread over more than 45 minutes, on an article that kept returning a redirect loop. This looks like a server-side issue: the article seems to load fine at the moment. I suggest relaunching the recipe.

[MainThread::2023-03-03 10:58:49,910] INFO:>> Article:Exercise-to-Lose-Belly-Fat
[MainThread::2023-03-03 10:58:53,378] WARNING:Backing off fetch(...) for 28.9s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2023-03-03 10:59:25,452] WARNING:Backing off fetch(...) for 49.7s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2023-03-03 11:00:18,264] WARNING:Backing off fetch(...) for 58.7s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2023-03-03 11:01:20,100] WARNING:Backing off fetch(...) for 203.0s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2023-03-03 11:04:46,264] WARNING:Backing off fetch(...) for 675.1s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2023-03-03 11:16:04,593] WARNING:Backing off fetch(...) for 1785.7s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2023-03-03 11:45:53,609] ERROR:Giving up fetch(...) after 7 tries (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2023-03-03 11:45:53,610] ERROR:Interrupting process due to error: Exceeded 30 redirects.
[MainThread::2023-03-03 11:45:53,610] ERROR:Exceeded 30 redirects.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 984, in run
    self.scrape_articles()
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 506, in scrape_articles
    if not self.scrape_article(article):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 608, in scrape_article
    soup, _ = get_soup(f"/{article}")
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/utils.py", line 171, in get_soup
    content, paths = fetch(path, **params)
  File "/usr/local/lib/python3.8/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/utils.py", line 96, in fetch
    resp = Global.session.get(get_url(path, **params), params=params)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 723, in send
    history = [resp for resp in gen]
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 723, in <listcomp>
    history = [resp for resp in gen]
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 191, in resolve_redirects
    raise TooManyRedirects(
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
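
For readers unfamiliar with the retry pattern visible in the log above (growing waits, then "Giving up fetch(...) after 7 tries"), here is a minimal stdlib-only sketch of exponential backoff with jitter. It is not the scraper's actual code: `fetch_with_backoff`, its `base` and `factor` parameters, and the local `TooManyRedirects` class are all hypothetical stand-ins, and the real wait schedule used by wikihow2zim is not shown in this issue.

```python
import random
import time


class TooManyRedirects(Exception):
    """Stand-in for requests.exceptions.TooManyRedirects (keeps the sketch stdlib-only)."""


def fetch_with_backoff(fetch, max_tries=7, base=2.0, factor=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_tries attempts have failed.

    Between attempts, wait a random delay between 0 and
    base * factor**attempt seconds (exponential backoff with full jitter).
    The base and factor defaults are illustrative assumptions.
    """
    for attempt in range(max_tries):
        try:
            return fetch()
        except TooManyRedirects:
            if attempt == max_tries - 1:
                # Last attempt failed: re-raise so the caller sees the error.
                raise
            sleep(random.uniform(0, base * factor ** attempt))
```

With `max_tries=7` a persistent redirect loop produces 7 calls and 6 waits, which matches the shape of the log: six "Backing off" warnings followed by one "Giving up" error. Passing a custom `sleep` callable makes the function easy to exercise without actually waiting.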