openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

wikihow_en_maxi is still failing #1002

Open benoit74 opened 8 months ago

benoit74 commented 8 months ago

Last execution has failed again: https://farm.openzim.org/pipeline/848d8ca0-0a2f-4e42-9b5b-94079666ed23/debug

Details:

[MainThread::2024-01-04 02:56:36,663] INFO:>> Article:Change-a-Monitor-Refresh-Rate-on-PC-or-Mac
[MainThread::2024-01-04 02:56:38,982] WARNING:Backing off fetch(...) for 25.2s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2024-01-04 02:57:06,160] WARNING:Backing off fetch(...) for 62.0s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2024-01-04 02:58:10,285] WARNING:Backing off fetch(...) for 93.5s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2024-01-04 02:59:45,887] WARNING:Backing off fetch(...) for 428.1s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2024-01-04 03:06:56,292] WARNING:Backing off fetch(...) for 895.5s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2024-01-04 03:21:54,168] WARNING:Backing off fetch(...) for 975.2s (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2024-01-04 03:38:11,538] ERROR:Giving up fetch(...) after 7 tries (requests.exceptions.TooManyRedirects: Exceeded 30 redirects.)
[MainThread::2024-01-04 03:38:11,538] ERROR:Interrupting process due to error: Exceeded 30 redirects.
[MainThread::2024-01-04 03:38:11,539] ERROR:Exceeded 30 redirects.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 984, in run
    self.scrape_articles()
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 506, in scrape_articles
    if not self.scrape_article(article):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 608, in scrape_article
    soup, _ = get_soup(f"/{article}")
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/utils.py", line 171, in get_soup
    content, paths = fetch(path, **params)
  File "/usr/local/lib/python3.8/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/utils.py", line 96, in fetch
    resp = Global.session.get(get_url(path, **params), params=params)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 723, in send
    history = [resp for resp in gen]
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 723, in <listcomp>
    history = [resp for resp in gen]
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 191, in resolve_redirects
    raise TooManyRedirects(
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

However, the article now loads fine: https://www.wikihow.tech/Change-a-Monitor-Refresh-Rate-on-PC-or-Mac

I don't know whether this was a transient error on their side (the scraper ran for 12 days, so we may well have hit an upgrade or some other issue on their end) or throttling (which seems highly improbable, given that the scraper had already run for so long without issue).

I requested the recipe again (it will take some time to start since there are other wikihow tasks in the pipeline, but it will be available here once started: https://farm.openzim.org/pipeline/aa382dfb-808c-4f8d-856d-5cf642a0a1bb), and we should monitor it closely.
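
To tell a genuine redirect loop apart from a long but finite redirect chain, the redirects can be followed one hop at a time. This is only a minimal sketch, not what wikihow2zim does: `trace_redirects` and `next_location` are hypothetical names, and the actual HTTP call is left to the caller (for instance a wrapper around `requests.get(url, allow_redirects=False)` that returns the `Location` header, resolved to an absolute URL, or `None` when the response is not a redirect).

```python
from typing import Callable, List, Optional, Tuple


def trace_redirects(
    start_url: str,
    next_location: Callable[[str], Optional[str]],
    max_hops: int = 30,
) -> Tuple[List[str], bool]:
    """Follow redirects hop by hop.

    `next_location` is a hypothetical callable supplied by the caller: it
    returns the redirect target for a URL, or None if the URL does not
    redirect. Returns the chain of visited URLs and whether a loop was found.
    """
    chain = [start_url]
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        target = next_location(url)
        if target is None:
            # Final destination reached: no loop.
            return chain, False
        chain.append(target)
        if target in seen:
            # We came back to a URL already visited: redirect loop.
            return chain, True
        seen.add(target)
        url = target
    # Hop limit reached without revisiting a URL: long chain, no proven loop.
    return chain, False


# Offline demonstration with made-up paths standing in for real responses.
looping = {"/a": "/b", "/b": "/a"}
finite = {"/a": "/b"}
loop_chain, loop_found = trace_redirects("/a", looping.get)
ok_chain, ok_found = trace_redirects("/a", finite.get)
```

Logging the returned chain when a loop is found would show which URLs are involved, which the `TooManyRedirects` message alone does not.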

benoit74 commented 7 months ago

It has been decided that zim-requests must contain only new requests, so this does indeed have to be transferred to the wikihow repo.

benoit74 commented 7 months ago

Recipe is failing again:

[MainThread::2024-01-30 09:30:31,968] DEBUG:-> article: Stop-Sabotaging-Yourself
[MainThread::2024-01-30 09:30:31,968] DEBUG:-> article: Stop-Wishing-to-Be-Like-or-Act-Like-Someone-Else
[MainThread::2024-01-30 09:30:31,968] DEBUG:-> article: Stop-Labeling-Yourself-a-Loser
[MainThread::2024-01-30 09:30:31,968] DEBUG:-> article: Stop-Underestimating-Yourself
[MainThread::2024-01-30 09:30:31,968] DEBUG:-> article: Teach-Self-Esteem
[MainThread::2024-01-30 09:30:31,968] DEBUG:-> article: Support-a-Partner-with-Low-Self-Esteem
[MainThread::2024-01-30 09:30:48,085] DEBUG:-> article: Unleash-the-Best-in-You
[MainThread::2024-01-30 09:30:48,085] DEBUG:-> article: Teach-Someone-to-Love-Themselves
[MainThread::2024-01-30 09:30:56,093] DEBUG:Category: Microsoft Office
[MainThread::2024-01-30 09:31:19,212] DEBUG:Removing /output/www.wikihow.com_w819xblo
[MainThread::2024-01-30 09:31:19,217] ERROR:FAILED. An error occurred: Object of type Response is not JSON serializable
[MainThread::2024-01-30 09:31:19,217] ERROR:Object of type Response is not JSON serializable
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 973, in run
    self.build_expected_articles()
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 358, in build_expected_articles
    for query in self.api_site.query(
  File "/usr/local/lib/python3.8/site-packages/pywikiapi/Site.py", line 310, in iterate
    result = self(action, **req)
  File "/usr/local/lib/python3.8/site-packages/pywikiapi/Site.py", line 130, in __call__
    response = self.request(method, **request_kw)
  File "/usr/local/lib/python3.8/site-packages/pywikiapi/Site.py", line 423, in request
    raise ApiError('Call failed', r)
pywikiapi.utils.ApiError: <unprintable ApiError object>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/entrypoint.py", line 225, in main
    sys.exit(scraper.run())
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.2-py3.8.egg/wikihow2zim/scraper.py", line 1007, in run
    logger.error(f"Interrupting process due to error: {exc}")
  File "/usr/local/lib/python3.8/site-packages/pywikiapi/utils.py", line 15, in __str__
    return self.message + ': ' + json.dumps(self.data)
  File "/usr/local/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Response is not JSON serializable

I will have to dive into the problem.
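
The second traceback above shows that the original `ApiError` is masked: formatting it calls `json.dumps` on a `Response` object, which raises a `TypeError` of its own. One possible way to keep the original error visible in the logs is to format exceptions defensively. The following is a minimal sketch, not the scraper's actual code: `describe_exception` is a hypothetical helper, and `FakeApiError` is a stand-in that only mimics the failing `__str__` seen in the traceback.

```python
import json


def describe_exception(exc: BaseException) -> str:
    """Return a printable description even if the exception's __str__ fails."""
    try:
        return f"{type(exc).__name__}: {exc}"
    except Exception:
        # __str__ itself raised; fall back to the type name and raw args.
        parts = []
        for arg in exc.args:
            try:
                parts.append(repr(arg))
            except Exception:
                parts.append(f"<unprintable {type(arg).__name__}>")
        return f"{type(exc).__name__}({', '.join(parts)})"


class FakeApiError(Exception):
    """Hypothetical stand-in: __str__ JSON-encodes data that may not be serializable."""

    def __init__(self, message, data):
        super().__init__(message, data)
        self.message = message
        self.data = data

    def __str__(self):
        return self.message + ": " + json.dumps(self.data)


# A plain object() is not JSON serializable, much like the Response in the log.
masked = describe_exception(FakeApiError("Call failed", object()))
plain = describe_exception(ValueError("boom"))
```

A call such as `logger.error(describe_exception(exc))` would then report the failed API call instead of the secondary serialization error.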

benoit74 commented 7 months ago

Most probably yet another throttling issue. We will get more details once openzim/wikihow#148 is fixed.

kelson42 commented 4 months ago

Still fails

benoit74 commented 4 months ago

The problem is https://github.com/openzim/wikihow/issues/163

rgaudin commented 4 months ago

Should this one be closed as duplicate then?

benoit74 commented 4 months ago

This one is in the wrong repo; it should be in zim-requests. Let's move it there.