openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

wikihow_en_maxi is failing #934

Open benoit74 opened 5 months ago

benoit74 commented 5 months ago

### Recipe URL

https://farm.openzim.org/recipes/wikihow_en_maxi

### Last log lines

```
[IMG-T-6::2024-03-21 15:03:35,062] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/f/f5/Start-a-Toyota-Prius-%28US%29-Step-3-Version-3.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-3-Version-3.jpg into ZIM::images/3139121941-v4-460px-Start-a-Toyota-Prius-(US)-Step-3-Version-3.webp
[IMG-T-0::2024-03-21 15:03:35,142] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/5/54/Start-a-Toyota-Prius-%28US%29-Step-4-Version-3.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-4-Version-3.jpg into ZIM::images/2405511860-v4-460px-Start-a-Toyota-Prius-(US)-Step-4-Version-3.webp
[IMG-T-5::2024-03-21 15:03:35,151] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/9/90/Start-a-Toyota-Prius-%28US%29-Step-5-Version-3.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-5-Version-3.jpg into ZIM::images/2441818810-v4-460px-Start-a-Toyota-Prius-(US)-Step-5-Version-3.webp
[IMG-T-2::2024-03-21 15:03:35,183] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/d/de/Start-a-Toyota-Prius-%28US%29-Step-2-Version-3.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-2-Version-3.jpg into ZIM::images/3455660863-v4-460px-Start-a-Toyota-Prius-(US)-Step-2-Version-3.webp
[IMG-T-3::2024-03-21 15:03:35,189] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/6/67/Start-a-Toyota-Prius-%28US%29-Step-6-Version-3.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-6-Version-3.jpg into ZIM::images/2454270653-v4-460px-Start-a-Toyota-Prius-(US)-Step-6-Version-3.webp
[IMG-T-9::2024-03-21 15:03:35,217] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/e/eb/Start-a-Toyota-Prius-%28US%29-Step-8-Version-2.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-8-Version-2.jpg into ZIM::images/3479384904-v4-460px-Start-a-Toyota-Prius-(US)-Step-8-Version-2.webp
[IMG-T-7::2024-03-21 15:03:35,223] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/e/e2/Start-a-Toyota-Prius-%28US%29-Step-1-Version-3.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-1-Version-3.jpg into ZIM::images/3090363148-v4-460px-Start-a-Toyota-Prius-(US)-Step-1-Version-3.webp
[IMG-T-1::2024-03-21 15:03:35,252] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/d/d2/Start-a-Toyota-Prius-%28US%29-Step-7-Version-2.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-7-Version-2.jpg into ZIM::images/3106222868-v4-460px-Start-a-Toyota-Prius-(US)-Step-7-Version-2.webp
[IMG-T-4::2024-03-21 15:03:35,277] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/b/b6/Start-a-Toyota-Prius-%28US%29-Step-10-Version-2.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-10-Version-2.jpg into ZIM::images/288633704-v4-460px-Start-a-Toyota-Prius-(US)-Step-10-Version-2.webp
[IMG-T-8::2024-03-21 15:03:35,284] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/f/f2/Start-a-Toyota-Prius-%28US%29-Step-9-Version-2.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-9-Version-2.jpg into ZIM::images/3147903772-v4-460px-Start-a-Toyota-Prius-(US)-Step-9-Version-2.webp
[IMG-T-9::2024-03-21 15:03:35,461] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/c/c1/Clean-Headlights-Step-22.jpg/-crop-127-140-127px-Clean-Headlights-Step-22.jpg into ZIM::images/3365283943--crop-127-140-127px-Clean-Headlights-Step-22.webp
[IMG-T-6::2024-03-21 15:03:35,567] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/d/df/Start-a-Toyota-Prius-%28US%29-Step-11-Version-2.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-11-Version-2.jpg into ZIM::images/683619230-v4-460px-Start-a-Toyota-Prius-(US)-Step-11-Version-2.webp
[IMG-T-2::2024-03-21 15:03:35,604] DEBUG:Attempting download of S3::https/www.wikihow.com/images/avatarOut/5/53/3565366.jpg?20160502110857 into ZIM::images/2537952873-3565366.webp
[IMG-T-5::2024-03-21 15:03:35,627] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/f/ff/Start-a-Toyota-Prius-%28US%29-Step-13-Version-2.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-13-Version-2.jpg into ZIM::images/725955494-v4-460px-Start-a-Toyota-Prius-(US)-Step-13-Version-2.webp
[IMG-T-0::2024-03-21 15:03:35,628] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/8/8b/Start-a-Toyota-Prius-%28US%29-Step-12-Version-2.jpg/v4-460px-Start-a-Toyota-Prius-%28US%29-Step-12-Version-2.jpg into ZIM::images/4284691268-v4-460px-Start-a-Toyota-Prius-(US)-Step-12-Version-2.webp
[IMG-T-7::2024-03-21 15:03:35,693] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/6/66/Get-the-Best-Gas-Mileage-from-Your-Toyota-Prius-Step-30.jpg/-crop-342-184-245px-Get-the-Best-Gas-Mileage-from-Your-Toyota-Prius-Step-30.jpg into ZIM::images/1645690422--crop-342-184-245px-Get-the-Best-Gas-Mileage-from-Your-Toyota-Prius-Step-30.webp
[IMG-T-3::2024-03-21 15:03:35,729] DEBUG:Attempting download of S3::https/www.wikihow.com/images/thumb/c/c1/Clean-Headlights-Step-22.jpg/-crop-342-184-245px-Clean-Headlights-Step-22.jpg into ZIM::images/3384551535--crop-342-184-245px-Clean-Headlights-Step-22.webp
[MainThread::2024-03-21 15:19:17,911] ERROR:Interrupting process due to error: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))
[MainThread::2024-03-21 15:19:17,912] ERROR:('Connection aborted.', TimeoutError(110, 'Connection timed out'))
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/lib/python3.8/ssl.py", line 1073, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.8/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/lib/python3.8/ssl.py", line 1073, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.8/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 986, in run
    self.scrape_articles()
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 506, in scrape_articles
    if not self.scrape_article(article):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 672, in scrape_article
    page_linked_styles=self.get_style_urls(soup),
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 88, in get_style_urls
    self.add_css(url)
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 226, in add_css
    source = url if inline else self.get_from_cache(url).decode("UTF-8")
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 206, in get_from_cache
    content = self.session.get(url).content
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))
[MainThread::2024-03-21 15:19:17,923] DEBUG:shutting down executor IMG-T- with wait=False
[MainThread::2024-03-21 15:19:17,923] DEBUG:shutting down executor VID-T- with wait=False
[MainThread::2024-03-21 15:19:17,923] DEBUG:Removing /output/www.wikihow.com_416d5kjq
```
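The log timestamps show how long the process stalled before failing: the last image download attempt is logged at 15:03:35,729 and the error at 15:19:17,911. A quick check of that gap in Python, using a hypothetical `parse_log_ts` helper written only for this log's `[THREAD::YYYY-MM-DD HH:MM:SS,mmm]` prefix:

```python
from datetime import datetime

# The scraper's log prefix uses a comma before the milliseconds.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(line: str) -> datetime:
    """Hypothetical helper: read the timestamp from '[THREAD::<timestamp>] ...'."""
    # The timestamp sits between the first "::" and the first "]".
    stamp = line.split("::", 1)[1].split("]", 1)[0]
    return datetime.strptime(stamp, LOG_TS_FORMAT)


# Start of the last two relevant log lines, copied from the log (truncated).
last_download = "[IMG-T-3::2024-03-21 15:03:35,729] DEBUG:Attempting download"
failure = "[MainThread::2024-03-21 15:19:17,911] ERROR:Interrupting process"

gap = parse_log_ts(failure) - parse_log_ts(last_download)
print(gap, gap.total_seconds())  # 0:15:42.182000 942.182
```

So up to roughly 15 minutes 42 seconds passed with no log output before the `TimeoutError` (errno 110) surfaced. Since no `timeout` is passed at the failing `self.session.get(url)` call, this suggests the stalled connection was only cut off by the operating system's own TCP timeout, not by a client-side limit.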


### How many times the recipe failed in a row?

Many

### How many ZIM have been produced before failure?

Zero

### Which action did you undertake so far?

I have disabled the recipe for now

### What's next?

This is an upstream scraper problem

### More details

_No response_

benoit74 commented 5 months ago

Edit: This is an upstream website issue, not a scraper problem.
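Even with the root cause upstream, the traceback shows that a single stalled CSS fetch (`self.session.get(url).content` in `get_from_cache`, reached via `add_css`) aborted the whole run, because the `requests` default is no retries and no timeout. A minimal sketch of a more tolerant fetch, assuming the scraper keeps using a `requests.Session`: this is not wikihow2zim's actual code, and `make_session`, `get_from_cache_sketch`, and the timeout/retry values are hypothetical and illustrative only.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical values, to be tuned for the target site.
CONNECT_TIMEOUT = 10  # seconds allowed to establish the connection
READ_TIMEOUT = 60     # seconds allowed between bytes of the response


def make_session(retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Build a Session whose HTTP(S) adapters retry transient failures."""
    retry = Retry(
        total=retries,
        connect=retries,  # errors while establishing the connection
        read=retries,     # read errors, incl. ProtocolError("Connection aborted.", ...)
        backoff_factor=backoff,  # exponential backoff between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # also retry these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def get_from_cache_sketch(session: requests.Session, url: str) -> bytes:
    """Hypothetical stand-in for the scraper's fetch: bounded and retried."""
    resp = session.get(url, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT))
    resp.raise_for_status()
    return resp.content
```

The timeout bounds how long one unresponsive connection can block the run, and the retries with backoff give a transient upstream stall a chance to recover before the error propagates. If the site stays unreachable, the request still fails, so disabling the recipe remains the right call for a persistent outage.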