Please delete the output directory

KiwiTrue commented 2 years ago

[#] Crawler: error trying to retrieve this page: ch16.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch16.html
[+] Please delete the output directory '/Users/mac/safaribooks/Books/Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...

I tried deleting the directory and restarting three times, and got the same error every time.

Korred commented 2 years ago

@Wue9 - Thanks for creating the issue. I can confirm I have the same error on my end (on a different chapter though).

[-] Downloading book contents... (47 chapters)
[#] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[+] Please delete the output directory 'G:\Repos\safaribooks\Books\Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...

After checking the info log, it looks like this is a backend issue (the server answered with HTTP 503):

[21/Jan/2022 16:09:15] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
    From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[21/Jan/2022 16:09:15] Last request done:
    URL: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
    DATA: None
    OTHERS: {}

    503
    Connection: keep-alive
    Content-Length: 449
    Server: Varnish
    Content-Type: text/html; charset=utf-8
    Accept-Ranges: bytes
    Date: Fri, 21 Jan 2022 15:09:16 GMT
    Via: 1.1 varnish
    X-Client-IP: 83.25.6.96
    X-Served-By: cache-hhn4020-HHN
    X-Cache: MISS
    X-Cache-Hits: 0
    X-Timer: S1642777756.144658,VS0,VE165
    Retry-After: 3600

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>503 backend read error</title>
  </head>
  <body>
    <h1>Error 503 backend read error</h1>
    <p>backend read error</p>
    <h3>Guru Mediation:</h3>
    <p>Details: cache-hhn11543-HHN 1642777756 2976959339</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>

@lorenzodifuccia: I think it would be a good idea to enable automatic retries for failed requests, especially when a file cannot be fetched on the first try because of a backend error.
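As a rough illustration of the automatic retries suggested here, below is a minimal, library-agnostic sketch of retrying with exponential backoff. The names retry_with_backoff and TransientError are hypothetical and not part of safaribooks; in the crawler, fetch would wrap the actual page download and raise TransientError on a 5xx status.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. an HTTP 503."""


def retry_with_backoff(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, sleeping exponentially longer between tries.

    Delays are base_delay * 2**n, capped at max_delay. The last TransientError
    is re-raised once all attempts are used up; other exceptions propagate at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_chapter() -> str:
        # Hypothetical stand-in for fetching ch13.html: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("503 backend read error")
        return "<html>ch13</html>"

    delays: list = []
    print(retry_with_backoff(flaky_chapter, sleep=delays.append))  # <html>ch13</html>
    print(delays)  # [1.0, 2.0]
```

Injecting sleep keeps the helper testable without real waiting; in production the default time.sleep is used.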

lorenzodifuccia commented 2 years ago


@Korred, we definitely will! Adding this issue to the milestone and labels.

EntrixIII commented 2 years ago

I wanted to reproduce the issue and try to fix it, but I couldn't... :( It took a considerable amount of time to download the 1746 image files, but in the end it worked.

Korred commented 2 years ago

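@EntrixIII - as mentioned before, the error was caused by a backend read error (an HTTP 503 from O'Reilly's Varnish cache), which is not something we have control over. To work around it, you could mount a transport adapter (HTTPAdapter) on the requests Session and set its max_retries parameter. Note that a plain integer for max_retries only retries failed DNS lookups, socket connections and connection timeouts; to also retry 5xx responses like this 503, pass a urllib3 Retry object with status_forcelist set instead.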

https://docs.python-requests.org/en/latest/user/advanced/#transport-adapters
https://docs.python-requests.org/en/latest/api/#requests.adapters.HTTPAdapter
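A minimal sketch of the adapter approach from these links, assuming the crawler sends all its requests through one requests.Session. make_session is a hypothetical helper, not existing safaribooks code, and the retry values are illustrative. It passes a urllib3 Retry object so that 5xx responses, not just connection failures, are retried (assumes urllib3 1.26 or newer for allowed_methods).

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total: int = 5, backoff_factor: float = 1.0) -> requests.Session:
    """Return a Session whose HTTP(S) requests retry transient server errors."""
    retry = Retry(
        total=total,                                 # overall cap on retries
        backoff_factor=backoff_factor,               # exponential sleep between retries
        status_forcelist=(500, 502, 503, 504),       # also retry these HTTP statuses
        allowed_methods=frozenset({"GET", "HEAD"}),  # only retry idempotent requests
        respect_retry_after_header=False,            # ignore e.g. "Retry-After: 3600"
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Every URL starting with these prefixes now goes through the retrying adapter.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Call sites stay the same (session.get(url) as before). If a chapter still returns a listed 5xx status after all retries, requests raises requests.exceptions.RetryError, which the crawler could catch before printing its "Please delete the output directory" message. Ignoring Retry-After is a deliberate trade-off: the 503 in the log above asked clients to wait 3600 seconds, and honoring that would stall each failing chapter for an hour, at the cost of being less polite to the server.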

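Side note: If you want to simulate a case where an asset is not reachable, you could probably kill your internet connection during the file download. Keep in mind that this typically surfaces as a connection-level exception (e.g. requests.exceptions.ConnectionError) rather than an HTTP 5xx response, so it exercises connection retries, not status-code retries.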
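To reproduce a genuine 5xx on demand instead, a throwaway local server can stand in for the backend. The stdlib-only sketch below is an assumption-based test helper (start_flaky_server is hypothetical, not part of safaribooks): it answers 503 for the first few requests and 200 afterwards, so a retry setup can be pointed at its URL instead of the real API.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_flaky_server(failures: int = 2):
    """Start a local server that answers 503 for the first `failures` GETs, then 200.

    Returns (server, state); state["hits"] counts requests received so far.
    """
    state = {"hits": 0}

    class FlakyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            state["hits"] += 1
            if state["hits"] <= failures:
                status, body = 503, b"503 backend read error"
            else:
                status, body = 200, b"<html>chapter</html>"
            self.send_response(status)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):  # silence per-request logging
            pass

    server = HTTPServer(("127.0.0.1", 0), FlakyHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, state


if __name__ == "__main__":
    server, state = start_flaky_server(failures=2)
    url = f"http://127.0.0.1:{server.server_port}/ch13.html"
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    for _ in range(3):
        try:
            with opener.open(url, timeout=5) as resp:
                print(resp.status, resp.read())
        except urllib.error.HTTPError as err:
            print(err.code)
            err.close()
    server.shutdown()
    server.server_close()
```

Run directly, this should print 503 twice and then a 200 with the chapter body, which is the sequence a correctly configured retry strategy should absorb.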

lorenzodifuccia / safaribooks

Please delete the output directory #305