KiwiTrue opened this issue 2 years ago
@Wue9 - Thanks for creating the issue. I can confirm I have the same error on my end (on a different chapter though).
[-] Downloading book contents... (47 chapters)
[#] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[+] Please delete the output directory 'G:\Repos\safaribooks\Books\Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...
After checking the info log, it looks like this is a backend issue:
[21/Jan/2022 16:09:15] Crawler: error trying to retrieve this page: ch13.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
[21/Jan/2022 16:09:15] Last request done:
URL: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html
DATA: None
OTHERS: {}
503
Connection: keep-alive
Content-Length: 449
Server: Varnish
Content-Type: text/html; charset=utf-8
Accept-Ranges: bytes
Date: Fri, 21 Jan 2022 15:09:16 GMT
Via: 1.1 varnish
X-Client-IP: 83.25.6.96
X-Served-By: cache-hhn4020-HHN
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1642777756.144658,VS0,VE165
Retry-After: 3600
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>503 backend read error</title>
</head>
<body>
<h1>Error 503 backend read error</h1>
<p>backend read error</p>
<h3>Guru Mediation:</h3>
<p>Details: cache-hhn11543-HHN 1642777756 2976959339</p>
<hr>
<p>Varnish cache server</p>
</body>
</html>
@lorenzodifuccia: I think it would be a good idea to enable automatic retries for failed requests, especially when a file cannot be fetched on the first try due to a backend error.
@Korred, we definitely will... Adding this issue to a milestone and labels.
I wanted to reproduce this and try to fix it, but I couldn't reproduce it... :( Downloading the 1746 image files took a considerable amount of time, but in the end it worked.
@EntrixIII - as mentioned before, the error was caused by a backend read error, which is not something we have control over. To fix this, you could mount a transport adapter (HTTPAdapter) on the requests Session and make sure the max_retries parameter is set (see the sketch below).
https://docs.python-requests.org/en/latest/user/advanced/#transport-adapters
https://docs.python-requests.org/en/latest/api/#requests.adapters.HTTPAdapter
Side note: If you want to simulate a case where an asset is not reachable, you could probably kill your internet connection during the file download, which should give you a 5xx error.
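Here is a minimal sketch of what that could look like, assuming a plain `requests.Session` with retries on 5xx responses. The variable names are illustrative, not the actual safaribooks code, and `allowed_methods` requires urllib3 >= 1.26 (older versions call it `method_whitelist`):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient server-side failures such as the 503 "backend read error" above.
retries = Retry(
    total=5,                                # up to 5 retries per request
    backoff_factor=1,                       # wait 1s, 2s, 4s, ... between attempts
    status_forcelist=[500, 502, 503, 504],  # retry only on these status codes
    allowed_methods=["GET"],                # urllib3 < 1.26 uses method_whitelist instead
    respect_retry_after_header=False,       # the log above shows Retry-After: 3600;
                                            # honouring it would stall the crawler for an hour
)

adapter = HTTPAdapter(max_retries=retries)

session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)

# e.g. re-fetch the chapter that failed in the log above
# (in safaribooks this session would also carry the login cookies)
response = session.get(
    "https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch13.html"
)
```

With this in place, a single 503 like the one above would be retried automatically instead of aborting the whole download.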
[#] Crawler: error trying to retrieve this page: ch16.html (CompTIA A+® Guide to Managing and Troubleshooting PCs, Fifth Edition)
From: https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781259589553/files/ch16.html
[+] Please delete the output directory '/Users/mac/safaribooks/Books/Mike Meyers_ CompTIA A_ Guide to Managing and Troubleshooting PCs Fifth Edition (Exams 220-901 _ 220-902) (9781259589553)' and restart the program.
[!] Aborting...
I tried deleting the directory and restarting 3 times, and I get the same error.