Closed iagomachadocs closed 1 year ago
It looks like the session timed out, but was this an isolated case or did all downloads fail after this one?
What was the progress bar output? Had 5.41M been downloaded?
car.download_city_code('2607109', folder='PE', debug=True)
[25] - Invalid captcha 'pvhinu' to request city '2607109' in 'shapefile' format
[24] - Requesting city '2607109' in 'shapefile' format with captcha 'EWCPa'
[24] - Failed to download shapefile! When requesting city '2607109' in 'shapefile' format
[23] - Requesting city '2607109' in 'shapefile' format with captcha '6130F'
[23] - Failed to download shapefile! When requesting city '2607109' in 'shapefile' format
[22] - Invalid captcha 'xyAZ' to request city '2607109' in 'shapefile' format
[21] - Requesting city '2607109' in 'shapefile' format with captcha 'dfydT'
Downloading Shapefile for city with code '2607109': 100%|██████████| 5.41M/5.41M [00:02<00:00, 2.37MiB/s]
PosixPath('PE/SHAPE_2607109.zip')
After this error, some cities were downloaded normally but the same happened with 2 other files.
I no longer have the progress bar output, but I'll try to reproduce the error again and check this.
The same error occurred again with another file. It only downloaded 7.58kB of the HTML content for the city with code 2610806.
Perfect reproduction. It looks like we can try to mitigate it by retrying the download.
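A minimal sketch of what retrying could look like, assuming the download step raises an exception when it detects a bad response. The names `retry_download` and `DownloadFailed` are hypothetical stand-ins and not part of the library; the debug log above suggests the library already has its own retry counter, so this only illustrates the idea.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class DownloadFailed(Exception):
    """Hypothetical stand-in for the library's download exceptions."""


def retry_download(action: Callable[[], T], attempts: int = 5, delay: float = 0.0) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except DownloadFailed as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                time.sleep(delay)
    raise DownloadFailed(f"gave up after {attempts} attempts") from last_error
```

Here `action` would wrap a single download attempt, so a response that turns out to be HTML only costs one more try instead of leaving a bad file on disk.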
Comparing a normal request (HTML) with a Shapefile request, there is a clear difference in the responses. The Shapefile and CSV responses have a different Content-Type and include new fields such as Content-Transfer-Encoding, Accept-Ranges, etc.
Instead of total_size = int(response.headers.get("Content-Length", 0)), we can check that Content-Length exists and is greater than zero, and check that the other expected fields are present in the response too. Otherwise, raise FailedToDownloadShapefileException or FailedToDownloadCsvException, and the download will be retried automatically.
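A rough sketch of that check, under assumptions: the exception class is redefined here only so the snippet runs on its own (the library has its own), and `validated_total_size` is a hypothetical helper name. It takes a plain mapping of headers, so keys must match exactly; with `requests`, `response.headers` is case-insensitive and could be passed directly.

```python
from typing import Mapping


class FailedToDownloadShapefileException(Exception):
    """Stand-in so the sketch is self-contained; the library defines its own."""


def validated_total_size(headers: Mapping[str, str],
                         expected_type: str = "application/zip") -> int:
    """Return Content-Length, or raise if the response does not look like a file."""
    content_type = headers.get("Content-Type", "")
    if not content_type.lower().startswith(expected_type):
        raise FailedToDownloadShapefileException(
            f"unexpected Content-Type: {content_type!r}"
        )
    try:
        total_size = int(headers.get("Content-Length", 0))
    except ValueError:
        total_size = 0
    # HTML error pages are sent chunked, so they carry no Content-Length.
    if total_size <= 0:
        raise FailedToDownloadShapefileException("missing or empty Content-Length")
    return total_size
```

For CSV downloads the same function could be called with `expected_type="text/csv"` and the CSV exception instead.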
What do you think, and would you like to implement it @iagomachadocs?
HTML
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Mon, 03 Jul 2023 17:54:12 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: PLAY_FLASH=; Max-Age=0; Expires=Mon, 03 Jul 2023 17:54:12 GMT; Path=/publico/
Set-Cookie: PLAY_ERRORS=; Max-Age=0; Expires=Mon, 03 Jul 2023 17:54:12 GMT; Path=/publico/
Set-Cookie: PLAY_SESSION=7d07242ac33ae6249acd11a24a0a30c67fedd411-___ID=e8ae6dc5-1ff8-4170-bf97-4b10af056e9f; Path=/publico/
Cache-Control: no-cache
Content-Security-Policy: upgrade-insecure-requests
Access-Control-Allow-Origin: *
Content-Security-Policy: upgrade-insecure-requests
Content-Encoding: gzip
CSV
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Mon, 03 Jul 2023 17:57:29 GMT
Content-Type: text/csv; charset=utf-8
Content-Length: 193737
Connection: keep-alive
Content-Transfer-Encoding: binary
Content-Disposition: attachment; filename=1200252.csv
Set-Cookie: PLAY_FLASH=; Max-Age=0; Expires=Mon, 03 Jul 2023 17:57:29 GMT; Path=/publico/
Set-Cookie: PLAY_ERRORS=; Max-Age=0; Expires=Mon, 03 Jul 2023 17:57:29 GMT; Path=/publico/
Set-Cookie: PLAY_SESSION=7d07242ac33ae6249acd11a24a0a30c67fedd411-___ID=e8ae6dc5-1ff8-4170-bf97-4b10af056e9f; Path=/publico/
Cache-Control: max-age=3600
Last-Modified: Thu, 06 Apr 2023 21:00:17 GMT
ETag: "1680814817000--1269809650"
Accept-Ranges: bytes
Content-Security-Policy: upgrade-insecure-requests
Access-Control-Allow-Origin: *
Content-Security-Policy: upgrade-insecure-requests
Shapefile
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Mon, 03 Jul 2023 17
Content-Type: application/zip
Content-Length: 20357265
Connection: keep-alive
Content-Transfer-Encoding: binary
Content-Disposition: attachment; filename=SHAPE_1200104.zip
Set-Cookie: PLAY_FLASH=; Max-Age=0; Expires=Mon, 03 Jul 2023 17:54:34 GMT; Path=/publico/
Set-Cookie: PLAY_ERRORS=; Max-Age=0; Expires=Mon, 03 Jul 2023 17:54:34 GMT; Path=/publico/
Set-Cookie: PLAY_SESSION=7d07242ac33ae6249acd11a24a0a30c67fedd411-___ID=e8ae6dc5-1ff8-4170-bf97-4b10af056e9f; Path=/publico/
Cache-Control: max-age=3600
Last-Modified: Thu, 06 Apr 2023 20:58:09 GMT
ETag: "1680814689000-44873666"
Accept-Ranges: bytes
Content-Security-Policy: upgrade-insecure-requests
Access-Control-Allow-Origin: *
Content-Security-Policy: upgrade-insecure-requests
Exactly what I was thinking. Yes, I would like to implement it.
I just opened PR #16 with this implementation.
Sometimes, when attempting to download a shapefile, the HTTP request fails and an HTML file is returned instead. However, the script saves the HTML content as if it were the requested shapefile.
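One way to spot an affected file after the fact, sketched with the standard library only. The helper name `describe_download` is hypothetical and not part of the library; it simply checks whether a saved file is a real zip archive or starts like an HTML page.

```python
import zipfile
from pathlib import Path


def describe_download(path: Path) -> str:
    """Classify a saved file as 'zip', 'html', or 'unknown' (hypothetical helper)."""
    if zipfile.is_zipfile(path):
        return "zip"
    # Read only the first bytes; an HTML page starts with a doctype or <html> tag.
    with path.open("rb") as handle:
        head = handle.read(512).lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

Running this over a download folder, for example on `Path("PE/SHAPE_2607109.zip")`, would flag files that need to be fetched again.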
This issue occurred when I was trying to download data from the 'PE' state. The file SHAPE_2607109.zip was saved normally, but it contained the following content: