urbanogilson / SICAR

This tool is designed for students, researchers, data scientists, or anyone who would like access to SICAR files.
https://urbanogilson.github.io/posts/sicar/
MIT License

Unidentified Image Error #14

Status: Closed (iagomachadocs closed 1 year ago)

iagomachadocs commented 1 year ago

Hi Gilson! Thank you for this great package.

I'm having an issue with the download_state function. While the shapefiles are downloading, some captcha images raise PIL.UnidentifiedImageError, which aborts the whole run.

Environment: Ubuntu 20.04.6 LTS, Python 3.11.4

Traceback (most recent call last):
  File "/(my_project_directory)/main.py", line 14, in <module>
    result = car.download_state(state='PE', folder='PE', debug=True, chunk_size=3072)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iago/.cache/pypoetry/virtualenvs/sicar-test-3qQYT9fP-py3.11/lib/python3.11/site-packages/SICAR/sicar.py", line 449, in download_state
    return self.download_cities(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/iago/.cache/pypoetry/virtualenvs/sicar-test-3qQYT9fP-py3.11/lib/python3.11/site-packages/SICAR/sicar.py", line 412, in download_cities
    result[(city, code)] = self.download_city_code(
                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iago/.cache/pypoetry/virtualenvs/sicar-test-3qQYT9fP-py3.11/lib/python3.11/site-packages/SICAR/sicar.py", line 346, in download_city_code
    captcha = self._driver.get_captcha(self._download_captcha())
                                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iago/.cache/pypoetry/virtualenvs/sicar-test-3qQYT9fP-py3.11/lib/python3.11/site-packages/SICAR/sicar.py", line 205, in _download_captcha
    return Image.open(io.BytesIO(response.content))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iago/.cache/pypoetry/virtualenvs/sicar-test-3qQYT9fP-py3.11/lib/python3.11/site-packages/PIL/Image.py", line 3298, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f82a81a3740>

github-actions[bot] commented 1 year ago

Welcome! Your issue will be analyzed as soon as possible. Hopefully, we can find a solution to the problem together. Please provide as much information as possible to help us identify and fix the bug or improve the repository.

urbanogilson commented 1 year ago

I could not reproduce the issue using the script below. The server may occasionally return a corrupted captcha or a response that is not a valid image file. To handle such cases and make the download process more resilient, I modified the code to catch PIL.UnidentifiedImageError and automatically retry the download.

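import io, random
from urllib.parse import urlencode
from PIL import Image
from SICAR import Sicar
car = Sicar()
for i in range(1000):  # request 1000 captchas with random ids
    response = car._get(f"{car._CAPTCHA}?{urlencode({'id': int(random.random() * 1000000)})}", stream=True)
    captcha = Image.open(io.BytesIO(response.content))  # raises PIL.UnidentifiedImageError on an invalid image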
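
For readers who want to see the shape of a catch-and-retry approach, here is a minimal sketch. It is not the code that was committed to SICAR. The helper name `fetch_with_retry`, its parameters, and the `max_attempts` default are all hypothetical; the helper is kept generic so it can run without Pillow installed.

```python
import io
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], bytes],
    parse: Callable[[io.BytesIO], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,  # hypothetical default, not taken from SICAR
) -> T:
    """Fetch a payload and parse it, fetching again when parsing fails."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        payload = fetch()  # ask for a fresh payload on every attempt
        try:
            return parse(io.BytesIO(payload))
        except retry_on as error:
            last_error = error  # invalid payload: loop and try again
    # Every attempt failed; surface the last parsing error as the cause.
    raise RuntimeError(
        f"no valid payload after {max_attempts} attempts"
    ) from last_error
```

With Pillow, `parse` would be `Image.open` and `retry_on` would be `(PIL.UnidentifiedImageError,)`, while `fetch` would wrap the captcha request from the script above and return `response.content`. Capping the number of attempts is a design choice made here so that a server that keeps returning invalid images cannot cause an endless loop.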

urbanogilson commented 1 year ago

docker run -i -v $(pwd):/sicar urbanogilson/sicar:latest -<<EOF
from SICAR import Sicar
import pprint

car = Sicar(email="name@domain.com")

result = car.download_state(state='PE', folder='PE', chunk_size=3072)
pprint.pprint(result)
EOF
Downloading Shapefile for city with code '2600054': 100%|██████████| 1.09M/1.09M [00:01<00:00, 917kiB/s]
Downloading Shapefile for city with code '2600104': 100%|██████████| 4.19M/4.19M [00:01<00:00, 2.51MiB/s]
Downloading Shapefile for city with code '2600203': 100%|██████████| 11.4M/11.4M [00:16<00:00, 692kiB/s]
Downloading Shapefile for city with code '2600302': 100%|██████████| 1.17M/1.17M [00:01<00:00, 958kiB/s]
Downloading Shapefile for city with code '2600401': 100%|██████████| 10.2M/10.2M [00:02<00:00, 4.12MiB/s]
Downloading Shapefile for city with code '2600500': 100%|██████████| 7.85M/7.85M [00:03<00:00, 2.33MiB/s]
Downloading Shapefile for city with code '2600609': 100%|██████████| 1.78M/1.78M [00:01<00:00, 1.24MiB/s]
Downloading Shapefile for city with code '2600708': 100%|██████████| 8.26M/8.26M [00:04<00:00, 1.74MiB/s]
Downloading Shapefile for city with code '2600807': 100%|██████████| 6.46M/6.46M [00:03<00:00, 1.68MiB/s]
Downloading Shapefile for city with code '2600906': 100%|██████████| 8.37M/8.37M [00:02<00:00, 3.96MiB/s]
Downloading Shapefile for city with code '2601003': 100%|██████████| 1.36M/1.36M [00:00<00:00, 2.83MiB/s]
Downloading Shapefile for city with code '2601052': 100%|██████████| 3.59M/3.59M [00:02<00:00, 1.65MiB/s]
Downloading Shapefile for city with code '2601102': 100%|██████████| 8.76M/8.76M [00:02<00:00, 3.66MiB/s]
Downloading Shapefile for city with code '2601201': 100%|██████████| 2.14M/2.14M [00:02<00:00, 1.00MiB/s]
Downloading Shapefile for city with code '2601300': 100%|██████████| 1.54M/1.54M [00:01<00:00, 1.06MiB/s]
Downloading Shapefile for city with code '2601409': 100%|██████████| 8.31M/8.31M [00:08<00:00, 1.02MiB/s]
Downloading Shapefile for city with code '2601508': 100%|██████████| 3.44M/3.44M [00:02<00:00, 1.18MiB/s]
Downloading Shapefile for city with code '2601607': 100%|██████████| 8.42M/8.42M [00:14<00:00, 593kiB/s]
Downloading Shapefile for city with code '2601706': 100%|██████████| 5.24M/5.24M [00:02<00:00, 2.56MiB/s]
Downloading Shapefile for city with code '2601805': 100%|██████████| 8.87M/8.87M [00:02<00:00, 3.90MiB/s]
Downloading Shapefile for city with code '2601904': 100%|██████████| 2.47M/2.47M [00:03<00:00, 642kiB/s]
Downloading Shapefile for city with code '2602001': 100%|██████████| 4.62M/4.62M [00:01<00:00, 2.32MiB/s]
Downloading Shapefile for city with code '2602100': 100%|██████████| 6.88M/6.88M [00:02<00:00, 2.88MiB/s]
Downloading Shapefile for city with code '2602209': 100%|██████████| 2.27M/2.27M [00:12<00:00, 189kiB/s]
Downloading Shapefile for city with code '2602308': 100%|██████████| 5.78M/5.78M [00:02<00:00, 2.00MiB/s]
Downloading Shapefile for city with code '2602407': 7.58kiB [00:00, 13.1MiB/s]
Downloading Shapefile for city with code '2602506': 100%|██████████| 2.19M/2.19M [00:03<00:00, 598kiB/s]
Downloading Shapefile for city with code '2602605': 100%|██████████| 3.29M/3.29M [00:10<00:00, 306kiB/s]
Downloading Shapefile for city with code '2602704': 100%|██████████| 1.05M/1.05M [00:07<00:00, 147kiB/s]
Downloading Shapefile for city with code '2602803': 100%|██████████| 9.76M/9.76M [00:02<00:00, 4.04MiB/s]
Downloading Shapefile for city with code '2602902': 100%|██████████| 6.29M/6.29M [00:01<00:00, 3.22MiB/s]
Downloading Shapefile for city with code '2603009': 100%|██████████| 12.8M/12.8M [00:12<00:00, 990kiB/s]
Downloading Shapefile for city with code '2603108': 100%|██████████| 2.07M/2.07M [00:05<00:00, 392kiB/s]
Downloading Shapefile for city with code '2603207': 100%|██████████| 3.15M/3.15M [00:05<00:00, 573kiB/s]
Downloading Shapefile for city with code '2603306': 100%|██████████| 1.12M/1.12M [00:01<00:00, 918kiB/s]
Downloading Shapefile for city with code '2603405': 100%|██████████| 947k/947k [00:01<00:00, 558kiB/s]
Downloading Shapefile for city with code '2603454': 100%|██████████| 49.8k/49.8k [00:00<00:00, 216kiB/s]
Downloading Shapefile for city with code '2603504': 100%|██████████| 675k/675k [00:01<00:00, 408kiB/s]
Downloading Shapefile for city with code '2603603': 100%|██████████| 1.18M/1.18M [00:02<00:00, 489kiB/s]
Downloading Shapefile for city with code '2603702': 100%|██████████| 5.00M/5.00M [00:03<00:00, 1.28MiB/s]
Downloading Shapefile for city with code '2603801': 100%|██████████| 3.07M/3.07M [00:03<00:00, 782kiB/s]
Downloading Shapefile for city with code '2603900': 100%|██████████| 5.49M/5.49M [00:01<00:00, 2.78MiB/s]
Downloading Shapefile for city with code '2603926': 100%|██████████| 3.08M/3.08M [00:02<00:00, 1.16MiB/s]
Downloading Shapefile for city with code '2604007': 100%|██████████| 1.21M/1.21M [00:02<00:00, 502kiB/s]
Downloading Shapefile for city with code '2604106': 100%|██████████| 7.01M/7.01M [00:07<00:00, 917kiB/s]
Downloading Shapefile for city with code '2604155': 100%|██████████| 692k/692k [00:00<00:00, 717kiB/s]
Downloading Shapefile for city with code '2604205': 100%|██████████| 7.94M/7.94M [00:08<00:00, 943kiB/s]
Downloading Shapefile for city with code '2604304': 100%|██████████| 540k/540k [00:00<00:00, 758kiB/s]
Downloading Shapefile for city with code '2604403': 100%|██████████| 277k/277k [00:00<00:00, 390kiB/s]
Downloading Shapefile for city with code '2604502': 100%|██████████| 1.18M/1.18M [00:01<00:00, 626kiB/s]
Downloading Shapefile for city with code '2604601': 100%|██████████| 3.23M/3.23M [00:01<00:00, 1.85MiB/s]
Downloading Shapefile for city with code '2604700': 100%|██████████| 4.26M/4.26M [00:01<00:00, 2.28MiB/s]
Downloading Shapefile for city with code '2604809': 100%|██████████| 3.72M/3.72M [00:01<00:00, 3.55MiB/s]
Downloading Shapefile for city with code '2604908': 100%|██████████| 1.36M/1.36M [00:01<00:00, 924kiB/s]
Downloading Shapefile for city with code '2605004': 100%|██████████| 1.54M/1.54M [00:08<00:00, 191kiB/s]
Downloading Shapefile for city with code '2605103': 100%|██████████| 21.0M/21.0M [00:09<00:00, 2.22MiB/s]
Downloading Shapefile for city with code '2605152': 100%|██████████| 7.53M/7.53M [00:02<00:00, 3.20MiB/s]
Downloading Shapefile for city with code '2605202': 100%|██████████| 5.56M/5.56M [00:04<00:00, 1.13MiB/s]
Downloading Shapefile for city with code '2605301': 100%|██████████| 4.22M/4.22M [00:03<00:00, 1.12MiB/s]
Downloading Shapefile for city with code '2605400': 100%|██████████| 515k/515k [00:00<00:00, 524kiB/s]
Downloading Shapefile for city with code '2605509': 100%|██████████| 845k/845k [00:01<00:00, 817kiB/s]
Downloading Shapefile for city with code '2605608': 100%|██████████| 9.92M/9.92M [00:02<00:00, 3.83MiB/s]
Downloading Shapefile for city with code '2605707': 100%|██████████| 12.3M/12.3M [00:02<00:00, 4.73MiB/s]
Downloading Shapefile for city with code '2605806': 100%|██████████| 817k/817k [00:01<00:00, 416kiB/s]
Downloading Shapefile for city with code '2605905': 100%|██████████| 3.20M/3.20M [00:02<00:00, 1.47MiB/s]
Downloading Shapefile for city with code '2606002': 100%|██████████| 3.95M/3.95M [00:03<00:00, 1.29MiB/s]
Downloading Shapefile for city with code '2606101': 100%|██████████| 2.33M/2.33M [00:01<00:00, 1.59MiB/s]
Downloading Shapefile for city with code '2606200': 100%|██████████| 8.87M/8.87M [00:02<00:00, 4.09MiB/s]
Downloading Shapefile for city with code '2606309': 100%|██████████| 3.75M/3.75M [00:03<00:00, 1.25MiB/s]
Downloading Shapefile for city with code '2606408': 100%|██████████| 5.70M/5.70M [00:03<00:00, 1.62MiB/s]
Downloading Shapefile for city with code '2606507': 7.58kiB [00:00, 14.5MiB/s]
Downloading Shapefile for city with code '2606606': 100%|██████████| 12.2M/12.2M [00:04<00:00, 2.62MiB/s]
Downloading Shapefile for city with code '2606705': 100%|██████████| 1.16M/1.16M [00:01<00:00, 940kiB/s]
Downloading Shapefile for city with code '2606804': 100%|██████████| 3.18M/3.18M [00:01<00:00, 2.21MiB/s]
Downloading Shapefile for city with code '2606903': 100%|██████████| 18.7M/18.7M [00:04<00:00, 4.39MiB/s]
...