ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

Error with read_municipality() and read_state() #318

Closed victorferreirailos closed 7 months ago

victorferreirailos commented 1 year ago

I've been getting this warning:

C:\Users\XXXX\AppData\Roaming\Python\Python39\site-packages\geopandas\array.py:93: ShapelyDeprecationWarning:

len for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the geoms property instead to get the number of parts of a multi-part geometry.

Followed by this error:

Exception: Some internal url is broken.Please report to https://github.com/ipeaGIT/geobr/issues

when trying to read_state or read_municipality with geobr. Is anyone else getting this? Any idea how to fix it?

vss-2 commented 1 year ago

Hello, I've tested this here and the problem appears to be on the server side. The server is unstable and is taking too long to return the dataset files.

I've debugged using the following code (a short version of what happens in utils.py):

import requests
import geobr
from io import StringIO
import pandas as pd
from time import time

# Load geobr's metadata table and keep only the municipality entries
metadata_url = 'http://www.ipea.gov.br/geobr/metadata/metadata_1.7.0_gpkg.csv'
urls = pd.read_csv(StringIO(geobr.utils.url_solver(metadata_url).text)).query('geo == "municipality"')

def url_checker(url):
    # Print a message for any non-200 response; abort on connection errors
    try:
        response = requests.get(url)
        if response.status_code != 200:
            print(f'{url}: is not reachable')
    except requests.exceptions.RequestException as e:
        raise SystemExit(f'{url}: is not reachable \n Error: {e}')

# Time how long each dataset file takes to respond
for url in urls.download_path.values:
    print(url)
    start = time()
    url_checker(url)
    print(f'Took {time() - start} seconds')

Reaching each file takes about 30 seconds. Sometimes the request doesn't reach anything at all, which is why you're getting the "internal url is broken" error. We will have to wait until an admin checks the server or the system repairs itself.
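Until the server stabilizes, one possible client-side workaround is to retry the call a few times with a growing pause between attempts. The following is a minimal sketch, assuming the failure surfaces as a plain Exception as in the error message above. `with_retries` and its parameters are hypothetical names for illustration, not part of geobr.

```python
import time


def with_retries(func, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the pause after each failure.

    Hypothetical helper for illustration only; it is not part of geobr.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as error:  # the reported failure is a plain Exception
            last_error = error
            # Pause before the next attempt, but not after the final one
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Example usage (requires geobr and network access):
# import geobr
# states = with_retries(geobr.read_state)
```

Retrying only helps with intermittent failures. If the server is down for an extended period, the call will still fail after the last attempt and re-raise the original error.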

rafapereirabr commented 1 year ago

The data server at Ipea has been unstable for a few weeks. We have had this problem before, so we changed the code in geobr to redirect the data download to our data sets stored on GitHub whenever Ipea's server is offline or taking too long.

If you run

urls = pd.read_csv(StringIO(geobr.utils.url_solver('http://www.ipea.gov.br/geobr/metadata/metadata_1.7.0_gpkg.csv').text)).query('geo == "municipality"')

then you're only trying to reach Ipea's server, and you don't benefit from the copy of the data we have mirrored on GitHub. I would not recommend doing this.
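The general idea of falling back to a mirror can be sketched roughly as follows. This is not geobr's actual implementation. `fetch_with_fallback`, `_download`, the 10-second timeout, and the placeholder URLs in the usage comment are all assumptions made for illustration. The sketch uses only the standard library so it runs without extra dependencies.

```python
import urllib.request


def _download(url, timeout):
    """Fetch the raw bytes at `url`, raising an OSError subclass on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(urls, timeout=10, download=_download):
    """Try each URL in order and return the first successful download.

    Hypothetical helper for illustration only; it is not geobr's code.
    """
    errors = []
    for url in urls:
        try:
            return download(url, timeout)
        except OSError as error:  # URLError, HTTPError and timeouts are OSError subclasses
            errors.append(f"{url}: {error}")
    raise ConnectionError("All sources failed: " + "; ".join(errors))


# Example usage with placeholder URLs (primary server first, mirror second):
# data = fetch_with_fallback([
#     "http://primary.example/file.gpkg",
#     "http://mirror.example/file.gpkg",
# ])
```

The timeout is what makes the "taking too long" case work: without it, a slow primary server would block the request instead of letting it move on to the mirror.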

Are you still having this issue?

FG-SC commented 1 year ago

I've got the same error. Sometimes the code fails to run; other times it works fine.

JoaoCarabetta commented 7 months ago

Was this related to server instabilities? Can I close it?

vss-2 commented 7 months ago

Yes, it can be closed. It was also fixed by #316.