ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

Error downloading spatial data for all municipalities within a specific state #298

Closed donascian closed 1 year ago

donascian commented 2 years ago

Hi,

There seems to be an issue when downloading the spatial data for all municipalities within a specific state. read_municipality(code_muni="all") seems to work fine; the error only arises when a specific state is requested using its abbreviation. I have reproduced the error in other virtual environments using the most recent version of geobr (0.1.10). Any help would be greatly appreciated!

Many thanks,

Ian Do Nascimento

----> 1 read_municipality(code_muni="MG")
[geobr/read_municipality.py] in read_municipality(code_muni, year, simplified, verbose)
     60 
     61     metadata = metadata[
---> 62         metadata[["code", "code_abrev"]].apply(
     63             lambda x: str(code_muni)[:2] in str(x["code"])
     64             or str(code_muni)[:2]  # if number e.g. 12

[pandas/core/frame.py] in __getitem__(self, key)
   3462             if is_iterator(key):
   3463                 key = list(key)
-> 3464             indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
   3465 
   3466         # take() does not accept boolean indexers

[pandas/core/indexing.py] in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313 
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315 
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

[pandas/core/indexing.py] in _validate_read_indexer(self, key, indexer, axis)
   1375 
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 1377             raise KeyError(f"{not_found} not in index")
   1378 
   1379 

KeyError: "['code_abrev'] not in index"

JoaoCarabetta commented 2 years ago

That is not the only function that crashed. Full error report:

FAILED tests/test_list_geobr.py::test_list_geobr - http.client.IncompleteRead: IncompleteRead(0 bytes read)
FAILED tests/test_read_amazon.py::test_read_amazon - fiona.errors.DriverError: 'temp.gpkg' not recognized as a supported file for...
FAILED tests/test_read_census_tract.py::test_read_census_tract - KeyError: "['code_abrev'] not in index"
FAILED tests/test_read_meso_region.py::test_read_meso_region - KeyError: "['code_abrev'] not in index"
FAILED tests/test_read_micro_region.py::test_read_micro_region - KeyError: "['code_abrev'] not in index"
FAILED tests/test_read_municipality.py::test_read_municipality - KeyError: "['code_abrev'] not in index"
FAILED tests/test_read_state.py::test_read_state - KeyError: 'code_abrev'
FAILED tests/test_read_weighting_area.py::test_read_weighting_area - KeyError: "['code_abrev'] not in index"
FAILED tests/test_utils.py::test_download_metadata - AssertionError: assert False

@rafapereirabr did the new mirroring infrastructure change the content of the files?

rafapereirabr commented 2 years ago

The code_abrev column has been renamed (fixed) to code_abbrev in the new metadata file. The original metadata file is still working normally (I kept it so we wouldn't break the Python package).

However, updating the Python version to incorporate the new data sets (mirrored on two servers) needs a couple of tweaks. That's why I mentioned that we (@JoaoCarabetta and I) should have a quick chat before making any changes to the Python version.
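
To illustrate the column rename described above, here is a minimal sketch of how code that reads the metadata could accept either spelling. It is only an illustration, not the change made in geobr: the helper resolve_column, the constant ABBREV_CANDIDATES and the example column lists are hypothetical names introduced here, and only "code", "code_abrev" and "code_abbrev" come from this thread.

```python
from typing import Iterable, Sequence


def resolve_column(columns: Iterable[str], candidates: Sequence[str]) -> str:
    """Return the first name in `candidates` that is present in `columns`.

    Hypothetical helper, not part of geobr. `columns` can be any iterable
    of column names, for example the `.columns` of a pandas DataFrame.
    """
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    raise KeyError(
        f"none of {list(candidates)} found in columns {sorted(available)}"
    )


# Illustrative column sets standing in for the two metadata files.
# Only "code", "code_abrev" and "code_abbrev" are taken from the thread;
# the other names are assumptions.
old_columns = ["geo", "year", "code", "code_abrev"]
new_columns = ["geo", "year", "code", "code_abbrev"]

# Prefer the corrected spelling, fall back to the original one.
ABBREV_CANDIDATES = ("code_abbrev", "code_abrev")

old_name = resolve_column(old_columns, ABBREV_CANDIDATES)
new_name = resolve_column(new_columns, ABBREV_CANDIDATES)
```

With a pandas DataFrame called metadata, the resolved name could then replace the hard-coded "code_abrev" in a selection such as metadata[["code", resolve_column(metadata.columns, ABBREV_CANDIDATES)]], so the same code would work against either metadata file.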