ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

[Python] Fix read_comparable_areas years available presentation #351

Closed vss-2 closed 2 months ago

vss-2 commented 2 months ago

Bug: The output of the Python implementation of list_geobr() is missing commas.

Cause: After #348 was solved by merging #349, the "years available" field of read_comparable_areas() is no longer comma-separated. This happens only for this function because the pandas implementation of read_html() relies on CSV-style (comma-separated values) parsing, and since read_comparable_areas() is the only function with no spaces after the commas, pandas treats the whole list as a single value.
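The effect described above can be illustrated without pandas. This is only a simplified model, not pandas' actual parsing code: it assumes that a cell made up solely of digits and separators is read as one grouped number, and that any other cell is left as text. The helper name strip_thousands and the sample years are hypothetical.

```python
import re
from typing import Optional


def strip_thousands(cell: str, thousands: Optional[str] = ",") -> str:
    """Simplified model (an assumption, not pandas' real code):
    drop the separator from cells that look like one grouped number."""
    if thousands is None:
        return cell
    sep = re.escape(thousands)
    # A cell "looks numeric" if it is digit groups joined only by the separator.
    looks_numeric = re.fullmatch(rf"[0-9]+(?:{sep}[0-9]+)*", cell)
    return cell.replace(thousands, "") if looks_numeric else cell


# Illustrative values only, not the real list of available years.
print(strip_thousands("1872,1900,1911"))                 # 187219001911
print(strip_thousands("1872, 1900, 1911"))               # 1872, 1900, 1911
print(strip_thousands("1872,1900,1911", thousands="."))  # 1872,1900,1911
```

Under this model, a space after each comma is enough to stop the cell from being read as a number, which matches the behavior reported for the other functions.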

vss-2 commented 2 months ago

Solution 1: I expected that passing parse_dates=True (see docs [here](https://pandas.pydata.org/pandas-docs/version/2.2/reference/api/pandas.read_html.html#:~:text=See%20read_csv()-,for%20more%20details.,-thousandsstr%2C%20optional)) to read_html() would solve it, but it didn't, because the table mixes both formats, "YYYY, YYYY" and "YYYY,YYYY". As a possible solution, I would recommend normalizing how this function's years are formatted in README.md.

Solution 2: Setting the thousands parameter to '.' (the default is ',') fixed the bug, but it doesn't seem good to make the code rely on a coincidence.
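As a rough sketch of the formatting fix suggested in Solution 1, the mixed "YYYY, YYYY" and "YYYY,YYYY" styles could be normalized to a single style before the table is parsed or displayed. The helper normalize_years is hypothetical and not part of geobr, and the sample years are illustrative only.

```python
def normalize_years(cell: str) -> str:
    """Rewrite a comma-separated list so each comma is followed by one space."""
    # Split on commas, trim stray whitespace, and skip empty fragments.
    years = [part.strip() for part in cell.split(",") if part.strip()]
    return ", ".join(years)


# Illustrative input mixing both styles.
print(normalize_years("1872,1900, 1911"))  # 1872, 1900, 1911
```

With every entry in the "YYYY, YYYY" style, the result would not depend on which thousands separator read_html() is given.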