jtleider / censusdata

Download data from Census API
MIT License

Indexing error #27

Closed: 24thronin closed this issue 3 years ago

24thronin commented 3 years ago

Hello, first off, I love this package and its excellent documentation. I am running into a small error when I download data from the 2017 acs5: the index column is entirely populated with null entries. When I call reset_index, the null entries are converted to the string "Display Error!". I am not sure whether the null index data is a problem with the package or with the Census website, but I wanted to reach out and ask. The problem occurs only for the 2017 acs5, not for 2016 or 2018.

Sample Code:

temp17 = censusdata.download(src='acs5', year=2017, geo=censusdata.censusgeo([('zip code tabulation area', '')]), var=['B01001_001E'], key=api_key).reset_index()
temp18 = censusdata.download(src='acs5', year=2018, geo=censusdata.censusgeo([('zip code tabulation area', '')]), var=['B01001_001E'], key=api_key).reset_index()

In this example, the index column of temp18 is fine, but the index column of temp17 is all null. All the best.

24thronin commented 3 years ago

Hi again, I needed this turned around quickly, so I found a solution that works for me by stepping through your code. It appears that the one table I mentioned above includes a state column but no state data. State data is included for every other year. When your code downloads the data, it creates a state field full of None values. Those values are then used to create the geoindex with this line:

geoindex = [censusgeo([(key, geodata[key][i]) for key in geodata if key != 'NAME'], geodata['NAME'][i]) for i in range(len(geodata['NAME']))]

The problem is that you return a dataframe whose index contains tuples that might have None values. Apparently pandas doesn't like that, so when I reset the index it wiped out the index instead of creating a column. My suggestion is to look for None values in any field that ends up in the dataframe index and replace them with an empty string ('') or some other placeholder. I realize there is a balance to strike: the data is truly missing in ACS, and you don't want users to proceed without knowing that. But from my perspective, if that is a concern, use a placeholder that makes it clear, like 'Value is missing in Census Table'. Currently, even the good information (the zip code) is deleted and the table is unusable.

Here is my fix (used as the next-to-last line of the download function):

geoindex = [censusgeo([(key, geodata[key][i] if geodata[key][i] is not None else '') for key in geodata if key != 'NAME'], geodata['NAME'][i]) for i in range(len(geodata['NAME']))]

Again, I love the package and appreciate your investment. All the best.
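
For readers following along, the fix above can be tried without the package or an API key. The sketch below is only an illustration: `build_geo_labels` is a hypothetical helper, plain tuples stand in for `censusgeo` objects, and the `geodata` values are made up to mimic a response whose state column is all None.

```python
def build_geo_labels(geodata, placeholder=''):
    """Build (components, name) pairs, replacing None components.

    Hypothetical helper: plain tuples stand in for censusgeo objects,
    so this shows only the None-replacement step of the fix.
    """
    keys = [key for key in geodata if key != 'NAME']
    labels = []
    for i, name in enumerate(geodata['NAME']):
        components = tuple(
            (key, placeholder if geodata[key][i] is None else geodata[key][i])
            for key in keys
        )
        labels.append((components, name))
    return labels


# Made-up data shaped like the case described above: a state field
# that exists but contains only None values.
geodata = {
    'NAME': ['ZCTA5 00601', 'ZCTA5 00602'],
    'state': [None, None],
    'zip code tabulation area': ['00601', '00602'],
}

labels = build_geo_labels(geodata, placeholder='Value is missing in Census Table')
for components, name in labels:
    print(name, components)
```

Passing a descriptive placeholder instead of '' keeps the missing state visible to anyone who later inspects the index.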

jtleider commented 3 years ago

Hi, thanks for your feedback, and I'm glad you found a workaround. I can't reproduce the error you ran into for 2017 (assuming the 'zip code tabulation area' value should be '*' and not ''). Are you still running into this issue? If so, I'll try to think of a workaround that uses the existing code; I'd rather not change the existing behavior in the main package, as that might break things for other users.

Best, Julien
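
One way such a workaround could look, without touching the package, is to flatten the geography index into plain columns on the caller's side before calling reset_index. This is a sketch under assumptions, not the maintainer's method: `geo_to_columns` is a hypothetical helper, it assumes each index entry exposes a `.geo` sequence of (level, value) pairs and a `.name` string, and it assumes every entry carries the same levels. `SimpleNamespace` objects stand in for the real index entries, and the data is made up.

```python
from types import SimpleNamespace


def geo_to_columns(index, placeholder=''):
    """Flatten geography objects into a dict of plain column lists.

    Assumption: each item has `.geo` (pairs of (level, value)) and
    `.name`, and all items share the same levels.
    """
    columns = {'NAME': []}
    for item in index:
        columns['NAME'].append(item.name)
        for level, value in item.geo:
            # Replace missing components so no None reaches the table.
            columns.setdefault(level, []).append(
                placeholder if value is None else value
            )
    return columns


# Stand-ins for index entries whose state component is missing.
index = [
    SimpleNamespace(
        geo=(('state', None), ('zip code tabulation area', '00601')),
        name='ZCTA5 00601',
    ),
    SimpleNamespace(
        geo=(('state', None), ('zip code tabulation area', '00602')),
        name='ZCTA5 00602',
    ),
]

columns = geo_to_columns(index, placeholder='missing')
print(columns)
```

The returned dict of equal-length lists can then be attached to the downloaded table as ordinary columns, which leaves the package's own index behavior unchanged for other users.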

24thronin commented 3 years ago

Hi Julien, I sure hope the issue wasn't me using '' instead of '*'. I tried to replicate the issue today and wasn't able to. Your package worked fine today, so I haven't got a clue what happened. I agree that you should leave the code as is. All the best, John