makepath / census-parquet

Python tools for creating Parquet files from 2020 Census Data
MIT License

Updates #3

Closed by TomAugspurger 2 years ago

TomAugspurger commented 3 years ago

This PR has a batch of updates. The individual commits should be pretty meaningful.

Some changes are minor: using pathlib.Path for file paths, and dask.delayed rather than dask.bag.

The changes to process_blocks.py are a bit larger. Given that population data aren't available for territories, I decided to model the dataset as two tables: one for population and one for geometries. I also included all the population variables, not just the first.
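To illustrate why two tables fit better than one, here is a minimal sketch in plain Python. It is not the code in process_blocks.py: the GEOIDs, geometries, counts, and the `join_tables` helper are all hypothetical, and the variable names `P0010001` and `P0010002` are only assumed examples of population columns. The point is that a territory block appears in the geometry table but has no row in the population table, so a join leaves its population values empty.

```python
# Hypothetical block GEOIDs and values, for illustration only.
# Geometry table: one entry per block, territories included.
geometries = {
    "010010201001000": "POLYGON ((0 0, 1 0, 1 1, 0 0))",  # state FIPS 01 (Alabama)
    "660109501001000": "POLYGON ((2 2, 3 2, 3 3, 2 2))",  # state FIPS 66 (Guam)
}

# Population table: only blocks that have population data.
population = {
    "010010201001000": {"P0010001": 21, "P0010002": 19},
}


def join_tables(geometries, population):
    """Left-join population onto geometries by GEOID (hypothetical helper)."""
    # Collect every population variable name that appears in the table.
    variables = sorted({name for row in population.values() for name in row})
    rows = []
    for geoid, geometry in geometries.items():
        counts = population.get(geoid, {})
        row = {"GEOID": geoid, "geometry": geometry}
        for name in variables:
            # Blocks without population data get None for every variable.
            row[name] = counts.get(name)
        rows.append(row)
    return rows


rows = join_tables(geometries, population)
```

Keeping the tables separate means the population table never needs placeholder rows for territories; the gaps only show up if someone chooses to join the two.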

Finally, I removed the county-level partitioning. State-level partitioning alone doesn't produce partitions that are too large.
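As a rough sketch of what state-only partitioning looks like, the snippet below groups block GEOIDs by their state FIPS code (the first two digits of a 15-digit block GEOID) and builds one output path per state with pathlib.Path. The helper names, the `STATE=` directory naming, and the file name are assumptions for illustration, not the layout this PR actually writes.

```python
from collections import defaultdict
from pathlib import Path


def state_partition_path(root: Path, geoid: str) -> Path:
    """Return a hypothetical output file for the state that owns a block."""
    state_fips = geoid[:2]  # first two digits of a block GEOID are the state
    return root / f"STATE={state_fips}" / "part.0.parquet"


def group_by_state(geoids):
    """Group block GEOIDs by state FIPS code; counties are not split out."""
    groups = defaultdict(list)
    for geoid in geoids:
        groups[geoid[:2]].append(geoid)
    return dict(groups)


# Hypothetical GEOIDs: two Alabama blocks in different counties, one in Alaska.
geoids = ["010010201001000", "010030101001000", "020130001001000"]
groups = group_by_state(geoids)
paths = {state: state_partition_path(Path("out"), ids[0]) for state, ids in groups.items()}
```

Blocks from different counties in the same state land in the same partition, which is the effect of dropping the county level.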

Feel free to take or leave this PR, but these are the changes that were used to generate the Parquet files now in Azure Blob Storage.