This PR has a batch of updates. The individual commits should be pretty meaningful.
Some changes are minor: using pathlib.Path, and using dask.delayed rather than dask.bag.
The changes to process_blocks.py are a bit larger. Because population data aren't available for territories, I decided to model the dataset as two tables: one for population and one for geometries. I also included all the population variables, not just the first.
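As a rough illustration of why two tables help, here is a minimal sketch using plain dictionaries in place of real tables. The block identifiers, variable names (`pop_total`, `pop_adult`), values, and the `join_blocks` helper are all hypothetical and are not taken from process_blocks.py.

```python
# Geometry table: one entry per block, including blocks in territories.
# All identifiers and values here are made up for illustration.
geometries = {
    "A1": "POLYGON ((0 0, 1 0, 1 1, 0 0))",
    "A2": "POLYGON ((1 1, 2 1, 2 2, 1 1))",
    "T1": "POLYGON ((5 5, 6 5, 6 6, 5 5))",  # territory block: geometry only
}

# Population table: one entry per block that has population data,
# carrying every population variable rather than only the first.
population = {
    "A1": {"pop_total": 120, "pop_adult": 95},
    "A2": {"pop_total": 40, "pop_adult": 31},
}


def join_blocks(geometries, population):
    """Left-join population onto geometries by block identifier.

    Blocks with no population record keep their geometry and get
    population=None instead of being dropped or padded with fake values.
    """
    return {
        block_id: {"geometry": geometry, "population": population.get(block_id)}
        for block_id, geometry in geometries.items()
    }


joined = join_blocks(geometries, population)
```

Keeping the tables separate means the geometry table can cover every block, while the population table only holds rows that actually exist; a join is needed only when both are wanted together.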
Finally, I removed the county-level partitioning, since state-level partitioning alone doesn't produce blocks that are too large.
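To make the partitioning difference concrete, here is a small sketch that groups block identifiers by a prefix. It assumes Census-style identifiers in which the first two characters are the state code and the first five are state plus county; that layout, the sample identifiers, and the `partition_by_prefix` helper are assumptions for illustration, not code from this PR.

```python
from collections import defaultdict


def partition_by_prefix(block_ids, width):
    """Group block identifiers by their first `width` characters."""
    partitions = defaultdict(list)
    for block_id in block_ids:
        partitions[block_id[:width]].append(block_id)
    return dict(partitions)


# Hypothetical identifiers: two blocks in one state (different counties)
# and one block in another state.
block_ids = [
    "010010201001000",
    "010030101001000",
    "060371011101000",
]

by_state = partition_by_prefix(block_ids, width=2)   # state-level
by_county = partition_by_prefix(block_ids, width=5)  # county-level
```

Grouping at the state level yields fewer, larger partitions than grouping at the county level, which is the trade-off the change above relies on.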
Feel free to take or leave this PR, but these are the changes that were used to generate the Parquet files now in Azure Blob Storage.