oturns / geosnap

The Geospatial Neighborhood Analysis Package
https://oturns.github.io/geosnap-guide
BSD 3-Clause "New" or "Revised" License

refactor fips filtering mechanics #338

Open knaaptime opened 2 years ago

knaaptime commented 2 years ago

Currently, CI tests are failing on Windows because the datasets are too large to hold in memory on the CI-provisioned VMs. It's not a bug, since things are working as intended, but the same failure will hit anyone on a memory-constrained machine (or, say, Binder).

The current design of the DataStore class is basically brute-force. When you ask for a dataset, it loads the whole thing into memory from a parquet file (either local or remote), then filters it down to the subset you require (using the get_* functions). That was by design at first, but now there are good filtering options that can be passed to the pandas/geopandas read_parquet functions that make it possible to do the subsetting during file I/O, so that only the necessary data gets loaded into memory. This is a lot more efficient, but the filtering syntax is a bit cumbersome, so it will require some serious refactoring.

I've got some working code that takes a better approach, but I'll still need to road test it for a while.