manaakiwhenua / vector2dggs

DGGS indexer for vector data
https://pypi.org/project/vector2dggs/
GNU Lesser General Public License v3.0

more efficient parent cell repartitioning #15

Closed: alpha-beta-soup closed this 1 year ago

alpha-beta-soup commented 1 year ago

Closes #14

Output data is structured: {user-specified-output-directory}/{partition-cell}.parquet/part.{n}.parquet
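A minimal sketch of this directory layout, assuming POSIX-style paths. `part_path` and `parse_part_path` are hypothetical helpers for illustration, not functions from vector2dggs, and the cell id used below is just an illustrative string.

```python
from pathlib import PurePosixPath
import re

# Hypothetical helper: {output_dir}/{partition_cell}.parquet/part.{n}.parquet
def part_path(output_dir: str, partition_cell: str, n: int) -> PurePosixPath:
    return PurePosixPath(output_dir) / f"{partition_cell}.parquet" / f"part.{n}.parquet"

# Matches "<cell>.parquet/part.<n>.parquet" relative to the output directory
_PART_RE = re.compile(r"^(?P<cell>[^/]+)\.parquet/part\.(?P<n>\d+)\.parquet$")

# Hypothetical helper: recover (partition_cell, n) from a path in this layout
def parse_part_path(path: PurePosixPath, output_dir: str) -> tuple[str, int]:
    rel = PurePosixPath(path).relative_to(output_dir)
    m = _PART_RE.match(rel.as_posix())
    if m is None:
        raise ValueError(f"not in the expected layout: {path}")
    return m.group("cell"), int(m.group("n"))

example = part_path("out", "86bb2e42fffffff", 3)
```

Because each partition cell is its own `*.parquet` directory, a reader interested in one parent cell only needs to open that directory.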

The part.{n}.parquet files come from the original spatial sort (Hilbert by default). The same part number can therefore appear in multiple {partition-cell}.parquet subdirectories, but each subdirectory only contains data from the first-pass parts that actually include cells under that partition key (parent cell). This avoids a whole cycle of reindexing, which was what was causing the out-of-memory (OOM) errors.
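The idea can be sketched with plain Python, assuming each first-pass part is a list of row dicts with a `cell` key and that a `parent_of` function maps a cell to its parent. Both are hypothetical stand-ins: this is not the vector2dggs implementation, which works on Parquet partitions, and the two-character prefix "hierarchy" in the example is purely illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = dict

def repartition_by_parent(
    parts: Iterable[List[Row]],
    parent_of: Callable[[str], str],  # hypothetical cell -> parent cell mapping
    cell_key: str = "cell",
) -> Iterator[Tuple[str, List[Row]]]:
    """Yield (relative path, rows) one first-pass part at a time.

    Each part n is split by parent cell and written as
    {parent}.parquet/part.{n}.parquet; no global reindex is done.
    """
    for n, part in enumerate(parts):
        groups: Dict[str, List[Row]] = defaultdict(list)
        for row in part:
            groups[parent_of(row[cell_key])].append(row)
        # Only parents that actually occur in part n get a part.{n} file.
        for parent in sorted(groups):
            yield f"{parent}.parquet/part.{n}.parquet", groups[parent]

# Illustrative data: parent = first two characters of the cell id.
parts = [[{"cell": "aa1"}, {"cell": "ab2"}], [{"cell": "aa3"}]]
example = dict(repartition_by_parent(parts, lambda c: c[:2]))
```

Here part 0 yields files under both `aa.parquet` and `ab.parquet`, while part 1 only yields `aa.parquet/part.1.parquet`. Since the generator handles one part at a time, peak memory is bounded by the largest part rather than the whole dataset.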