Output data is structured as follows:
{user-specified-output-directory}/{partition-cell}.parquet/part.{n}.parquet
Each part.{n}.parquet comes from the original spatial sort (Hilbert by default). The same part number can therefore appear in multiple {partition-cell}.parquet subdirectories, but each subdirectory only holds data from the first-pass parts that actually contain cells under its partition key (parent cell). This avoids a whole cycle of reindexing, which was what caused the OOM errors.
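As a rough illustration of the layout only (not the code in this PR), the sketch below builds the output paths from a mapping of first-pass part numbers to the parent cells found in each part. The function name `plan_output_paths`, the `parents_by_part` mapping, and the cell labels are hypothetical and chosen purely for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_output_paths(output_dir, parents_by_part):
    """Return {partition cell: [output paths]} for the layout described above.

    parents_by_part is a hypothetical mapping of first-pass part number (n)
    to the set of parent cells that actually occur in that part.
    """
    plan = defaultdict(list)
    for n, parents in sorted(parents_by_part.items()):
        for cell in sorted(parents):
            # A part is written under a partition cell only if it holds
            # cells belonging to that parent; no second reindexing pass.
            path = PurePosixPath(output_dir) / f"{cell}.parquet" / f"part.{n}.parquet"
            plan[cell].append(str(path))
    return dict(plan)


# Hypothetical input: part 1 straddles two parent cells, so it appears twice.
parents_by_part = {0: {"cellA"}, 1: {"cellA", "cellB"}, 2: {"cellB"}}
plan = plan_output_paths("out", parents_by_part)
for cell, paths in plan.items():
    print(cell, paths)
```

In this example `part.1.parquet` is listed under both `cellA.parquet` and `cellB.parquet`, while `part.0.parquet` and `part.2.parquet` each appear under one subdirectory only.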
Closes #14