In a MazeDataset, each connection list is a separate array. This is fine, but it means that:
we have a lot of files inside the zanj zip file
we spend a lot of time loading these files (I think? need some analysis on this)
It would be nice if, when saving a MazeDataset to disk, we could concatenate all the arrays and store them as a single array. This should be possible with the current implementation of zanj, although it will require some janky stuff in the loader of MazeDataset.
This should still work fine for MazeDatasetCollections (once those are fixed, see #7) since those just contain a list of maze datasets.
This is being worked on in #26, on the add-faster-saving-loading branch.
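A rough way to check the "lots of small files is slow" suspicion before touching the dataset code is sketched below. It uses NumPy's own `np.savez` zip container as a stand-in for zanj (an assumption: zanj's on-disk layout may differ), and the helper name `time_save_load`, the key names, and the sizes are all made up for illustration.

```python
import io
import time

import numpy as np


def time_save_load(arrays):
    """Save a dict of arrays to an in-memory .npz, then read every entry back.

    Returns (save_seconds, load_seconds).
    """
    buf = io.BytesIO()
    t0 = time.perf_counter()
    np.savez(buf, **arrays)  # one .npy entry per key inside the zip
    t1 = time.perf_counter()
    buf.seek(0)
    with np.load(buf) as loaded:
        for key in loaded.files:
            _ = loaded[key]  # force the actual read of each entry
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


# hypothetical sizes, chosen only to make the difference visible
n_mazes, grid_n = 1000, 8
rng = np.random.default_rng(0)
connection_lists = [rng.random((2, grid_n, grid_n)) < 0.5 for _ in range(n_mazes)]

# old strategy: one array (one zip entry) per maze
separate = {f"maze_{i}": arr for i, arr in enumerate(connection_lists)}
# new strategy: a single array of shape (n_mazes, 2, grid_n, grid_n)
stacked = {"connection_lists": np.stack(connection_lists)}

separate_times = time_save_load(separate)
stacked_times = time_save_load(stacked)
print(f"separate: save {separate_times[0]:.4f}s, load {separate_times[1]:.4f}s")
print(f"stacked:  save {stacked_times[0]:.4f}s, load {stacked_times[1]:.4f}s")
```

This only measures the container overhead in memory; a real benchmark should go through zanj and write to disk.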
TODOs:
[x] finish saving/loading implementation, under serialize_minimal and load_minimal
[x] store & load configs as-is
[x] ensure generation metadata is collected, stripped from individual mazes, and stored
[x] compute a max_solution_len to figure out dims
[x] store connection lists in an array of shape (n_mazes, 2, grid_n, grid_n)
[x] store solutions in an array of shape (n_mazes, max_solution_len, 2)
[x] store solution lengths in an array of shape (n_mazes, 1) -- later replace this by using placeholder vals
maybe rename these to "fast"?
[x] add saving/loading roundtrip tests for these new functions
[ ] make a notebook which benchmarks saving/loading using both old and new strategies
[ ] across a few different maze types (dfs & forkless dfs should be enough)
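For reference, the array layout from the TODOs above can be sketched with plain NumPy as follows. This is not the actual serialize_minimal / load_minimal code: the function names `pack_mazes` and `unpack_mazes`, the `pad_value` of -1, and the dtypes are assumptions made for illustration.

```python
import numpy as np


def pack_mazes(connection_lists, solutions, pad_value=-1):
    """Stack per-maze arrays into three dataset-level arrays.

    connection_lists: list of bool arrays, each of shape (2, grid_n, grid_n)
    solutions: list of int arrays, each of shape (solution_len_i, 2)
    """
    n_mazes = len(connection_lists)
    # shape (n_mazes, 2, grid_n, grid_n)
    stacked_connections = np.stack(connection_lists, axis=0)

    # shape (n_mazes, 1): needed to strip the padding again on load
    solution_lengths = np.array(
        [len(sol) for sol in solutions], dtype=np.int64
    ).reshape(n_mazes, 1)
    max_solution_len = int(solution_lengths.max())

    # shape (n_mazes, max_solution_len, 2), padded with pad_value
    padded_solutions = np.full(
        (n_mazes, max_solution_len, 2), pad_value, dtype=np.int64
    )
    for i, sol in enumerate(solutions):
        padded_solutions[i, : len(sol)] = sol

    return stacked_connections, padded_solutions, solution_lengths


def unpack_mazes(stacked_connections, padded_solutions, solution_lengths):
    """Inverse of pack_mazes: recover the per-maze arrays."""
    connection_lists = list(stacked_connections)
    solutions = [
        padded_solutions[i, : int(solution_lengths[i, 0])]
        for i in range(len(padded_solutions))
    ]
    return connection_lists, solutions


# small roundtrip on random data (not real mazes)
rng = np.random.default_rng(0)
grid_n = 3
conns = [rng.random((2, grid_n, grid_n)) < 0.5 for _ in range(4)]
sols = [rng.integers(0, grid_n, size=(length, 2)) for length in (2, 5, 3, 1)]

packed = pack_mazes(conns, sols)
conns_rt, sols_rt = unpack_mazes(*packed)
```

If the padding value can never be a valid coordinate (as -1 is here), the lengths array becomes redundant, which is what "replace this by using placeholder vals" would amount to.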