subdgtl / WFC

Prototype tools for playing with Wave Function Collapse
The Unlicense
5 stars 1 forks source link

More cache friendly storage of the World grid #12

Open yanchith opened 3 years ago

yanchith commented 3 years ago

The World Slots are currently stored in a flat 3D array where only moving in the X dimension visits memory adjacent to the currently loaded one, and moving in Y or Z dimensions visits possibly faraway locations.

WFC propagations are depth-first and crawl the grid topologically.

We could be more cache friendly by changing the storage layout to chunks. We could store the World in NxNxN chunks (with N being a small integer) where Slots within the same chunk are stored close to each other in memory. This will mostly help to pre-warm the cache for the current chunk and we probably won't be able to fit an entire chunk into a cache line, because the current 256-bit bit vectors are already too large (32 bytes) and only two of them fit in an x64 cache line. However, even pre-warming the cache could increase the likelihood of cache hits.

As a quick experiment, we could time the algorithm with a changed search order. Currently it first tries searching left and right, which is the most cache friendly direction. By changing this to up and down, we should see a drop in speed. If we don't, we are unlikely to see an increase in speed by changing the storage architecture.

yanchith commented 2 years ago

Morton codes could help here https://en.wikipedia.org/wiki/Z-order_curve, but it looks like we'd have to figure a few things out.

As they currently are, they can only index into squares/cubes, but not any rects/boxes. We'd have to make modifications so that we don't have to densely allocate all the space they can index into, but instead snap into powers of two, e.g. if the user requests a 100x100x3 grid, we only want to allocate 128x128x4, etc.

To do that, we'd also likely have to be parametric over the coordinate order (XYZ, XZY, ZXY, ZYX, YXZ, YZX), because the curve has a preferred direction to expand its dimensions. For vanilla 2D Morton codes it first expands X, then Y, so dimensions that are fine are: 1x1, 2x1, 2x2, 4x2, 4x4, 8x4, 8x8, 16x8, 16x16, etc... Also in this vanilla form, the dimensions can be at most one power of two apart.

We'd also need to see if we can distribute the bits irregularly, e.g. if we have a budget of 32 bits for one index, can we do 15+15+2 and have a stretched Z-order curve?

If some of these prove too hard to work around, or too much of a penalty, we can just fall back on the old indexing scheme and give guidelines to use power-of-two worlds with similar dimensions, etc..

yanchith commented 2 years ago

Prototype in branch https://github.com/subdgtl/WFC/tree/morton.

Results (comparing against branch morton-compare):

ruleset: pipes
--------------
module count: 5
rule count: 39

world size: 64 64 64

old indexing: 0m5.271s
z-curve indexing: 0m3.722s    (improvement of 29.3%)

ruleset: mondieu
----------------
module count: 46
rule count: 3288

world size: 32 32 32

old indexing: 0m10.413s
z-curve indexing: 0m9.342s    (improvement of 10.2%)

ruleset: mondieu
----------------
module count: 46
rule count: 3288

world size: 64 64 64

old indexing: 2m44.061s
z-curve indexing: 2m14.872s    (improvement of 17.7%)

Looks like the improvement is more significant both for smaller rulesets and larger worlds.

Looks promising, if we can resolve questions from above comment.