At this point, the next lowest hanging fruit for speeding up Universe::tick is removing the allocation and free.
Rust does extra bound checks when indexing arrays. I think in simple cases the compiler can optimize it away (e.g. simple loop with a global .len() check) but that's unlikely with the convolution-like operator needed for GoL. Iterators might also lead to better memory access patterns although it's hard to verify.
I played a bit around and got universe.tick() to run ~2x faster than the version without the modulo. Caveat: measured with criterion, not as wasm, only measured for a 1024x768 grid.
I moved this logic to Cell:
impl Cell {
fn evolution(self, live_neighbors: u8) -> Cell {
match (self, live_neighbors) {
// Rule 1: Any live cell with fewer than two live neighbours
// dies, as if caused by underpopulation.
(Cell::Alive, x) if x < 2 => Cell::Dead,
// Rule 2: Any live cell with two or three live neighbours
// lives on to the next generation.
(Cell::Alive, 2) | (Cell::Alive, 3) => Cell::Alive,
// Rule 3: Any live cell with more than three live
// neighbours dies, as if by overpopulation.
(Cell::Alive, x) if x > 3 => Cell::Dead,
// Rule 4: Any dead cell with exactly three live neighbours
// becomes a live cell, as if by reproduction.
(Cell::Dead, 3) => Cell::Alive,
// All other cells remain in the same state.
(otherwise, _) => otherwise,
}
}
}
and replaced tick by:
pub fn tick(&mut self) {
let mut next = self.cells.clone();
let width = self.width as usize;
let height = self.height as usize;
let next_rows = next.chunks_exact_mut(width);
let rows_before = self.cells.chunks_exact(width).cycle().skip(height - 1);
let rows = self.cells.chunks_exact(width);
let rows_after = self.cells.chunks_exact(width).cycle().skip(1);
for (next_row, row_b, row, row_a) in izip!(next_rows, rows_before, rows, rows_after) {
// manually count active nb for first cell in row
let first_count = row_b[0] as u8
+ row_b[1] as u8
+ row_b[row_b.len() - 1] as u8
+ row[1] as u8
+ row[row.len() - 1] as u8
+ row_a[0] as u8
+ row_a[1] as u8
+ row_a[row_a.len() - 1] as u8;
next_row[0] = row[0].evolution(first_count);
// 3 sliding windows (size=3)
let next_cells = next_row.iter_mut().skip(1);
let window_before = row_b.windows(3);
let window = row.windows(3);
let window_after = row_a.windows(3);
for (next_cell, w_b, w, w_a) in izip!(next_cells, window_before, window, window_after) {
let count = w_b.iter().map(|&v| v as u8).sum::<u8>()
+ w[0] as u8
+ w[2] as u8
+ w_a.iter().map(|&v| v as u8).sum::<u8>();
*next_cell = w[1].evolution(count);
}
// manually count active nb for last cell in row
let last_count = row_b[0] as u8
+ row_b[row_b.len() - 2] as u8
+ row_b[row_b.len() - 1] as u8
+ row[0] as u8
+ row[row.len() - 2] as u8
+ row_a[0] as u8
+ row_a[row_a.len() - 2] as u8
+ row_a[row_a.len() - 1] as u8;
next_row[next_row.len() - 1] = row[row.len() - 1].evolution(last_count);
}
self.cells = next;
}
Granted it's not the most readable and there is probably a nicer way but it might be good to remind that Rust array indexing is not zero-cost.
(it's using izip from the itertools crate but could nest calls to zip to avoid extra dependency)
Using iterators instead of indexing is another possible optimization. At the end of the speedup refactoring: https://rustwasm.github.io/docs/book/game-of-life/time-profiling.html
Rust does extra bound checks when indexing arrays. I think in simple cases the compiler can optimize it away (e.g. simple loop with a global
.len()
check) but that's unlikely with the convolution-like operator needed for GoL. Iterators might also lead to better memory access patterns although it's hard to verify.I played a bit around and got
universe.tick()
to run ~2x faster than the version without the modulo. Caveat: measured with criterion, not as wasm, only measured for a 1024x768 grid.I moved this logic to
Cell
:and replaced
tick
by:Granted it's not the most readable and there is probably a nicer way but it might be good to remind that Rust array indexing is not zero-cost.
(it's using
izip
from theitertools
crate but could nest calls to zip to avoid extra dependency)