We should shuffle around the realization order to minimize peak memory usage

abadams commented 8 months ago

Consider a pipeline with three outputs f2, g2, h2. These call Funcs f1, g1, h1 respectively. Everything is compute_root. The realization order f1 f2 g1 g2 h1 h2 is going to use a lot less intermediate memory than the order f1 g1 h1 f2 g2 h2.

We should shuffle the realization order of realizations at each loop level in schedule_functions to minimize the number of overlapping lifetimes. This could be done by identifying each loop level used in a compute_at, and then for each, coming up with a new realization order for that loop level. This would have to be done at the level of fused groups, not Funcs.

rootjalex commented 7 months ago

This seems like something that should be scheduable, instead of automatic?

steven-johnson commented 7 months ago

This seems like something that should be scheduable, instead of automatic?

Is there ever a situation where we'd choose to use more than the minimum?

abadams commented 7 months ago

It also affects locality, so there might be a trade-off here. Also if the allocations are all dynamic-size, the peak usage and thus the order will depend on those sizes, so the compiler won't be able to infer it.

You can already sort of schedule it with compute_at(Var::outermost(), the_func_you_want_to_go_before)

halide / Halide

We should shuffle around the realization order to minimize peak memory usage #8150