abadams opened 8 months ago
This seems like something that should be schedulable, instead of automatic?
Is there ever a situation where we'd choose to use more than the minimum?
It also affects locality, so there might be a trade-off here. Also, if the allocations are all dynamically sized, the peak usage, and thus the best order, will depend on those sizes, so the compiler won't be able to infer it statically.
You can already sort of schedule it with compute_at(the_func_you_want_to_go_before, Var::outermost())
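A minimal sketch of that trick, assuming Halide's C++ front end (the Func names and bodies are made up for illustration):

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Var x, y;
    Func f1("f1"), f2("f2");
    f1(x, y) = x + y;
    f2(x, y) = f1(x, y) * 2;

    // Realize f1 at the implicit outermost loop of f2: equivalent in
    // effect to compute_root, but pinned to sit immediately before f2's
    // loop nest, which also pins where f1's buffer gets allocated.
    f1.compute_at(f2, Var::outermost());

    f2.realize({64, 64});
    return 0;
}
```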
Consider a pipeline with three outputs f2, g2, h2. These call Funcs f1, g1, h1 respectively. Everything is compute_root. The realization order f1 f2 g1 g2 h1 h2 is going to use a lot less intermediate memory than the order f1 g1 h1 f2 g2 h2, because in the former only one of f1, g1, h1 needs to be live at any point, while in the latter all three are live at once.
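For concreteness, a hedged sketch of that pipeline in Halide's C++ front end (the Func bodies are placeholders):

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Var x;
    Func f1("f1"), g1("g1"), h1("h1");
    Func f2("f2"), g2("g2"), h2("h2");

    f1(x) = x;          g1(x) = x + 1;      h1(x) = x + 2;
    f2(x) = f1(x) * 2;  g2(x) = g1(x) * 2;  h2(x) = h1(x) * 2;

    // Every intermediate is compute_root; the outputs are roots already.
    f1.compute_root();
    g1.compute_root();
    h1.compute_root();

    // Three-output pipeline: the realization order of f1/g1/h1 relative
    // to their consumers determines how many are live simultaneously.
    Pipeline p({f2, g2, h2});
    p.realize({1024});
    return 0;
}
```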
We should shuffle the order of the realizations at each loop level in schedule_functions to minimize the number of overlapping lifetimes. This could be done by identifying each loop level used in a compute_at, and then, for each, coming up with a new realization order for that loop level. This would have to be done at the level of fused groups, not Funcs.
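One possible greedy heuristic, sketched below. This is not Halide's actual schedule_functions code, and the Graph type and helper names are invented for illustration: do a post-order DFS from the outputs at each loop level, so every fused group is emitted immediately before its first consumer, shrinking the window during which its buffer is live.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// group name -> the groups it consumes (its producers) at this loop level
using Graph = std::map<std::string, std::vector<std::string>>;

// Post-order DFS: place all of a group's producers, then the group itself,
// so each producer lands just before its first consumer in the order.
void visit(const std::string &g, const Graph &producers,
           std::set<std::string> &placed, std::vector<std::string> &order) {
    if (!placed.insert(g).second) return;  // already placed
    auto it = producers.find(g);
    if (it != producers.end()) {
        for (const std::string &p : it->second) {
            visit(p, producers, placed, order);
        }
    }
    order.push_back(g);
}

std::vector<std::string> realization_order(const std::vector<std::string> &outputs,
                                           const Graph &producers) {
    std::set<std::string> placed;
    std::vector<std::string> order;
    for (const std::string &out : outputs) {
        visit(out, producers, placed, order);
    }
    return order;
}

int main() {
    // The example above: f2, g2, h2 consume f1, g1, h1 respectively.
    Graph producers = {{"f2", {"f1"}}, {"g2", {"g1"}}, {"h2", {"h1"}}};
    std::vector<std::string> order =
        realization_order({"f2", "g2", "h2"}, producers);
    // order == {"f1", "f2", "g1", "g2", "h1", "h2"}: only one of the
    // intermediates f1/g1/h1 is live at a time.
    return 0;
}
```

This greedy placement doesn't provably minimize peak memory in general (that depends on allocation sizes, as noted above), but it handles the common grouped-outputs case.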