Running this sub-pass earlier makes it simpler, because it no longer needs to handle anything DMA-related. The sub-pass is also quite modular and doesn't need to be part of the larger distribution pass. My initial motivation for this factorization was that it makes it easier to run aievec passes before the cores are materialized (so the aievec passes can run on just a single function instead of 16 cores).