stanford-ppl / spatial

Spatial: "Specify Parameterized Accelerators Through Inordinately Abstract Language"
https://spatial.stanford.edu
MIT License

Missed Broadcast Opportunity #200

mattfel1 closed 5 years ago

mattfel1 commented 5 years ago

This access pattern broadcasts (3 duplicates):

      Foreach(A by 1, C by 1 par 3, C by 1, B by 1 par 2){case List(a,c,r,b) => 
        val x = sram(a * A + mux(S == 1, r, r*2)*C + mux(S == 1, c, c*2)) // f(a,r,c)
        reg := reg.value + x
      }

This one does not, when r and b iterators are swapped (6 duplicates):

      Foreach(A by 1, C by 1 par 3, B by 1 par 2, C by 1){case List(a,c,b,r) => 
        val x = sram(a * A + mux(S == 1, r, r*2)*C + mux(S == 1, c, c*2)) // f(a,r,c)
        reg := reg.value + x
      }

I'm pretty sure both are candidates for broadcasting. I think the problem is that we only consider the last varying iter and assume each iter sits at its own level in the controller hierarchy. I will look into this more closely, though.

mattfel1 commented 5 years ago

Updated getUnrolledMatrices in AccessExpansion.scala to rely on all of the iterators used by an access pattern, rather than looking only at the last iter and ignoring unroll lanes that occur inside (later than) that iter. I think the old code was trying to figure out which iterators are invariantWith the access pattern, but it missed any that sat higher up than the lastIter. Now I filter the unrollable iterators down to those that are used in the access pattern to decide whether a unique sym needs to be created. (SparseVector takes an allIters map that maps syms in the access pattern to the iterators used to compute them.) I leave it to lockstep-analysis later on to decide whether the iterators used in the access pattern are in lockstep relative to the unrolled controllers higher in the hierarchy, and to dephase the iterators accordingly.
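To make the difference concrete, here is a toy model in plain Scala of how the two lane-counting rules treat the two loops from the first comment. It is only a sketch of my reading of the description above, not the code in getUnrolledMatrices: `Iter`, `lanesUpToLastUsed`, and `lanesFromUsed` are hypothetical names, and the model assumes the duplicate count is simply the product of the par factors of whichever iterators are counted.

```scala
object BroadcastModel {
  // Hypothetical stand-in for a loop iterator: its name and its par factor.
  final case class Iter(name: String, par: Int)

  // Old rule (as described): every iterator at or above the last iterator
  // used by the address contributes lanes, even if the address ignores it.
  def lanesUpToLastUsed(iters: Seq[Iter], used: Set[String]): Int = {
    val last = iters.lastIndexWhere(i => used(i.name))
    iters.take(last + 1).map(_.par).product
  }

  // New rule: only iterators that actually appear in the address contribute;
  // lanes of unused iterators read the same address and can be broadcast.
  def lanesFromUsed(iters: Seq[Iter], used: Set[String]): Int =
    iters.filter(i => used(i.name)).map(_.par).product

  // The address in both loops is f(a, r, c); b never appears in it.
  val used: Set[String] = Set("a", "r", "c")

  // Foreach(A by 1, C by 1 par 3, C by 1, B by 1 par 2)  => List(a, c, r, b)
  val first: Seq[Iter] = Seq(Iter("a", 1), Iter("c", 3), Iter("r", 1), Iter("b", 2))

  // Foreach(A by 1, C by 1 par 3, B by 1 par 2, C by 1)  => List(a, c, b, r)
  val second: Seq[Iter] = Seq(Iter("a", 1), Iter("c", 3), Iter("b", 2), Iter("r", 1))
}
```

Under this model, `lanesUpToLastUsed` gives 3 for `first` and 6 for `second`, matching the duplicate counts reported in the issue, while `lanesFromUsed` gives 3 for both orderings because the par 2 lanes of b no longer count.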

Also fixed a lockstep check that was too restrictive. If an outer control is unrolled as MoP, then we don't care what stages come before or after the "child" stage in question. The check was correct for PoM unrolling, though (which never existed until now anyway).

See BroadcastStressTest for lots of examples of the cases I targeted here.