Scheduling of Switch Inside Parallel

mattfel1 commented 6 years ago

Working on this snippet:

Parallel {
  if (layer_stride == 2) {
    lb_stride2 load layer_data(layer,ic,row::row+2,0::layer_cols)
  } else {
    lb_stride1 load layer_data(layer,ic,row,0::layer_cols)
  }
  accum_line load layer_data(layer+1, oc, row/layer_stride, 0::layer_cols/layer_stride)
}

It looks like the unit pipe that checks the condition is scheduled in parallel with the switch that it should be providing the condition to. Maybe this kind of structure should be disallowed, or the unit pipes should be moved in front of the Parallel.

dkoeplin commented 6 years ago

Either that or it should become:

Parallel {
  Pipe {
    val temp = Reg[Bit]
    Pipe { temp := (layer_stride == 2) }
    Switch(temp, ...)
  }
  load...
}

But that also seems like it may add some overhead.

mattfel1 commented 6 years ago

Here is a tiny example of another failure caused by this issue. I pulled this structure out of SPMV_CRS, which hangs with retime turned on. The size of the load is computed inside the Parallel, so when the load command is issued, the size is still 0. It is set to 40 or whatever in the very next cycle, when the unit pipe alongside it finishes:

object IndirectLoad extends SpatialApp { // This hangs with retime on in SPMV_CRS
  @virtualize
  def main() {
    val ids = Array.tabulate(16){i => 32*i}
    val data = Array.tabulate(32*16){i => random[Int](5)}
    val id_dram = DRAM[Int](16)
    val data_dram = DRAM[Int](32*16)
    val result_dram = DRAM[Int](32)
    setMem(id_dram, ids)
    setMem(data_dram, data)
    Accel{
      val id_sram = SRAM[Int](16)
      val data_sram = SRAM[Int](32)
      id_sram load id_dram
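      Foreach(8 by 1) {i =>
        val start = id_sram(i)
        val end = id_sram(i+1)
        // The load size (end - start) is computed by a unit pipe inside the Parallel,
        // so the load command can be issued while the size is still 0.
        Parallel{
          Pipe{data_sram load data_dram(start::end)}
        }
        result_dram store data_sram
      }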
      }
    }
    val result = getMem(result_dram)
    val gold = Array.tabulate(32){i => data(ids(7) + i)}
    printArray(result, "result")
    printArray(gold, "gold")
    val cksum = gold.zip(result){_==_}.reduce{_&&_}
    println("PASS: " + cksum + " (IndirectLoad)")
  }
}

mattfel1 commented 6 years ago

This is tricky to fix, so I added a check to the Controller Sanity Check that throws an error if a controller depends on another controller and their LCA (least common ancestor) is a Parallel. This is a first attempt at reporting to the user when one of these bad controller structures exists in the app. We may reopen this issue when we move to token-based control flow.
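
For anyone trying to picture the rule, here is a rough standalone sketch of that kind of check in plain Scala. It is not the actual Controller Sanity Check code: `Ctrl`, `Kind`, `LcaCheck`, and `Example` are hypothetical names invented for illustration, and the controller tree is modeled with simple parent pointers rather than Spatial's IR.

```scala
// Hypothetical model of a controller tree; these names are NOT from the Spatial compiler.
sealed trait Kind
case object ParallelKind extends Kind
case object OtherKind extends Kind // stands in for Pipe, Foreach, Switch, ...

final case class Ctrl(name: String, kind: Kind, parent: Option[Ctrl])

object LcaCheck {
  // Path from the root controller down to `c`, inclusive.
  def pathFromRoot(c: Ctrl): List[Ctrl] = {
    @annotation.tailrec
    def go(cur: Option[Ctrl], acc: List[Ctrl]): List[Ctrl] = cur match {
      case Some(x) => go(x.parent, x :: acc)
      case None    => acc
    }
    go(Some(c), Nil)
  }

  // Least common ancestor: the last node shared by both root paths.
  // Reference equality (eq) is used so two distinct controllers with equal fields do not match.
  def lca(a: Ctrl, b: Ctrl): Option[Ctrl] =
    pathFromRoot(a).zip(pathFromRoot(b))
      .takeWhile { case (x, y) => x eq y }
      .lastOption.map(_._1)

  // deps holds (consumer, producer) pairs; a pair is flagged when its LCA is a Parallel.
  def violations(deps: Seq[(Ctrl, Ctrl)]): Seq[(Ctrl, Ctrl)] =
    deps.filter { case (consumer, producer) =>
      lca(consumer, producer).exists(_.kind == ParallelKind)
    }
}

object Example {
  val accel = Ctrl("accel", OtherKind, None)
  val par   = Ctrl("parallel", ParallelKind, Some(accel))
  // Bad structure: the condition pipe and the switch are siblings under the Parallel.
  val cond  = Ctrl("condPipe", OtherKind, Some(par))
  val sw    = Ctrl("switch", OtherKind, Some(par))
  // Hoisted structure: the condition pipe sits in front of the Parallel, under accel.
  val condHoisted = Ctrl("condPipeHoisted", OtherKind, Some(accel))
}
```

In this model, `LcaCheck.violations(Seq((Example.sw, Example.cond)))` flags the pair because the LCA of the two controllers is the Parallel, while the hoisted variant `(Example.sw, Example.condHoisted)` is not flagged because their LCA is `accel`.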

stanford-ppl / spatial-lang

Scheduling of Switch Inside Parallel #244