stanford-ppl / spatial

Spatial: "Specify Parameterized Accelerators Through Inordinately Abstract Language"
https://spatial.stanford.edu
MIT License
274 stars · 32 forks

Cut Back on Excessive Wires #75

Closed: shadjis closed this issue 6 years ago

shadjis commented 6 years ago

problem_app.txt

Attached is a simple example app which reveals some problems:

  1. If outP = 1 and P2 = 32, it crashes in make vcs with java.lang.OutOfMemoryError.
  2. If outP = 2 and P2 = 16, it crashes in make vcs with the same error as #72.
  3. If outP = 4 and P2 = 8, it passes, but make vcs takes a long time. Maybe the long build time is expected with these pars, but they are not that high, so I wanted to check, since the app is not that complicated.

All three configurations have the same total par (1 × 32 = 2 × 16 = 4 × 8 = 32), I think.

mattfel1 commented 6 years ago

I've been staring at this for a while now, and I think at least problems 1 and 2 are related to an extremely large number of wires being generated, for example in patterns like:

    DataAsBits
    VecSlice
    VecApply
    DataAsBits
    VecApply
    DataAsBits
    ...
    VecApply
    VecConcat
    BitsAsData

Another source of abundant wires could be the way vectorized SRAM reads are split into individual lanes in codegen. I'll see what I can do about that one now, and then look into whether there is a more concise way to capture the first pattern.
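For a sense of scale, here is a hedged Chisel sketch of what such a pattern can expand into (a minimal sketch with illustrative names and sizes, not the actual Spatial codegen output): slicing a bus into lanes, touching each lane individually, and reconcatenating emits one intermediate wire per lane, where a bulk assignment would emit none.

    import chisel3._

    // Illustrative only: a guess at the shape of the generated code,
    // not the actual output of the Spatial codegen.
    class PerLaneCopy extends Module {
      val io = IO(new Bundle {
        val in  = Input(UInt(512.W))
        val out = Output(UInt(512.W))
      })
      // DataAsBits -> VecSlice/VecApply per lane -> VecConcat -> BitsAsData:
      // one intermediate 16-bit wire per lane, 32 lanes here.
      val lanes = Wire(Vec(32, UInt(16.W)))
      for (i <- 0 until 32) {
        lanes(i) := io.in(16 * i + 15, 16 * i)
      }
      io.out := lanes.asUInt
      // A direct io.out := io.in would generate no intermediate wires at all.
    }

Multiplied across every vectorized access in the app, chains like this add up quickly.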

mattfel1 commented 6 years ago

Also, #64 will help this a lot. Specifically, there is this offending snippet (from issue #72):

            Foreach(0 until nr, 0 until nc, 0 until B par 16) { (r,c,b) =>
              ...
              val a2: List[Int] = List.tabulate(3){i => List.tabulate(3){j => tmp2(i+r, j+c) }}.flatten
              ...
            }

The Foreach is an outer pipe parallelized by 16. b is parallelized, but r and c are not, so I'm pretty sure this memory should be broadcast instead of duplicated x16.

We likely have the same issue here for the local SRAM.
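As a plain-Scala analogy of duplicate-vs-broadcast (a minimal sketch, not Spatial code; tmp2, r, and c are stand-ins for the names in the snippet above): since the 16 lanes differ only in b, every lane reads the same address, so one copy of the memory is enough.

    object BroadcastSketch extends App {
      // Hypothetical stand-ins for the SRAM and the unparallelized indices.
      val tmp2 = Array.tabulate(8, 8)((i, j) => i * 8 + j)
      val (r, c) = (2, 3)

      // Duplicated: each of the 16 lanes gets its own copy of the memory.
      val copies = Seq.fill(16)(tmp2.map(_.clone))
      val fromCopies = copies.map(m => m(r)(c))

      // Broadcast: one copy, read once, fanned out to all 16 lanes.
      val fromBroadcast = Seq.fill(16)(tmp2(r)(c))

      assert(fromCopies == fromBroadcast) // same values, 1/16th the storage
    }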

mattfel1 commented 6 years ago

awaiting #80

mattfel1 commented 6 years ago

#80 was merged, but there still seem to be some Java limitations, so I'm leaving this issue open for now until we either find more opportunities for compressing the generated code or (hopefully) the specialized reduction + broadcast reads shrink the code significantly.

shadjis commented 6 years ago

I'm not sure if this is useful, but the app gives a different error now:

    [error] (run-main-0) java.lang.IndexOutOfBoundsException: 32
    [error] java.lang.IndexOutOfBoundsException: 32
    [error]   at scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:132)
    [error]   at scala.collection.immutable.Vector.apply(Vector.scala:122)
    [error]   at chisel3.core.Vec.apply(Aggregate.scala:221)
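For reference, that exception class is what a Chisel Vec throws when it is statically indexed past its length at elaboration time. A minimal sketch of the failure class (sizes made up to match the 32 in the trace, not the app's actual generated code):

    import chisel3._

    // Minimal sketch, not the generated app code: a 32-entry Vec has
    // valid static indices 0..31, so v(32) fails during elaboration.
    class OobRepro extends Module {
      val io = IO(new Bundle { val out = Output(UInt(8.W)) })
      val v = Wire(Vec(32, UInt(8.W)))
      v.foreach(_ := 0.U)
      io.out := v(32) // java.lang.IndexOutOfBoundsException: 32
    }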

shadjis commented 6 years ago

The error above is gone after 307fa5b, but the original errors are back now.

mattfel1 commented 6 years ago

Refactoring the Chisel codegen to hopefully solve this permanently (there may still be a memory-limit issue, but this should be strictly forward progress anyway). See #134