mattfel1 closed this issue 5 years ago
I added cycles stalled / iter (outbound not ready) and cycles idle / iter (inbound not valid). It still needs user testing to know whether these are the right metrics to track, but I think they are a good start at showing where the bottlenecks are. In the best case, cycles idle / iter should equal the pipe latency (data was valid the whole time, except when it was fully exhausted and the pipe was drained). Ideally, cycles stalled / iter would be 0.
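To make the two metrics concrete, here is a minimal Chisel-style sketch of counters that could back them; the signal and module names (`enabled`, `inboundValid`, `outboundReady`, `iterDone`, `StallIdleCounters`) are hypothetical and not the actual instrumentation hooks, this is just to illustrate the definitions:

```scala
import chisel3._

// Sketch only: per-controller counters behind "cycles stalled / iter" and "cycles idle / iter".
// Dividing each total by the iteration count gives the per-iteration numbers.
class StallIdleCounters extends Module {
  val io = IO(new Bundle {
    val enabled       = Input(Bool())   // controller is active
    val inboundValid  = Input(Bool())   // upstream data is available
    val outboundReady = Input(Bool())   // downstream can accept results
    val iterDone      = Input(Bool())   // one iteration finished this cycle
    val stalled       = Output(UInt(32.W))
    val idle          = Output(UInt(32.W))
    val iters         = Output(UInt(32.W))
  })

  val stalledCnt = RegInit(0.U(32.W))
  val idleCnt    = RegInit(0.U(32.W))
  val iterCnt    = RegInit(0.U(32.W))

  // Stalled: active but the output side is applying backpressure.
  when(io.enabled && !io.outboundReady) { stalledCnt := stalledCnt + 1.U }
  // Idle: active but the input side has no valid data.
  when(io.enabled && !io.inboundValid)  { idleCnt    := idleCnt + 1.U }
  when(io.enabled && io.iterDone)       { iterCnt    := iterCnt + 1.U }

  io.stalled := stalledCnt
  io.idle    := idleCnt
  io.iters   := iterCnt
}
```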
For DRAM, it's a little tricky because there is some kind of effective lower bound on these numbers (in a simple dot-product test, I see about 200 cycles idle / iter for the stage that catches data from DRAM). Not sure how to incorporate this info.
I think we already have all the components for this lying around, between the MAGCore counters, the instrumentation counters, and the backpressure/forwardpressure helpers. What would be useful is if --instrument helped answer these questions:
1) Is Fringe spending too much time waiting for the Accel to drain/fill the data fifos (i.e. parallelize loads/stores more)?
2) Is Fringe having trouble keeping up with the requests generated by the Accel (i.e. distribute your loads/stores better in the app, or use the "Par of Pipes vs Pipe of Pars" or "decentralized controllers" flags if/when these options exist)?
3) Where are the hotspots in the Accel causing issues 1 and/or 2?
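As a rough idea of how the counter values could be turned into answers to those questions, here is a plain Scala post-processing sketch; the field names and the simple thresholds are made up for illustration and are not what --instrument actually reports:

```scala
// Hypothetical report post-processing: flag which of the three questions applies per controller.
case class CtrCounters(
  name: String,
  stalledPerIter: Double, // cycles stalled / iter (outbound not ready)
  idlePerIter: Double,    // cycles idle / iter (inbound not valid)
  pipeLatency: Double     // best-case idle cycles per iter
)

object InstrumentReport {
  // Idle cycles beyond the pipe latency suggest the inputs are starving the stage
  // (questions 1/2); any stalled cycles point at downstream hotspots (question 3).
  def diagnose(c: CtrCounters): Seq[String] = {
    val notes = scala.collection.mutable.ArrayBuffer.empty[String]
    if (c.idlePerIter > c.pipeLatency)
      notes += f"${c.name}: ${c.idlePerIter - c.pipeLatency}%.1f idle cycles/iter beyond pipe latency (starved on inputs)"
    if (c.stalledPerIter > 0)
      notes += f"${c.name}: ${c.stalledPerIter}%.1f stalled cycles/iter (backpressure on outputs)"
    notes.toSeq
  }
}

// Example: the dot-product stage mentioned above, with ~200 idle cycles/iter near DRAM.
// InstrumentReport.diagnose(CtrCounters("loadStage", stalledPerIter = 0, idlePerIter = 200, pipeLatency = 7))
```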