harvard-acc / ALADDIN

A pre-RTL, power-performance model for fixed-function accelerators

Does the memory in ALADDIN support a broadcast mechanism? #9

Closed cylinbao closed 7 years ago

cylinbao commented 7 years ago

Suppose an accelerator has multiple PEs that share a local buffer. Can the buffer (memory) in ALADDIN support a broadcast mechanism, so that I don't need to partition it to get the correct behavior?

What I want to do is something like this:

int A[2];
int B[4];
int C[4];

for(i=0; i<2; i++)
     for(j=0; j<4; j++)
          C[j] = A[i] + B[j];

I want to unroll the inner for-loop by a factor of 4, but the value read from array A is the same for every iteration of that loop. In my tests, I still need to partition all three arrays (A, B and C) by a factor of 4 to get the correct performance behavior. So I want to confirm whether ALADDIN supports data broadcasting. Thanks.

xyzsam commented 7 years ago

Just save A[i] into a local variable and use that in the innermost for loop.

for (i = 0; i < 2; i++) {
  int a = A[i];
  for (j = 0; j < 4; j++)
    C[j] = a + B[j];
}

This works because a is represented by a register, which can be read concurrently by as many operations as needed.
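A minimal sketch of why the rewrite is safe, assuming nothing about ALADDIN itself: the two loop forms compute the same C, and the only difference is where A[i] is read. In the first form every inner iteration loads A[i] from the array, so once the inner loop is unrolled those loads presumably compete for A's memory ports. In the second form the value is loaded once per outer iteration and the inner iterations only read the scalar. The function names, the macros and the test values below are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <string.h>

#define N_OUTER 2
#define N_INNER 4

/* Original form: A[i] is read from the array in every inner iteration. */
void kernel_array_read(const int A[N_OUTER], const int B[N_INNER],
                       int C[N_INNER]) {
  for (int i = 0; i < N_OUTER; i++)
    for (int j = 0; j < N_INNER; j++)
      C[j] = A[i] + B[j];
}

/* Hoisted form: A[i] is read once per outer iteration into a scalar. */
void kernel_hoisted(const int A[N_OUTER], const int B[N_INNER],
                    int C[N_INNER]) {
  for (int i = 0; i < N_OUTER; i++) {
    int a = A[i]; /* single array read; the inner loop only uses a */
    for (int j = 0; j < N_INNER; j++)
      C[j] = a + B[j];
  }
}

/* Runs both forms on the same (made-up) inputs and checks they agree. */
void check_equivalence(void) {
  const int A[N_OUTER] = {10, 20};
  const int B[N_INNER] = {1, 2, 3, 4};
  int C1[N_INNER] = {0};
  int C2[N_INNER] = {0};

  kernel_array_read(A, B, C1);
  kernel_hoisted(A, B, C2);

  /* Both forms must produce identical output. */
  assert(memcmp(C1, C2, sizeof C1) == 0);
}
```

Calling check_equivalence() from a test harness confirms that the hoisting is purely a restructuring of where the read happens, so it changes the modeled memory traffic but not the result.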

cylinbao commented 7 years ago

OK, I see. But I have another question: is there a limit on the number of concurrent reads of a register in ALADDIN?

xyzsam commented 7 years ago

Nope - see my last sentence :)

cylinbao commented 7 years ago

Ohoh, I see. Thanks a lot!