harvard-acc / gem5-aladdin

End-to-end SoC simulation: integrating the gem5 system simulator with the Aladdin accelerator simulator.
BSD 3-Clause "New" or "Revised" License
210 stars 59 forks source link

systolic-array: Fix access size in scratchpad. #44

Closed yaoyuannnn closed 2 years ago

yaoyuannnn commented 2 years ago

The size of a write request sent from a commit unit can be smaller than the line size in the scratchpad for a small PE array configuration. This bug can lead to wrong outputs and also memory corruptions that eventually cause simulation failure.

yaoyuannnn commented 2 years ago

This is the first PR to fix #43. A few more will be uploaded.

yaoyuannnn commented 2 years ago

Thanks for the fix! Don't want to block you, but a few things:

  1. Do we expect the program to segfault without this fix, and if so, is it immediate or a memory corruption bug that eventually causes failure? Or does it just produce wrong output? Please include a description/mention of what the failure mode looks like in this commit.
  2. I think we have an existing set of tests for the systolic array - it would be good to add another one to catch this bug in the future. I know you're busy so I'll leave it to your best judgement on how long it will take to do this; you can punt on it for now if it will take too long. Just be aware that generally when we postpone testing for the future, it doesn't ever happen :)

Thanks Sam!

  1. Will do tonight.
  2. Totally agree :) I think I just need a reference convolution implementation to provide correct results, which shouldn't be too hard (I wish I could use the one in SMAUG directly in here :) )
xyzsam commented 2 years ago

I wish I could use the one in SMAUG directly in here

Another option is to just add a SMAUG test using a systolic array setup that would trigger this bug. We'd see the failure in SMAUG instead of here, which isn't ideal, but better than nothing. Up to you.

yaoyuannnn commented 2 years ago

I wish I could use the one in SMAUG directly in here

Another option is to just add a SMAUG test using a systolic array setup that would trigger this bug. We'd see the failure in SMAUG instead of here, which isn't ideal, but better than nothing. Up to you.

Good point. Let me try the good solution first - I will try adding a reference implementation here.