harvard-acc / gem5-aladdin

End-to-end SoC simulation: integrating the gem5 system simulator with the Aladdin accelerator simulator.
BSD 3-Clause "New" or "Revised" License

Questions about the support of fully-coherent caches in gem5-Aladdin #48

Open doitdodo opened 1 year ago

doitdodo commented 1 year ago

Hello. The README says that gem5-Aladdin supports three coherence models: non-coherent DMA, LLC-coherent direct access (using ACP), and fully-coherent caches. In the integration test 'test_load_store', the accelerator uses a private cache to access data in main memory. (1) Does 'test_load_store' fall under the 'fully-coherent caches' model?

As I understand it, fully-coherent caches mean that the accelerator's private cache is coherent with the CPU's private cache (for example, both could be connected to a shared L2 cache with a coherence protocol). (2) However, gem5-Aladdin connects the accelerator's private cache directly to the membus by default. How can it then be coherent with the CPU's L1 cache?

I then tried adding an L2 cache to 'test_load_store' and modifying aladdin_se.py to reconnect the accelerator's private cache to that L2, so that the L2 is shared by both the accelerator's and the CPU's private caches. The simulation results seem to make sense, but I don't know whether this setup is correct. I also found a comment at lines 248-251 of configs/common/CacheConfig.py https://github.com/harvard-acc/gem5-aladdin/blob/d4efbee56d71f9609eab85393eff58f5dbf7763c/configs/common/CacheConfig.py#L248:

xyzsam commented 1 year ago

(1) Yes.

(2) The accelerator is just like a CPU in this regard. It has its own cache. That cache is coherent with all the other caches in the system. If an accelerator wants to modify a value that's owned by CPU 0 (which may currently be in CPU 0's L1 cache), it needs to send an RFO to acquire exclusive ownership of that cacheline first.

(3) I probably wrote that comment. What I meant was that most users of Aladdin use it to model loosely attached accelerators, rather than ones that are tightly coupled to the CPU. In that case, it doesn't make as much sense to have a direct channel into the CPU L2 cache. But it's not wrong per se. What you did is perfectly valid.

(4) Fully coherent caches means exactly what it sounds like. Any memory access to a cached memory region is kept coherent with the rest of the system. No manual cache flushes or invalidations are necessary to see them. This is clearly different from the non-coherent DMA and ACP interfaces.
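
To make the RFO step in (2) concrete, here is a minimal toy sketch in plain Python. It is not gem5 or Aladdin code: the `Bus` and `Cache` classes, the three-state (invalid/shared/modified) protocol, and the address `0x40` are all assumptions made for illustration, and the protocol gem5 actually simulates is more detailed. It only shows the idea that a store by the accelerator first invalidates the copy held by CPU 0, and that CPU 0 later sees the new value without any manual flush.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Bus:
    """Toy snooping interconnect plus backing memory (hypothetical, not a gem5 class)."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)
        cache.bus = self

    def read_shared(self, requester, addr):
        # A dirty copy elsewhere is written back and downgraded to SHARED.
        for c in self.caches:
            if c is not requester and c.state(addr) is State.MODIFIED:
                value = c.lines[addr][1]
                self.memory[addr] = value
                c.lines[addr] = (State.SHARED, value)
        return self.memory.get(addr, 0)

    def read_for_ownership(self, requester, addr):
        # RFO: every other copy is written back if dirty, then invalidated.
        for c in self.caches:
            if c is requester:
                continue
            st = c.state(addr)
            if st is State.MODIFIED:
                self.memory[addr] = c.lines[addr][1]
            if st is not State.INVALID:
                c.lines[addr] = (State.INVALID, None)
        return self.memory.get(addr, 0)


class Cache:
    """Toy private cache; it behaves the same for a CPU or an accelerator."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # addr -> (State, value)
        self.bus = None

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def load(self, addr):
        if self.state(addr) is State.INVALID:
            value = self.bus.read_shared(self, addr)
            self.lines[addr] = (State.SHARED, value)
        return self.lines[addr][1]

    def store(self, addr, value):
        # Exclusive ownership must be acquired before the line is modified.
        if self.state(addr) is not State.MODIFIED:
            self.bus.read_for_ownership(self, addr)
        self.lines[addr] = (State.MODIFIED, value)


def demo():
    bus = Bus()
    cpu0, accel = Cache("cpu0"), Cache("accel")
    bus.attach(cpu0)
    bus.attach(accel)
    cpu0.store(0x40, 1)            # cpu0 now owns the line (MODIFIED)
    accel.store(0x40, 2)           # accel issues an RFO; cpu0's copy is invalidated
    seen_by_cpu = cpu0.load(0x40)  # cpu0 misses and observes accel's value
    return cpu0.state(0x40), accel.state(0x40), seen_by_cpu
```

In this toy model the accelerator's cache is attached to the same bus as the CPU's cache, so where it sits in the hierarchy (membus or a shared L2) changes latency and traffic but not whether the caches stay coherent.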

doitdodo commented 1 year ago

Thanks a lot for your reply. Your answers are professional and helpful.