vineodd / PIMSim

PIMSim is a Process-In-Memory Simulator with the compatibility of GEM5 full-system simulation.
GNU General Public License v3.0
178 stars, 85 forks

No performance improvement in PIM as compared to cpu #10

Open jasmiine28k opened 4 years ago

jasmiine28k commented 4 years ago

Running with --pim-type=cpu gives 4356369500 ticks:

./build/X86/gem5.opt --debug-flags=PIM configs/example/se.py --cpu-type=TimingSimpleCPU --cpu-clock=2GHz --caches --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=2MB --l1d_assoc=4 --l1i_assoc=4 --l2_assoc=8 --enable-pim --pim-type=cpu --kernel-type=adder --num-pim-kernels=16 --mem-size=512MB --coherence-granularity=64B -c ./tests/test-progs/example_pim/test_cpu

Running with --pim-type=kernel gives 4890508500 ticks:

./build/X86/gem5.opt --debug-flags=PIM configs/example/se.py --cpu-type=TimingSimpleCPU --cpu-clock=2GHz --caches --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=2MB --l1d_assoc=4 --l1i_assoc=4 --l2_assoc=8 --enable-pim --pim-type=kernel --kernel-type=adder --num-pim-kernels=16 --mem-size=512MB --coherence-granularity=64B -c ./tests/test-progs/example_pim/test_pim

Why does the PIM run take more ticks than the CPU run?
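To put the two numbers side by side, here is a minimal Python sketch that computes the relative difference between the runs. The tick counts are the ones reported above; the conversion to simulated time assumes gem5's default tick resolution of 1 ps (1e12 ticks per second), which this thread does not confirm for this setup. The function names are only illustrative.

```python
# Tick counts reported in this issue for the two runs.
CPU_TICKS = 4_356_369_500  # --pim-type=cpu, binary test_cpu
PIM_TICKS = 4_890_508_500  # --pim-type=kernel, binary test_pim

# Assumption: gem5's default tick resolution of 1 ps (1e12 ticks per second).
TICKS_PER_SECOND = 1e12


def slowdown(baseline_ticks, other_ticks):
    """Return the extra run time of `other` relative to `baseline` (0.10 means 10% more)."""
    return (other_ticks - baseline_ticks) / baseline_ticks


def ticks_to_ms(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert simulator ticks to simulated milliseconds."""
    return ticks / ticks_per_second * 1e3


if __name__ == "__main__":
    print(f"extra ticks in PIM run: {PIM_TICKS - CPU_TICKS}")
    print(f"relative slowdown:      {slowdown(CPU_TICKS, PIM_TICKS):.2%}")
    print(f"CPU run simulated time: {ticks_to_ms(CPU_TICKS):.3f} ms")
    print(f"PIM run simulated time: {ticks_to_ms(PIM_TICKS):.3f} ms")
```

The PIM run takes 534139000 more ticks, roughly 12% more than the CPU run. Note that the two commands run different binaries (test_cpu and test_pim), so part of the gap may come from the programs themselves rather than from where the adder executes.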

dxlet commented 4 years ago

But the paper argues that it supports dynamic feedback, using IPC to decide whether a PIM instruction should be executed in memory. If that works, my prior guess may be wrong.

Did you find the corresponding dynamic feedback source code? I cannot find it in the source, so it may not be implemented.

Sayan3116 commented 4 years ago

./build/X86/gem5.opt --debug-flags=PIM configs/example/se.py --cpu-type=TimingSimpleCPU --cpu-clock=2GHz --caches --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=2MB --l1d_assoc=4 --l1i_assoc=4 --l2_assoc=8 --enable-pim --pim-type=hybrid --kernel-type=adder --num-pim-kernels=16 --mem-size=512MB --coherence-granularity=64B -c tests/test-progs/pim-hello/hellopim

I am trying this command, but it fails with an error saying that I'm trying to use Hybrid without first configuring it. Can anyone tell me how to configure it?

m1chaelyeung commented 4 years ago

I guess that it's because of the cache hit rate. The memory side accesses DRAM every time, although each access is shorter. If the data locality of the program is high, the CPU may get more benefit from its caches. So it may depend on your program. But the paper argues that it supports dynamic feedback, using IPC to decide whether a PIM instruction should be executed in memory. If that works, my prior guess may be wrong. By the way, should "pim-type=cpu" match "num-pim-processors"?

I want to know why you say that pimkernels access DRAM in a shorter time. I think pimkernels are also placed outside the memory and access DRAM through the membus. Am I right?
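The cache hit rate argument above can be made concrete with a simple average-latency model. This is only a sketch: the latencies are hypothetical placeholders, not values measured in PIMSim or taken from the paper, and `pim_latency` is left as a free parameter precisely because it is unclear whether pimkernels reach DRAM faster than the CPU does.

```python
def cpu_avg_latency(hit_rate, cache_latency, dram_latency):
    """Average access latency for a CPU with a cache in front of DRAM.

    Every access pays the cache lookup; misses additionally pay the DRAM access.
    """
    return cache_latency + (1.0 - hit_rate) * dram_latency


def break_even_hit_rate(cache_latency, dram_latency, pim_latency):
    """Hit rate at which the cached CPU path matches the PIM path.

    Solves cache_latency + (1 - h) * dram_latency == pim_latency for h.
    Above this hit rate the CPU path is faster on average.
    """
    return 1.0 - (pim_latency - cache_latency) / dram_latency


if __name__ == "__main__":
    # Hypothetical latencies in nanoseconds, chosen only for illustration.
    CACHE_NS = 2.0   # cache lookup
    DRAM_NS = 60.0   # CPU-side DRAM access after a miss
    PIM_NS = 40.0    # memory-side access, assumed shorter but never cached

    for h in (0.2, 0.5, 0.9):
        cpu = cpu_avg_latency(h, CACHE_NS, DRAM_NS)
        winner = "CPU" if cpu < PIM_NS else "PIM"
        print(f"hit rate {h:.1f}: CPU {cpu:.1f} ns vs PIM {PIM_NS:.1f} ns -> {winner}")

    print(f"break-even hit rate: {break_even_hit_rate(CACHE_NS, DRAM_NS, PIM_NS):.3f}")
```

Under these made-up numbers, the CPU path wins once the hit rate exceeds about 0.37. If pimkernels really do go through the membus, `pim_latency` moves closer to `dram_latency` and the break-even hit rate drops further, which would be consistent with PIM showing no improvement for a program with good locality.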

ChenfengZhao commented 4 years ago

./tests/test-progs/example_pim/test_cpu

May I ask how you use PIM CPUs, and what the source code of your "example_pim/test_cpu" program is?