Why is there no difference in the E2E performance of T-MAC and llama.cpp on an ARM machine?

microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table

Why is there no difference in the E2E performance of T-MAC and llama.cpp on an ARM machine? #60

Closed: ppp-max closed this issue 1 month ago

ppp-max commented 1 month ago

I ran an end-to-end test on an ARM machine, but the performance does not match the results reported in the paper: the measured numbers for llama.cpp and T-MAC are nearly the same. I've posted the measured data below. The machine runs at 2.5 GHz, and its bandwidth is 2.6 G/s per core.