ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance
MIT License

Feature Request: Eliminate/reduce unnecessary copies #67

Open ikawrakow opened 1 month ago

ikawrakow commented 1 month ago

Prerequisites

Feature Description

PR #66 eliminates these unnecessary copies for Phi-3(.5)-mini, with a non-negligible performance gain on GPUs. Architectures that could potentially benefit from the same optimization are Falcon, DBRX, Starcoder, Bert, Bloom, MPT, Qwen, Phi-2, GPT-2, Codeshell, OpenLM, GPT-NeoX, and ChatGLM.

Motivation

Improve performance

Possible Implementation

See #66
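
For illustration only, here is a minimal sketch of the general pattern, not the actual change in #66: when an attention block splits a fused QKV projection, the split is often done by taking a view and then materializing it with `ggml_cont`, which is an extra copy. If the downstream op accepts a non-contiguous input, the view can be used directly. The helper name `split_q_no_copy` and the tensor shapes below are hypothetical.

```cpp
#include "ggml.h"

// Sketch: return the Q part of a fused QKV result as a view, without the
// intermediate ggml_cont() copy. Assumes qkv has shape
// [n_embd + 2*n_embd_gqa, n_tokens] with Q stored first along dim 0.
static struct ggml_tensor * split_q_no_copy(
        struct ggml_context * ctx,
        struct ggml_tensor  * qkv,
        int64_t               n_embd,
        int64_t               n_tokens) {
    // Before: Qcur = ggml_cont(ctx, ggml_view_2d(...));   // extra copy
    // After:  hand the view straight to the next op when it supports
    //         non-contiguous inputs.
    return ggml_view_2d(ctx, qkv,
            n_embd, n_tokens,
            qkv->nb[1],   // row stride of the fused tensor
            0);           // Q starts at offset 0
}
```

Whether the copy can actually be dropped depends on the architecture and on which backend ops handle non-contiguous views, so each of the architectures listed above needs to be checked individually.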