vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Add support for ReFT #4413

Open RonanKMcGovern opened 7 months ago

RonanKMcGovern commented 7 months ago

🚀 The feature, motivation and pitch

The motivation is to allow ReFT interventions to be applied to hidden representations on the fly during inference, which can be done batchwise across requests.

This would be much faster than applying LoRAs.

Alternatives

LoRA is slower because its adapter weights have to be applied as extra matrix multiplications alongside the base weights, which increases the number of operations per token.

Additional context

See https://github.com/stanfordnlp/pyreft/issues/63
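For context on why this can be batched cheaply, here is a minimal sketch of a LoReFT-style intervention on hidden states, following the ReFT paper's `h + R^T (W h + b - R h)` formulation. This is an illustration only: `LoReFTIntervention` is a standalone torch module I made up for the example, not an existing vLLM or pyreft class.

```python
import torch

class LoReFTIntervention(torch.nn.Module):
    """LoReFT-style edit on hidden states: h <- h + R^T (W h + b - R h)."""

    def __init__(self, hidden_size: int, rank: int):
        super().__init__()
        # R's rows span the low-rank subspace being edited; W and b define the
        # projection the hidden state is steered toward inside that subspace.
        self.R = torch.nn.Parameter(torch.empty(rank, hidden_size))
        self.W = torch.nn.Parameter(torch.zeros(rank, hidden_size))
        self.b = torch.nn.Parameter(torch.zeros(rank))
        torch.nn.init.orthogonal_(self.R)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: [num_tokens, hidden_size]. Just a couple of small matmuls per token
        # and a projection back -- no merging into the base weights is needed.
        delta = (h @ self.W.T + self.b) - (h @ self.R.T)  # [num_tokens, rank]
        return h + delta @ self.R
```

Because the edit acts on activations at selected positions rather than on the model weights, each sequence in a batch could in principle carry its own intervention, similar in spirit to multi-LoRA batching but without extra adapter matmuls in every linear layer.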

chris-aeviator commented 6 months ago

As a user of pyreft, I want to highlight the need for selecting subspaces in a hypothetical PyreftRequest (see https://github.com/stanfordnlp/pyreft/issues/63#issuecomment-2073233538).
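For concreteness, such a request might carry the subspace indices roughly as sketched below. None of these fields exist in vLLM or pyreft today; the names are placeholders echoing the "PyreftRequest" idea from the linked issue.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request extension -- every field here is a placeholder, not an
# existing vLLM API.
@dataclass
class PyreftRequest:
    intervention_id: str                     # which loaded ReFT intervention to apply
    positions: Optional[list[int]] = None    # token positions to intervene on
    subspaces: Optional[list[int]] = None    # indices of the rank-r subspace dims to activate
```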

jvlinsta commented 4 months ago

Any traction on this?

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

chris-brightbeam commented 4 days ago

unstale