oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

MLC-LLM as a backend? #3537

Closed Ph0rk0z closed 1 year ago

Ph0rk0z commented 1 year ago

Description

https://github.com/mlc-ai/mlc-llm

https://mlc.ai/mlc-llm/docs/index.html

MLC supports a lot of other hardware and has decent AMD support. Their reported speeds are fast, and they have their own quantization scheme.

It can run on AMD/Nvidia GPUs through Vulkan or the native backends (CUDA/ROCm), and it supports Intel GPUs as well. People claim good speeds.
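For reference, here is a minimal sketch of driving MLC-LLM from Python, based on the ChatModule API its docs described around the time of this thread (the model name is a placeholder, and the API has been reworked in later releases):

```python
# Requires the mlc-chat package plus a compiled model library.
# Names follow the docs as of this thread and may differ today.
from mlc_chat import ChatModule

# device can be "cuda", "rocm", "vulkan", or "metal" depending on the build
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="vulkan")
print(cm.generate(prompt="What is the capital of France?"))
```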

Alek32x commented 1 year ago

That would be great for users with lower PC specs. Apache TVM seems promising.

Btw, I saw a similar request at Kobold: https://github.com/KoboldAI/KoboldAI-Client/issues/324

Ph0rk0z commented 1 year ago

If they just use TVM without modifications, maybe TVM itself is the real backend.

LowYieldFire commented 1 year ago

These benchmarks look very promising https://github.com/mlc-ai/mlc-llm/issues/15

toxuin commented 1 year ago

Seems like it allows splitting a model across Nvidia/AMD/Intel cards simultaneously when using Vulkan. A niche use case, but quite unique.

Ph0rk0z commented 1 year ago

Finally, mixing cards without having to use llama.cpp is definitely something. Not having to patch ROCm for older AMD cards, etc.

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

KhazAkar commented 1 year ago

Vulkan support would be really great, similar to what gpt4all did.

Ph0rk0z commented 1 year ago

MLC is claiming almost 40 t/s on a 70B split across 4090s. I should try it at some point, since that is double what I get, and I don't think it's just the 3090/4090 difference.

zaggynl commented 1 year ago

@toxuin @KhazAkar @Ph0rk0z @Alek32x @LowYieldFire

Requesting a reopen. MLC-LLM is very easy to use for those with an AMD GPU; it only needs an excellent frontend like text-generation-webui!

ghost commented 7 months ago

Would support a reopen too! MLC-LLM is a great project and being able to use it with webui would be a huge quality of life improvement for AMD GPU users!

Snogard commented 5 months ago

It would be amazing to have this.

KhazAkar commented 5 months ago

Please reopen this issue, since it's definitely not completed.

Vite13371337 commented 4 months ago

I would also like Vulkan support to be added and this request to be reopened. I am not entirely sure, but this project uses llama.cpp as one of its loaders. Wouldn't it be easy to add a llama.cpp-specific argument (and one for other loaders with Vulkan support) that enables Vulkan? A rough sketch of the idea is below.

GPUs without proper support would also benefit greatly: single-board computers with powerful GPUs, older cards, and GPUs running alternative drivers like nouveau.
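As a sketch of that llama.cpp idea: Vulkan there is a compile-time option rather than a runtime flag, so a loader-level switch would mostly amount to installing a Vulkan-enabled build of the llama-cpp-python bindings. A minimal, hypothetical sketch (placeholder model path; the exact CMake flag depends on the llama.cpp version):

```python
# Vulkan is selected when llama-cpp-python is compiled with it, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install --force-reinstall llama-cpp-python
# (older trees used -DLLAMA_VULKAN=on). The loader API is then unchanged.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the (Vulkan) GPU
)
print(llm("Hello,", max_tokens=16)["choices"][0]["text"])
```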

dproworld commented 1 month ago

LM Studio has this now. How come it's still not implemented? It's in llama.cpp!