mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License
22.67k stars 1.72k forks

feat: add support to ctransformers #1056

Open mudler opened 11 months ago

mudler commented 11 months ago

Tracker to add support to ctransformers https://github.com/marella/ctransformers

https://github.com/jllllll/ctransformers-cuBLAS-wheels

localai-bot commented 10 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smile_cat: and, generally speaking, I can provide good tips or point to places in the documentation or code worth looking at, based on what you wrote in the issue.

Don't engage in conversation with me; I don't (yet) support replying!

:warning::warning::warning::warning::warning:

To add support for ctransformers, you can modify the build command in your Dockerfile to include the ctransformers package. You can use the following command to install ctransformers:

RUN pip install ctransformers==3.0.1

Replace 3.0.1 with the version of ctransformers that you want to use. To use ctransformers with cuBLAS, you can modify your model config file to include the following options:

gpu_layers: 1
ffmpeg: true
build_type: 'cublas'
cuda_major_version: 12
cuda_minor_version: 1
platforms: 'linux/amd64'
tag_latest: false
tag_suffix: '-cublas-cuda12-ffmpeg'
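The gpu_layers option above corresponds to ctransformers' own gpu_layers argument when loading a model. A minimal, hypothetical sketch (the model path, model_type, and the load_llm helper itself are placeholders, not part of LocalAI):

```python
# Hypothetical sketch: loading a GGML model with ctransformers and
# offloading layers to the GPU via cuBLAS.
def load_llm(model_path, gpu_layers=1):
    # Imported lazily so the sketch can be read without ctransformers installed.
    from ctransformers import AutoModelForCausalLM

    # gpu_layers mirrors the gpu_layers config option above: the number of
    # transformer layers offloaded to the GPU.
    return AutoModelForCausalLM.from_pretrained(
        model_path, model_type="llama", gpu_layers=gpu_layers
    )
```

Calling load_llm("/path/to/model.bin") returns a model object that can be invoked directly on a prompt string; AutoModelForCausalLM.from_pretrained and its gpu_layers argument are part of the ctransformers API.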

Replace the values with your desired settings. To use ctransformers with Metal, you can modify your model config file to include the following options:

gpu_layers: 1
build_type: 'metal'
platforms: 'macos/arm64'
tag_latest: false
tag_suffix: '-metal-ffmpeg'

Replace the values with your desired settings. To use ctransformers with OpenBLAS, you can modify your model config file to include the following options:

gpu_layers: 1
build_type: 'openblas'
platforms: 'linux/amd64'
tag_latest: false
tag_suffix: '-openblas-ffmpeg'

Replace the values with your desired settings. To use ctransformers with CLBlast, you can modify your model config file to include the following options:

gpu_layers: 1
build_type: 'clblas'
platforms: 'linux/amd64'
tag_latest: false
tag_suffix: '-clblas-ffmpeg'
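As a sanity check on the tag suffixes above, here is a small hypothetical helper (not part of LocalAI) that derives the suffix from the build options. Only cuBLAS builds are CUDA-specific, so only they carry a cuda<major> component:

```python
# Hypothetical helper: derive an image tag suffix from build options.
def tag_suffix(build_type, cuda_major=None, ffmpeg=True):
    parts = [build_type]
    if build_type == "cublas":
        # CUDA versions only make sense for cuBLAS builds.
        if cuda_major is None:
            raise ValueError("cublas builds need a CUDA major version")
        parts.append("cuda%d" % cuda_major)
    if ffmpeg:
        parts.append("ffmpeg")
    return "-" + "-".join(parts)
```

For example, tag_suffix("cublas", 12) yields "-cublas-cuda12-ffmpeg", while Metal, OpenBLAS, and CLBlast builds get plain "-metal-ffmpeg", "-openblas-ffmpeg", and "-clblas-ffmpeg" suffixes.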

Replace the values with your desired settings. Finally, make sure to give enough resources to the running container. Note that memory limits are applied when the container is started, not via an environment variable in the Dockerfile; for example:

docker run --memory=2048m <image>

Replace 2048m with the desired memory limit in megabytes.
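If the container is started with Docker Compose instead, the equivalent limit can go in the service definition. A minimal sketch (the service and image names are placeholders):

```yaml
# Hypothetical docker-compose fragment setting a container memory limit.
services:
  localai:
    image: <image>
    mem_limit: 2048m
```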
