langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Issue: GPU is not used with Ctransformers even after specifying gpu_layers parameter #11826

Open YogeshTembe opened 10 months ago

YogeshTembe commented 10 months ago

Issue you'd like to raise.

I have installed langchain and ctransformers using:

pip install langchain
pip install ctransformers[cuda]

I am trying the following piece of code:

from langchain.llms import CTransformers
config = {'max_new_tokens': 512, 'repetition_penalty': 1.1, 'context_length': 8000, 'temperature':0, 'gpu_layers':50}
llm = CTransformers(model = "./codellama-7b.Q4_0.gguf", model_type = "llama", gpu_layers=50, config=config)

Here the gpu_layers parameter is specified, but the GPU is still not being used and the complete load is on the CPU. Can someone please point out if there is any step missing?

Suggestion:

No response

dosubot[bot] commented 10 months ago

🤖

Hello,

Thank you for reaching out. From the provided context, it seems that the 'gpu_layers' parameter you're trying to use doesn't directly control GPU usage in LangChain's CTransformers class. Instead, GPU usage is controlled by the 'device' parameter.

If 'device' is set to a non-negative integer, it corresponds to the CUDA device id of the GPU to be used. If 'device' is set to -1 (default), the computations will be performed on the CPU. Here is the relevant code:

if device is not None and importlib.util.find_spec("torch") is not None:
    import torch
    cuda_device_count = torch.cuda.device_count()
    if device < -1 or (device >= cuda_device_count):
        raise ValueError(
            f"Got device=={device}, "
            f"device is required to be within [-1, {cuda_device_count})"
        )
    if device < 0 and cuda_device_count > 0:
        logger.warning(
            "Device has %d GPUs available. "
            "Provide device={deviceId} to `from_model_id` to use available"
            "GPUs for execution. deviceId is -1 (default) for CPU and "
            "can be a positive integer associated with CUDA device id.",
            cuda_device_count,
        )

This code is located in libs/langchain/langchain/llms/huggingface_pipeline.py.

So, to use GPU for computations, you should specify the 'device' parameter with the appropriate CUDA device id when initializing the CTransformers class. If you have multiple GPUs, you can choose which one to use by setting 'device' to the corresponding CUDA device id.
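For illustration, a minimal sketch of this device parameter (it applies to HuggingFacePipeline.from_model_id, which the warning above refers to; it assumes the transformers package is installed and device=0 targets a single CUDA GPU):

from langchain.llms import HuggingFacePipeline

# device=0 selects the first CUDA device; device=-1 (the default) stays on the CPU.
llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=0,
)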

Please try this and let me know if it resolves your issue. If you have any other questions or need further clarification, feel free to ask.

Best regards.



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

YogeshTembe commented 10 months ago

@dosu-bot I am trying to use the "CTransformers" library from langchain, and you are referring to the "HuggingFacePipeline" library.

Can you please suggest some way to use the GPU with the "CTransformers" library in langchain?

dosubot[bot] commented 10 months ago

🤖

Hi there,

I'm sorry to hear that you're having trouble with the GPU utilization in the CTransformers class. To help you better, I need a bit more information:

  1. Could you please provide the version of LangChain and CTransformers you're using? You can get this information by running pip show langchain and pip show ctransformers.

  2. What is the GPU model you're using? And could you also provide the CUDA version installed on your system?

  3. Could you please share the error message or any warning you're getting while running the code?

  4. Are you setting the device to GPU anywhere in your code? If yes, could you please share that part of the code?

This information will help me understand the issue better and provide a more accurate solution. Thanks for your cooperation.

Best, Dosu



YogeshTembe commented 10 months ago

@dosu-bot Yes sure. Here is the required information:

  1. langchain version = 0.0.306, ctransformers version = 0.2.27
  2. CUDA version = 12.2
  3. There is no error or warning. Just that this code uses the CPU and not the GPU.
  4. I am not setting which GPU device to use anywhere because I could not find any ctransformers parameter to do so.

dosubot[bot] commented 10 months ago

🤖

Hello,

Thank you for reaching out. Based on a similar issue in the LangChain repository, it seems that you might need to install the 'accelerate' module to enable GPU support. This module is designed to help with distributed and mixed-precision training, and it can help to ensure that your code is utilizing the GPU as expected.

Here is how you can install it:

pip install accelerate

Once you have installed 'accelerate', you can use it in your code like this:

from accelerate import Accelerator
from langchain.llms import CTransformers

accelerator = Accelerator()

config = {'max_new_tokens': 512, 'repetition_penalty': 1.1, 'context_length': 8000, 'temperature':0, 'gpu_layers':50}
llm = CTransformers(model = "./codellama-7b.Q4_0.gguf", model_type = "llama", gpu_layers=50, config=config)

llm, config = accelerator.prepare(llm, config)

The accelerator.prepare function will ensure that your model and data are on the right device, and it will also handle mixed-precision training if your GPU supports it.

Please give this a try and let me know if it helps. If you still encounter issues, please provide more details so I can assist you further.

For more information on the 'accelerate' module, you can refer to the official documentation.

Best regards.




YogeshTembe commented 10 months ago

@dosu-bot The solution provided does not work. Also, Accelerate is not part of langchain either.

from langchain.llms import Accelerate

does not work.

dosubot[bot] commented 10 months ago

🤖

Hello,

Thank you for reaching out. It seems like you're trying to use the GPU with the CTransformers library in LangChain, but it's not working as expected.

From the context provided, it appears that there was a similar issue reported in the LangChain repository (Issue #648). The suggested solution was to install the 'accelerate' module from HuggingFace. However, you mentioned that you've already tried this and it didn't work.

The 'accelerate' module is not a part of LangChain, which is why the import from langchain.llms import Accelerate is not working. The 'accelerate' module is a separate library that needs to be installed and imported separately.

Here's how you can do it:

  1. Install the 'accelerate' library using pip:
pip install accelerate
  2. Import the 'accelerate' library in your code:
from accelerate import Accelerator
  3. Use the 'accelerator' object to prepare your model and optimizer:
accelerator = Accelerator()
model, optimizer = accelerator.prepare(model, optimizer)

Please note that you need to replace 'model' and 'optimizer' with your actual model and optimizer.

If you're still facing issues, could you please provide more details about the error messages you're getting? This will help me understand the problem better and provide a more accurate solution.

Best regards.




iamsach commented 8 months ago

The accelerate fix worked for me. The models are now loaded on the GPU, after I passed the LLM that I instantiated with CTransformers to the prepare method of the Accelerator class.

Jeevi10 commented 8 months ago

I tried all of this and still my volatile GPU memory is zero. @iamsach, can you please provide some details about how you managed to get it running?

iamsach commented 8 months ago

I used this @Jeevi10

from accelerate import Accelerator
from langchain.llms import CTransformers

accelerator = Accelerator()

config = {'max_new_tokens': 512, 'repetition_penalty': 1.1, 'context_length': 8000, 'temperature': 0, 'gpu_layers': 50}
llm = CTransformers(model="./codellama-7b.Q4_0.gguf", model_type="llama", gpu_layers=50, config=config)

llm, config = accelerator.prepare(llm, config)

It was running on the GPU after that, and I got results pretty fast.
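To confirm whether the model is actually using the GPU, one option is to watch nvidia-smi while the LLM generates. A minimal sketch, assuming only that nvidia-smi is on the PATH (the query flags shown are standard nvidia-smi options):

import subprocess

# Print current GPU utilization and memory usage; run this (or plain `nvidia-smi`)
# in a separate shell while the model is generating. Non-zero values indicate
# that layers were actually offloaded to the GPU.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used", "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)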

christopher11 commented 7 months ago


This worked for me as well!

VIGHNESH1521 commented 5 months ago

@dosu-bot

from langchain.llms import CTransformers
from dotenv import find_dotenv, load_dotenv
import box
import yaml
from accelerate import Accelerator

load_dotenv(find_dotenv())

with open('config/config.yml', 'r', encoding='utf8') as ymlfile:
    cfg = box.Box(yaml.safe_load(ymlfile))

accelerator = Accelerator()
def build_llm():
    llm = CTransformers(model=cfg.MODEL_BIN_PATH,
                        model_type=cfg.MODEL_TYPE,
                        config={'max_new_tokens': cfg.MAX_NEW_TOKENS,
                                'temperature': cfg.TEMPERATURE,
                                 'gpu_layers': 50}
                        )

    return llm

In this code, I am not able to utilize the GPU. Can somebody please help?

Kev703in commented 5 months ago

@VIGHNESH1521 try adding this:

llm = CTransformers(model=cfg.MODEL_BIN_PATH,
                    model_type=cfg.MODEL_TYPE,
                    config={'max_new_tokens': cfg.MAX_NEW_TOKENS,
                            'temperature': cfg.TEMPERATURE,
                            'gpu_layers': 50}
                    )
llm = accelerator.prepare(llm)  # add this line

return llm

I just use:
config = {'max_new_tokens': cfg.MAX_NEW_TOKENS,
          'temperature': cfg.TEMPERATURE,
          'gpu_layers': 50}
llm = CTransformers(model=cfg.MODEL_BIN_PATH,
                    model_type=cfg.MODEL_TYPE,
                    config=config
                    )
llm, config = accelerator.prepare(llm, config)
return llm
VIGHNESH1521 commented 5 months ago


@Kev703in I had tried this before posting here, and it is not working.
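One way to narrow this down is to check whether ctransformers itself can offload layers to the GPU, independent of langchain. A minimal sketch, assuming ctransformers was installed with CUDA support (pip install ctransformers[cuda]) and using the GGUF path from the original report:

from ctransformers import AutoModelForCausalLM

# Load the GGUF model directly with ctransformers; gpu_layers controls how many
# transformer layers are offloaded to the GPU.
llm = AutoModelForCausalLM.from_pretrained(
    "./codellama-7b.Q4_0.gguf",
    model_type="llama",
    gpu_layers=50,
)
print(llm("def fibonacci(n):", max_new_tokens=64))

If GPU memory usage stays at zero with this snippet as well, the problem is likely the ctransformers build (no CUDA support compiled in) rather than langchain's wrapper.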

saadjelbini commented 3 months ago

@dosu-bot The solution provided does not work. Also, Accelerate is not part of langchain either.

from langchain.llms import Accelerate

does not work.

Hey, did you find any solution to run the code on the GPU? Because I have the same problem that you had.