🤖
Hello,
The error you're encountering is likely due to the `tensor_parallel_size` argument of the `VLLM` class being set to a value greater than the number of GPUs available on your Kaggle Kernel's VM. The `tensor_parallel_size` argument specifies the number of GPUs to use for distributed tensor-parallel inference. If the Kaggle Kernel's VM only has one GPU, setting `tensor_parallel_size=2` would cause an error because there aren't enough GPUs available to distribute the workload across.

To resolve this issue, set `tensor_parallel_size` to a value less than or equal to the number of GPUs available on your VM. If you're unsure how many GPUs the VM has, you can check with the `torch.cuda.device_count()` function from PyTorch.
Here's an example of how to do this:
```python
import torch
from langchain.llms import VLLM

num_gpus = torch.cuda.device_count()

llm = VLLM(
    model="mosaicml/mpt-30b",
    tensor_parallel_size=num_gpus,
    trust_remote_code=True,  # mandatory for hf models
)

llm("What is the future of AI?")
```
In this code, `num_gpus` is the number of GPUs available on your VM. The `tensor_parallel_size` argument is set to `num_gpus`, ensuring that you're not trying to use more GPUs than are available.
For more information, you can refer to the VLLM notebook in the LangChain repository.
I hope this helps! Let me know if you have any other questions.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Still, it got auto-killed.
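Return code -11 in the dashboard log corresponds to a segmentation fault in Ray's dashboard process, so one possible workaround, not verified in this thread, is to initialize Ray yourself with the dashboard disabled before constructing the `VLLM` object. The sketch below assumes vLLM will reuse an already-running Ray instance; `include_dashboard` and `object_store_memory` are standard `ray.init` options, but the specific values are illustrative only.

```python
import ray
from langchain.llms import VLLM

# Hedged workaround sketch: start Ray without the dashboard process that
# segfaulted (return code -11), before vLLM tries to start Ray itself.
ray.init(
    num_gpus=2,                     # the two T4s on the Kaggle VM
    include_dashboard=False,        # skip the dashboard that failed to start
    object_store_memory=2 * 10**9,  # illustrative cap for the 13 GB VM
)

llm = VLLM(
    model="facebook/opt-125m",
    tensor_parallel_size=2,
    trust_remote_code=True,  # mandatory for hf models
)
print(llm("What is the capital of France ?"))
```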
Hi, @vipulgote1999! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, the issue you reported was related to setting `tensor_parallel_size=2` in the `VLLM` class from `langchain.llms`. It seems that the error you encountered was due to `tensor_parallel_size` being set to a value greater than the number of GPUs available on the Kaggle Kernel's VM. A suggested solution was to set `tensor_parallel_size` to a value less than or equal to the number of GPUs available. However, you mentioned that the issue still persists and provided a screenshot of the error.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on this issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository, and please let us know if you have any further questions or concerns!
System Info
`VLLM` from LangChain gives the error below and stops executing:
code:

```python
from langchain.llms import VLLM

llm = VLLM(
    model="facebook/opt-125m",
    tensor_parallel_size=2,
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)

print(llm("What is the capital of France ?"))
```

gives the error below when setting `tensor_parallel_size=2`, and runs successfully if we comment out the `tensor_parallel_size` argument:

error =>

```
2023-08-23 08:52:55,683 ERROR services.py:1207 -- Failed to start the dashboard, return code -11
2023-08-23 08:52:55,685 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-08-23 08:52:55,687 ERROR services.py:1276 -- The last 20 lines of /tmp/ray/session_2023-08-23_08-52-52_882632_28/logs/dashboard.log (it contains the error message from the dashboard):
2023-08-23 08:52:55,607 INFO head.py:242 -- Starting dashboard metrics server on port 44227
2023-08-23 08:52:56,847 INFO worker.py:1636 -- Started a local Ray instance.
```
VM Details: CPU: 4 cores, RAM: 13 GB, GPU: 2× Nvidia T4, Instance: Kaggle Kernel
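Since the diagnosis above hinges on how many GPUs are actually visible inside the kernel, a quick sanity check along these lines (not part of the original report; it only uses standard PyTorch calls) can confirm both T4s are exposed before enabling tensor parallelism:

```python
import torch

# Confirm which GPUs PyTorch can see inside the Kaggle kernel.
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```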
Who can help?
No response
Information
Related Components
Reproduction
code:

```python
from langchain.llms import VLLM

llm = VLLM(
    model="facebook/opt-125m",
    tensor_parallel_size=2,
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)

print(llm("What is the capital of France ?"))
```
On a Kaggle kernel.
VM Details: CPU: 4 cores, RAM: 13 GB, GPU: 2× Nvidia T4, Instance: Kaggle Kernel
Expected behavior
```
2023-08-23 08:52:55,683 ERROR services.py:1207 -- Failed to start the dashboard, return code -11
2023-08-23 08:52:55,685 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-08-23 08:52:55,687 ERROR services.py:1276 -- The last 20 lines of /tmp/ray/session_2023-08-23_08-52-52_882632_28/logs/dashboard.log (it contains the error message from the dashboard):
2023-08-23 08:52:55,607 INFO head.py:242 -- Starting dashboard metrics server on port 44227
2023-08-23 08:52:56,847 INFO worker.py:1636 -- Started a local Ray instance.
```

and the cell stops running, i.e. execution stops.
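Since the report notes that the same snippet runs successfully when `tensor_parallel_size` is commented out, a single-GPU fallback along these lines (a hedged sketch, not a fix confirmed in this thread) should at least keep the cell running while the multi-GPU path is investigated:

```python
from langchain.llms import VLLM

# Single-GPU fallback: same configuration as the reproduction, but with
# tensor_parallel_size omitted so vLLM should not need to start a Ray cluster.
llm = VLLM(
    model="facebook/opt-125m",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)
print(llm("What is the capital of France ?"))
```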