Closed Ray0907 closed 5 months ago
🤖
Hey @Ray0907, great to see you back! Hope you're doing well amidst all the coding adventures. Let's tackle this new challenge together.
Based on the information you've provided and the similar issue I found in the LlamaIndex repository, it seems that the CUDA error you're encountering might be due to the `max_length` parameter in the `HuggingFaceEmbedding` model constructor.
In the issue I found, the user was able to resolve a similar CUDA error by setting `max_length=256` in the constructor of the embedding model. This adjustment might help in your case as well.
The `max_length` parameter in the `HuggingFaceEmbedding` model constructor defines the maximum length of the input that will be passed to the tokenizer. If the input exceeds this length, it is truncated to fit within the specified `max_length`. This matters because it ensures the input fits within the model's limitations and doesn't cause errors during tokenization or model inference.
Here's how you can adjust your code:
```python
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2", max_length=256
)
```
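As a rough illustration (not llama-index's actual internals), the device-side assert usually traces back to position indices exceeding the model's fixed position-embedding table; truncation keeps those indices in range. A minimal sketch, with an assumed position limit:

```python
# Illustrative sketch only: mimics tokenizer truncation to show why capping
# input length avoids out-of-range position indices on the GPU.
# MODEL_MAX_POSITIONS is an assumed limit, not read from the real model.
MODEL_MAX_POSITIONS = 512

def truncate_ids(token_ids, max_length):
    """Keep at most max_length token ids, as a tokenizer does with truncation."""
    return token_ids[:max_length]

long_input = list(range(1000))        # stand-in for 1000 token ids
safe_input = truncate_ids(long_input, 256)
print(len(safe_input))                # → 256
assert len(safe_input) <= MODEL_MAX_POSITIONS
```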
As for the `device_map` parameter in the `HuggingFaceLLM` constructor, it specifies the device where the model will be loaded. If set to `"auto"`, the model is loaded on the GPU if one is available; otherwise it falls back to the CPU. This parameter interacts with CUDA by determining whether the model runs on the GPU (which uses CUDA for computations) or on the CPU.
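In spirit, the `"auto"` choice reduces to a CUDA-availability check. A minimal sketch of that decision in plain Python, assuming a boolean flag in place of the real `torch.cuda.is_available()` call (the actual logic in `accelerate`/`transformers` can also shard large models across devices):

```python
# Sketch of the device decision behind device_map="auto"; illustration only.
def pick_device(cuda_available: bool) -> str:
    """Return the device string a simple auto-placement would choose."""
    return "cuda" if cuda_available else "cpu"

print(pick_device(True))   # → cuda
print(pick_device(False))  # → cpu
```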
I hope this helps! If you continue to encounter issues, please provide more details about your environment (e.g., CUDA version, PyTorch version, GPU model) so we can better assist you.
The error is in the embeddings, probably a max-length issue. Try this:

```python
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2", max_length=512
)
```
@logan-markewich It worked, thanks for the help. But a new question: how can I get a longer response? Is there any parameter setting I could find in the docs?
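One place to look is the generation settings on the LLM itself. As a hedged sketch (parameter names per llama-index's `HuggingFaceLLM`; verify against your installed version), response length is typically raised via `max_new_tokens`, shown here as plain keyword arguments rather than a live model load:

```python
# Hypothetical kwargs for HuggingFaceLLM; model_name is a placeholder.
llm_kwargs = {
    "model_name": "HuggingFaceH4/zephyr-7b-beta",  # example model, swap for yours
    "max_new_tokens": 512,                          # raise this for longer responses
    "generate_kwargs": {"do_sample": False},        # passed through to generate()
}
print(llm_kwargs["max_new_tokens"])  # → 512
```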
Bug Description
I tried to use the sample code from the docs. However, I encountered the error below:

```
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

Version
0.10.12
Steps to Reproduce
Here's my code.
Relevant Logs/Tracebacks
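No traceback was attached. To make the stack trace accurate, the error message's own suggestion can be applied by setting `CUDA_LAUNCH_BLOCKING=1` before launching the script, so kernels run synchronously and errors surface at the real call site (this slows execution; debugging only). A sketch, with `python3 -c ...` standing in for the actual script invocation:

```shell
# Synchronous CUDA kernel launches: errors are reported where they occur.
CUDA_LAUNCH_BLOCKING=1 python3 -c "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"
# → 1
```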