Hi, awesome project!
I am experimenting with "unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit" for inference, on 1 A100 GPU with a 16-core CPU. However, inference for a single sentence takes 20+ minutes.
Is there any way to speed it up? Also, is there any way to batch multiple text inputs together as a list to speed things up? Something like:
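A rough sketch of what I have in mind (the helper name, sequence lengths, and sampling settings are just placeholders; I have not actually run this against the 405B checkpoint):

```python
# Hypothetical sketch of batched inference with Unsloth; model name and
# settings are assumptions, untested on the 405B checkpoint.
def batch_generate(prompts, model_name="unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit",
                   max_new_tokens=128):
    # Imports kept inside the function so the sketch stays importable
    # even where unsloth is not installed.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=2048,
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

    # Left-pad so every prompt ends at the same position before generation.
    tokenizer.padding_side = "left"
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

So I could call `batch_generate(["sentence 1", "sentence 2", ...])` once instead of looping one sentence at a time. Is something like this supported, and would it actually help with the latency?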