Run any Llama 2 model locally with a Gradio UI, on GPU or CPU, from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local Llama 2 backend for generative agents/apps.
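For reference, a minimal sketch of wiring up `llama2-wrapper` as a local backend. The `LLAMA2_WRAPPER` constructor arguments shown here (`model_path`, `backend_type`) are assumptions and may differ from the released API; check the package's README for the exact signature.

```python
# Minimal sketch: using llama2-wrapper as a local Llama 2 backend.
# NOTE: the constructor arguments (model_path, backend_type) are assumptions,
# not a confirmed API; consult the package documentation before use.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/llama-2-7b-chat",  # hypothetical local model path
    backend_type="transformers",            # assumed backend selector
)

# Wrap a user question in the Llama 2 chat prompt template and generate.
prompt = get_prompt("Hi, do you know PyTorch?")
print(llama2_wrapper(prompt))
```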
I am using Colab Pro and running on the GPU. When I execute the following code to ask a question, it takes about 50 seconds to respond, which is too slow. Is there any way to accelerate it?
```python
prompt = get_prompt("Please help me explain the TCP handshake")
res = llama2_wrapper(prompt)
print(res)
```