Can you help me with this issue? I'm running this basic model: llama-2-7b-chat.ggmlv3.q4_0.bin
@rishabh-gurbani Hi, you can try running the env_examples/.env.13b_example or env_examples/.env.7b_gptq_example models on an A100 GPU. Here is a Colab example that runs the GPTQ model on a T4 GPU at 15.9851 tokens/sec.
Do not run ggml models on the server, because ggml models only run on the CPU (without acceleration), and the server's CPU is very slow.
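For reference, here is a minimal sketch of loading a GPTQ model directly onto a CUDA GPU, assuming the auto-gptq and transformers packages; the TheBloke/Llama-2-7B-Chat-GPTQ weights are an illustrative choice, not this project's actual config:

```python
# Minimal sketch: run a GPTQ-quantized Llama 2 chat model on a CUDA GPU.
# Assumes `pip install auto-gptq transformers`; the model repo below is
# an illustrative example, not necessarily what this project loads.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # hypothetical weights for the demo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```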
Alright, will try and get back, thanks!
I'm running this on a machine with an Nvidia A100, but it doesn't seem to make use of the GPU.
System specs: 4x Nvidia A100 80GB, 540 GB of RAM
Benchmarks:
- Initialization time: 0.2208 seconds
- Average generation time over 5 iterations: 31.0348 seconds
- Average speed over 5 iterations: 5.0459 tokens/sec
- Average memory usage during generation: 4435.30 MiB
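When the GPU sits idle, a first thing to check is whether the process can see CUDA at all. Here is a quick sanity check, assuming PyTorch is installed; the `model`/`tokenizer` names are illustrative (e.g. from the GPTQ sketch above), and the timing loop mirrors how a tokens/sec figure like the one above is typically computed:

```python
# Sanity-check GPU visibility, then time one generation pass.
# Assumes PyTorch plus `model`/`tokenizer` objects for a GPU-loaded model;
# these names are illustrative, not part of this project's API.
import time
import torch

print("CUDA available:", torch.cuda.is_available())  # False -> CPU fallback
print("GPU count:", torch.cuda.device_count())       # expect 4 on a 4x A100 box

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda:0")
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start
new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.4f} tokens/sec")
```

Watching `nvidia-smi` during generation is another quick way to confirm whether any of the four A100s is actually doing work.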