incredibly slow on NVIDIA GPU on a Linux cluster

anamariaUIC commented 8 months ago

Hello,

I installed privateGPT using:

salloc --job-name "InteractiveJob" --cpus-per-task 4 --mem-per-cpu 50gb --time 01:00:10 -p batch_gpu --gres=gpu:1

module load Mamba git CUDA && conda create -n gpt && source activate gpt && mamba install python=3.11 && pip install llama-index-readers-file poetry injector && git clone https://github.com/imartinez/privateGPT.git && cd privateGPT/ && export PYTHONPATH=$PYTHONPATH:$PWD && poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"

pip install gradio llama-index llama-index-llms-llama-cpp llama-index-vector-stores-postgres llama-index-vector-stores-qdrant llama-index-embeddings-huggingface && poetry run python scripts/setup && CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python && PGPT_PROFILES=local make run

My NVIDIA GPU shown in attach.

One query takes one hour to be answered please advise.

Thanks

icsy7867 commented 7 months ago

I'm using slurm and srun to acreate an enroot jail of vllm, and then using privgpt to connect to vllm as "openailike" and this seems to work well enough.

Privategpt could be a little snapper, but I am still getting like 10 tokens/sec as put. Initial queries take around 2 seconds.

anamariaUIC commented 6 months ago

Thank you for that. Would you be available to chat a bit further abotu that?

On Apr 11, 2024, at 4:43 PM, icsy7867 @.**@.>> wrote:

CAUTION: External Sender

I'm using slurm and srun to acreate an enroot jail of vllm, and then using privgpt to connect to vllm as "openailike" and this seems to work well enough.

Privategpt could be a little snapper, but I am still getting like 10 tokens/sec as put. Initial queries take around 2 seconds.

— Reply to this email directly, view it on GitHubhttps://github.com/zylon-ai/private-gpt/issues/1805#issuecomment-2050604021, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AUHBH5CCKQVZNHQU5UIYHJLY437YLAVCNFSM6AAAAABFNFVZOSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANJQGYYDIMBSGE. You are receiving this because you authored the thread.Message ID: @.***>

This email originated from outside the University of Illinois System. Use caution when replying, clicking links, or opening attachments. DO NOT reply to any requests asking you to reply from a personal account or SMS.

zylon-ai / private-gpt

incredibly slow on NVIDIA GPU on a Linux cluster #1805