Closed ziptron closed 11 months ago
Hi,
I never tried to run it on Google Colab, but 15GB should be enough for this model - I can run it locally on a 10GB VRAM card (with half of the layers offloaded to CPU). If you are still stuck, do you mind posting the model section of your config.yaml and I will try to reproduce it?
Thanks for responding. I do think this may be a Colab issue, so I'll keep trying today and post results later.
By the way, stupid question, how do you know how many "layers" there are? I've been fiddling with the n_gpu_layers parameter, but I cannot quite understand what that means. Does 50 mean 50% (half), or is that a unit of layers? If you could point me towards some info on that I'd much appreciate it.
Thanks!
It is the absolute number of layers and depends on the actual model architecture. When the model is loaded (with llama.cpp in this case), the number appears in the log (see the attached screenshot).
So in the example below, the model consists of 43 layers, and 15 were offloaded to the GPU. You can then check VRAM usage and adjust n_gpu_layers accordingly. You may need more memory than currently stated, depending on the context length and the embedding model used (which in most cases also runs on the GPU).
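As a rough starting point, you can estimate n_gpu_layers by dividing the VRAM you are willing to spend by the approximate per-layer size. This is a hypothetical back-of-the-envelope sketch, not part of the project: the function name, sizes, and the overhead figure are all assumptions.

```python
# Back-of-the-envelope starting point for n_gpu_layers.
# All numbers here are assumptions: actual per-layer size varies with
# quantization, and extra VRAM is consumed by the KV cache (grows with
# context length) and by the embedding model.
def estimate_n_gpu_layers(vram_gb: float, n_layers: int,
                          model_size_gb: float,
                          overhead_gb: float = 1.5) -> int:
    per_layer_gb = model_size_gb / n_layers      # rough per-layer cost
    usable_gb = max(vram_gb - overhead_gb, 0.0)  # reserve for KV cache etc.
    return min(n_layers, int(usable_gb / per_layer_gb))

# e.g. a ~9 GB, 43-layer GGML file on a 10 GB card:
print(estimate_n_gpu_layers(10, 43, 9))  # → 40
```

Start from an estimate like this, watch actual VRAM usage on the first load, and adjust downward if you hit out-of-memory errors.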
This screenshot made me realize that I am not offloading anything to the GPU. See mine below.
I had some errors while installing (see below). Should I try to resolve these errors you think? Or is there a different way to diagnose why I'm not offloading to the GPU?
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.29.0 which is incompatible.
tensorflow 2.12.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
tensorflow-metadata 1.13.1 requires protobuf<5,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
Successfully installed InstructorEmbedding-1.0.1 XlsxWriter-3.1.2 accelerate-0.19.0 argilla-1.13.3 auto-gptq-0.3.0 backoff-2.2.1 bitsandbytes-0.41.0 chromadb-0.3.26 clickhouse-connect-0.6.8 coloredlogs-15.0.1 cryptography-41.0.2 dataclasses-json-0.5.14 datasets-2.14.2 deprecated-1.2.14 dill-0.3.7 diskcache-5.6.1 einops-0.6.1 fastapi-0.95.1 filetype-1.2.0 gitdb-4.0.10 gitpython-3.1.32 h11-0.14.0 hnswlib-0.7.0 httpcore-0.16.3 httptools-0.6.0 httpx-0.23.3 huggingface-hub-0.16.4 humanfriendly-10.0 langchain-0.0.219 langchainplus-sdk-0.0.20 llama-cpp-python-0.1.77 llama-index-0.6.9 llmsearch-0.1.dev74+g7207a16.d20230801 loguru-0.7.0 lz4-4.3.2 marshmallow-3.20.1 monotonic-1.6 msg-parser-1.2.0 multiprocess-0.70.15 mypy-extensions-1.0.0 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 olefile-0.46 onnxruntime-1.15.1 openai-0.27.8 openapi-schema-pydantic-1.2.4 overrides-7.3.1 pdf2image-1.16.3 pdfminer.six-20221105 peft-0.4.0 posthog-3.0.1 protobuf-3.20.2 pulsar-client-3.2.0 pydeck-0.8.1b0 pympler-1.0.1 pymupdf-1.22.5 pypandoc-1.11 pypdf2-3.0.1 python-docx-0.8.11 python-dotenv-1.0.0 python-magic-0.4.27 python-pptx-0.6.21 pytz-deprecation-shim-0.1.0.post0 requests-2.29.0 rfc3986-1.5.0 rouge-1.0.1 safetensors-0.3.1 sentence-transformers-2.2.2 sentencepiece-0.1.99 smmap-5.0.0 sqlalchemy-1.4.48 starlette-0.26.1 streamlit-1.24.1 threadpoolctl-3.1.0 tiktoken-0.3.3 tokenizers-0.13.3 torch-2.0.0 torchvision-0.15.1 transformers-4.29.2 typer-0.7.0 typing-inspect-0.9.0 tzdata-2023.3 tzlocal-4.3.1 unstructured-0.7.8 uvicorn-0.23.2 uvloop-0.17.0 validators-0.20.0 watchdog-3.0.0 watchfiles-0.19.0 websockets-11.0.3 xxhash-3.3.0 zstandard-0.21.0
WARNING: The following packages were previously imported in this runtime:
[google]
You must restart the runtime in order to use newly installed versions.
Sorry that you are facing problems.
It looks like llama.cpp was built without GPU support during installation, which is why you don't see it in the output. I will need to investigate how to enable it in the Colab environment.
On a local GPU-enabled computer, assuming all the prerequisites are installed, llamacpp needs the flags described in https://github.com/ggerganov/llama.cpp#cublas in order to build with GPU support.
In this repository, these flags are set using setvars.sh before installation (this is also described in the README).
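In a notebook, the same build flags can also be set from Python before invoking pip, as an alternative to the %env magic. This is only a sketch: the flags are the cuBLAS options from the llama.cpp README linked above, and the pip command is shown as a comment because it has to run in its own notebook cell.

```python
import os

# Ask the llama-cpp-python build to enable cuBLAS (GPU) support.
os.environ["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
os.environ["FORCE_CMAKE"] = "1"

# Then rebuild from source so the flags take effect, e.g. in a notebook cell:
# !pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If the flags were picked up, the llama.cpp load log should mention the GPU backend and report layers being offloaded.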
I've created a demo notebook on how to run it on Google Colab (free tier) - https://github.com/snexus/llm-search/blob/main/notebooks/llmsearch_google_colab_demo.ipynb
Wow thanks so much! I tried this out this morning and it works well! I may not have been setting the variables (below) correctly, or at all to be honest.
%env CMAKE_ARGS="-DLLAMA_CUBLAS=on"
%env FORCE_CMAKE=1
Thanks for making this project and for your help.
I am running this in Colab with their free tier GPU (15GB), using WizardLM-13B-1.0.ggmlv3.q5_K_S.bin.
I have been testing this out by generating some random PDFs from Wikipedia articles. I can parse about 50 pdfs and create an index in less than a minute. I then run the 'Interact' part and it quickly loads up the "Enter Question >>" prompt. I can then ask a question, and it seems to start compiling the chain. However, afterwards nothing happens.
The prompt below successfully finds the PDF of (https://en.wikipedia.org/wiki/Olive_Edis) in my docs folder and starts putting the prompt together, but then nothing happens.
My GPU usage remains low (2GB/15GB) and I can wait 30 minutes or longer and nothing else happens.
Any hints on how to diagnose this? What should I expect to happen next?
---- Edit ----
This may be a resource issue with Google Colab. I'm now trying to run different code altogether, and it's also getting stuck at 2GB of GPU usage without actually outputting a result. I will try this again tomorrow.
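One way to confirm whether any layers actually reached the GPU is to search the captured llama.cpp load log for its offload summary. A minimal sketch, with the caveat that the helper name is made up and the exact log wording varies between llama.cpp versions, so the pattern may need adjusting for your build:

```python
import re

# Hypothetical helper: scan captured llama.cpp load output for the
# layer-offload summary. Assumes a line of the form
#   "llama_model_load_internal: offloading 15 layers to GPU"
# (older builds) or "... offloaded 15/43 layers ..." (newer builds).
OFFLOAD_RE = re.compile(r"offload(?:ing|ed)\s+(\d+)")

def offloaded_layers(log_text: str) -> int:
    """Return the number of layers reported as offloaded, or 0 if none."""
    m = OFFLOAD_RE.search(log_text)
    return int(m.group(1)) if m else 0

sample = "llama_model_load_internal: offloading 15 layers to GPU"
print(offloaded_layers(sample))  # → 15 for this sample line
```

If this returns 0 on your real log, the build most likely has no GPU support and needs to be reinstalled with the cuBLAS flags set.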