oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Trying to load models getting nothing but errors. #479

Closed: bbecausereasonss closed this issue 1 year ago

bbecausereasonss commented 1 year ago

Describe the bug

I loaded models into the models folder and tried to select them, but got nothing but errors. Confused.

Does this only support .pt models?

"llama-text-generation-webui-1 | Loading Llama-30b HFv2... llama-text-generation-webui-1 | Could not find llama-30b-4bit.pt, exiting... llama-text-generation-webui-1 | Traceback (most recent call last): llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/gradio/routes.py", line 374, in run_predict llama-text-generation-webui-1 | output = await app.get_blocks().process_api( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1017, in process_api llama-text-generation-webui-1 | result = await self.call_function( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 835, in call_function llama-text-generation-webui-1 | prediction = await anyio.to_thread.run_sync( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync llama-text-generation-webui-1 | return await get_asynclib().run_sync_in_worker_thread( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread llama-text-generation-webui-1 | return await future llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run llama-text-generation-webui-1 | result = context.run(func, *args) llama-text-generation-webui-1 | File "/app/server.py", line 63, in load_model_wrapper llama-text-generation-webui-1 | shared.model, shared.tokenizer = load_model(shared.model_name) llama-text-generation-webui-1 | File "/app/modules/models.py", line 100, in load_model llama-text-generation-webui-1 | model = load_quantized(model_name) llama-text-generation-webui-1 | File "/app/modules/GPTQ_loader.py", line 53, in load_quantized llama-text-generation-webui-1 | exit() llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in call llama-text-generation-webui-1 | raise SystemExit(code) llama-text-generation-webui-1 | SystemExit: None"

and

"llama-text-generation-webui-1 | Can't determine model type from model name. Please specify it manually using --gptq-model-type argument llama-text-generation-webui-1 | Traceback (most recent call last): llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/gradio/routes.py", line 374, in run_predict llama-text-generation-webui-1 | output = await app.get_blocks().process_api( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1017, in process_api llama-text-generation-webui-1 | result = await self.call_function( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 835, in call_function llama-text-generation-webui-1 | prediction = await anyio.to_thread.run_sync( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync llama-text-generation-webui-1 | return await get_asynclib().run_sync_in_worker_thread( llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread llama-text-generation-webui-1 | return await future llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run llama-text-generation-webui-1 | result = context.run(func, *args) llama-text-generation-webui-1 | File "/app/server.py", line 63, in load_model_wrapper llama-text-generation-webui-1 | shared.model, shared.tokenizer = load_model(shared.model_name) llama-text-generation-webui-1 | File "/app/modules/models.py", line 100, in load_model llama-text-generation-webui-1 | model = load_quantized(model_name) llama-text-generation-webui-1 | File "/app/modules/GPTQ_loader.py", line 21, in load_quantized llama-text-generation-webui-1 | exit() llama-text-generation-webui-1 | File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in call llama-text-generation-webui-1 | raise SystemExit(code) llama-text-generation-webui-1 | SystemExit: None"

Is there an existing issue for this?

Reproduction

Load models.

Screenshot

No response

Logs

"Starting webui.."
[+] Building 2.0s (20/20) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                0.0s
 => => transferring dockerfile: 32B                                                                                                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                                                                                                   0.0s
 => => transferring context: 35B                                                                                                                                                                                    0.0s
 => [internal] load metadata for docker.io/continuumio/miniconda3:latest                                                                                                                                            0.6s
 => [auth] continuumio/miniconda3:pull token for registry-1.docker.io                                                                                                                                               0.0s
 => [stage-0  1/14] FROM docker.io/continuumio/miniconda3@sha256:10b38c9a8a51692838ce4517e8c74515499b68d58c8a2000d8a9df7f0f08fc5e                                                                                   0.0s
 => [internal] load build context                                                                                                                                                                                   0.4s
 => => transferring context: 94.34MB                                                                                                                                                                                0.4s
 => CACHED [stage-0  2/14] WORKDIR /app                                                                                                                                                                             0.0s
 => CACHED [stage-0  3/14] RUN apt-get update && apt-get install -y git software-properties-common build-essential gnupg ninja-build dos2unix && apt-get clean                                                      0.0s
 => CACHED [stage-0  4/14] RUN conda install torchvision torchaudio pytorch-cuda=11.7 cuda -c pytorch  -c nvidia/label/cuda-11.7.1 && conda clean -a                                                                0.0s
 => CACHED [stage-0  5/14] COPY requirements.txt /app/requirements.txt                                                                                                                                              0.0s
 => CACHED [stage-0  6/14] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/requirements.txt                                                                                                      0.0s
 => CACHED [stage-0  7/14] COPY extensions/ /app/extensions/                                                                                                                                                        0.0s
 => CACHED [stage-0  8/14] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/google_translate/requirements.txt                                                                          0.0s
 => CACHED [stage-0  9/14] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/silero_tts/requirements.txt                                                                                0.0s
 => CACHED [stage-0 10/14] RUN mkdir repositories                                                                                                                                                                   0.0s
 => CACHED [stage-0 11/14] RUN cd repositories && git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa && cd GPTQ-for-LLaMa && git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4                         0.0s
 => [stage-0 12/14] COPY . /app                                                                                                                                                                                     0.2s
 => [stage-0 13/14] COPY ./docker/run.sh /app/run.sh                                                                                                                                                                0.0s
 => [stage-0 14/14] RUN dos2unix /app/run.sh                                                                                                                                                                        0.3s
 => exporting to image                                                                                                                                                                                              0.3s
 => => exporting layers                                                                                                                                                                                             0.3s
 => => writing image sha256:f37daac19ce46ae3f1b552b769c2c0e305185650885e7f78d86441b98fc565c2                                                                                                                        0.0s
 => => naming to docker.io/library/llama-text-generation-webui                                                                                                                                                      0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 1/1
 - Container llama-text-generation-webui-1  Recreated                                                                                                                                                               0.1s
Attaching to llama-text-generation-webui-1
llama-text-generation-webui-1  | Downloading LLaMa model llama-7b (metadata)
llama-text-generation-webui-1  | Downloading the model to models/llama-7b-hf
llama-text-generation-webui-1  | Downloading file 1 of 6...
100% 427/427 [00:00<00:00, 850kiB/s]
llama-text-generation-webui-1  | Downloading file 2 of 6...
100% 124/124 [00:00<00:00, 255kiB/s]
llama-text-generation-webui-1  | Downloading file 3 of 6...
100% 25.5k/25.5k [00:00<00:00, 1.59MiB/s]
llama-text-generation-webui-1  | Downloading file 4 of 6...
100% 2.00/2.00 [00:00<00:00, 3.00kiB/s]
llama-text-generation-webui-1  | Downloading file 5 of 6...
100% 500k/500k [00:00<00:00, 9.92MiB/s]
llama-text-generation-webui-1  | Downloading file 6 of 6...
100% 141/141 [00:00<00:00, 220kiB/s]
llama-text-generation-webui-1  | Downloading LLaMa model llama-7b (weights)
llama-text-generation-webui-1  | --2023-03-21 20:10:50--  https://huggingface.co/decapoda-research/llama-7b-hf-int4/resolve/main/llama-7b-4bit.pt
llama-text-generation-webui-1  | Resolving huggingface.co (huggingface.co)... 3.83.196.160, 34.202.121.154, 34.203.133.210, ...
llama-text-generation-webui-1  | Connecting to huggingface.co (huggingface.co)|3.83.196.160|:443... connected.
llama-text-generation-webui-1  | HTTP request sent, awaiting response... 302 Found
llama-text-generation-webui-1  | Location: https://cdn-lfs.huggingface.co/repos/98/d7/98d7c1a709a7ae2b2b2dbbc8fa82286eae5bf3bff3f004e7d5843c4344e64b11/b48471adcc7e20542f9cacc348725b4ad36c3321ca2015bbd57d3876302426ee?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-7b-4bit.pt%3B+filename%3D%22llama-7b-4bit.pt%22%3B&Expires=1679686230&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzk4L2Q3Lzk4ZDdjMWE3MDlhN2FlMmIyYjJkYmJjOGZhODIyODZlYWU1YmYzYmZmM2YwMDRlN2Q1ODQzYzQzNDRlNjRiMTEvYjQ4NDcxYWRjYzdlMjA1NDJmOWNhY2MzNDg3MjViNGFkMzZjMzMyMWNhMjAxNWJiZDU3ZDM4NzYzMDI0MjZlZT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2Nzk2ODYyMzB9fX1dfQ__&Signature=dEzdw3Bif7yO1GAzjhQ0qb08AZOKv97y4dyPg-4%7EZks%7EyOizWlyFB8c-mYlhv3TRQdSuDtVmqe0skWiTBSndNn52PRlzqg6bGHstw1fLl6qhMnjA9KymYXNGwnpQ6nKqB3T8hGm1aQX8jE4GNgHs5DiHLGZgAhZf2WilYZq9ekWM-ZxbsfPJKPvYJjzWX23Qf-ReZd0i2LtiG2Aj6gjFiUFHD9p90lqZsGCbBSQCigQNAes2leQ5fBTOzNyILDgpjBaM1VawxRi7hPdJDzl4vMkU34ZYg%7EZOtDZS3Q2pi6JbHeY9fHct6CNKS-jIN7yQ9LW9W6Uf1kjpRnBnmbLI7w__&Key-Pair-Id=KVTP0A1DKRTAX [following]
llama-text-generation-webui-1  | --2023-03-21 20:10:50--  https://cdn-lfs.huggingface.co/repos/98/d7/98d7c1a709a7ae2b2b2dbbc8fa82286eae5bf3bff3f004e7d5843c4344e64b11/b48471adcc7e20542f9cacc348725b4ad36c3321ca2015bbd57d3876302426ee?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-7b-4bit.pt%3B+filename%3D%22llama-7b-4bit.pt%22%3B&Expires=1679686230&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzk4L2Q3Lzk4ZDdjMWE3MDlhN2FlMmIyYjJkYmJjOGZhODIyODZlYWU1YmYzYmZmM2YwMDRlN2Q1ODQzYzQzNDRlNjRiMTEvYjQ4NDcxYWRjYzdlMjA1NDJmOWNhY2MzNDg3MjViNGFkMzZjMzMyMWNhMjAxNWJiZDU3ZDM4NzYzMDI0MjZlZT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2Nzk2ODYyMzB9fX1dfQ__&Signature=dEzdw3Bif7yO1GAzjhQ0qb08AZOKv97y4dyPg-4%7EZks%7EyOizWlyFB8c-mYlhv3TRQdSuDtVmqe0skWiTBSndNn52PRlzqg6bGHstw1fLl6qhMnjA9KymYXNGwnpQ6nKqB3T8hGm1aQX8jE4GNgHs5DiHLGZgAhZf2WilYZq9ekWM-ZxbsfPJKPvYJjzWX23Qf-ReZd0i2LtiG2Aj6gjFiUFHD9p90lqZsGCbBSQCigQNAes2leQ5fBTOzNyILDgpjBaM1VawxRi7hPdJDzl4vMkU34ZYg%7EZOtDZS3Q2pi6JbHeY9fHct6CNKS-jIN7yQ9LW9W6Uf1kjpRnBnmbLI7w__&Key-Pair-Id=KVTP0A1DKRTAX
llama-text-generation-webui-1  | Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.67.17.96, 18.67.17.33, 18.67.17.2, ...
llama-text-generation-webui-1  | Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.67.17.96|:443... connected.
llama-text-generation-webui-1  | HTTP request sent, awaiting response... 200 OK
llama-text-generation-webui-1  | Length: 3779492377 (3.5G) [application/zip]
llama-text-generation-webui-1  | Saving to: ‘/app/models/llama-7b-4bit.pt’
llama-text-generation-webui-1  |
llama-text-generation-webui-1  |      0K ........ ........ ........ ........  0% 10.7M 5m33s
[... ~110 lines of wget progress output trimmed; the 3.5G download proceeds at roughly 13-14 MB/s ...]
llama-text-generation-webui-1  | 3670016K ........ ........ ....             100% 14.0M=4m27s
llama-text-generation-webui-1  |
llama-text-generation-webui-1  | 2023-03-21 20:15:17 (13.5 MB/s) - ‘/app/models/llama-7b-4bit.pt’ saved [3779492377/3779492377]
llama-text-generation-webui-1  |
llama-text-generation-webui-1  | Fixing LLaMa models (tokenizer)
llama-text-generation-webui-1  | running install
llama-text-generation-webui-1  | /opt/conda/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
llama-text-generation-webui-1  |   warnings.warn(
llama-text-generation-webui-1  | /opt/conda/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
llama-text-generation-webui-1  |   warnings.warn(
llama-text-generation-webui-1  | running bdist_egg
llama-text-generation-webui-1  | running egg_info
llama-text-generation-webui-1  | creating quant_cuda.egg-info
llama-text-generation-webui-1  | writing quant_cuda.egg-info/PKG-INFO
llama-text-generation-webui-1  | writing dependency_links to quant_cuda.egg-info/dependency_links.txt
llama-text-generation-webui-1  | writing top-level names to quant_cuda.egg-info/top_level.txt
llama-text-generation-webui-1  | writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
llama-text-generation-webui-1  | reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
llama-text-generation-webui-1  | writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
llama-text-generation-webui-1  | installing library code to build/bdist.linux-x86_64/egg
llama-text-generation-webui-1  | running install_lib
llama-text-generation-webui-1  | running build_ext
llama-text-generation-webui-1  | building 'quant_cuda' extension
llama-text-generation-webui-1  | Emitting ninja build file /app/repositories/GPTQ-for-LLaMa/build/temp.linux-x86_64-cpython-310/build.ninja...
llama-text-generation-webui-1  | Compiling objects...
llama-text-generation-webui-1  | Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
llama-text-generation-webui-1  | ninja: no work to do.
llama-text-generation-webui-1  | g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib /app/repositories/GPTQ-for-LLaMa/build/temp.linux-x86_64-cpython-310/quant_cuda.o /app/repositories/GPTQ-for-LLaMa/build/temp.linux-x86_64-cpython-310/quant_cuda_kernel.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/opt/conda/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/quant_cuda.cpython-310-x86_64-linux-gnu.so
llama-text-generation-webui-1  | creating build/bdist.linux-x86_64/egg
llama-text-generation-webui-1  | copying build/lib.linux-x86_64-cpython-310/quant_cuda.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
llama-text-generation-webui-1  | creating stub loader for quant_cuda.cpython-310-x86_64-linux-gnu.so
llama-text-generation-webui-1  | byte-compiling build/bdist.linux-x86_64/egg/quant_cuda.py to quant_cuda.cpython-310.pyc
llama-text-generation-webui-1  | creating build/bdist.linux-x86_64/egg/EGG-INFO
llama-text-generation-webui-1  | copying quant_cuda.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
llama-text-generation-webui-1  | copying quant_cuda.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
llama-text-generation-webui-1  | copying quant_cuda.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
llama-text-generation-webui-1  | copying quant_cuda.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
llama-text-generation-webui-1  | writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
llama-text-generation-webui-1  | zip_safe flag not set; analyzing archive contents...
llama-text-generation-webui-1  | __pycache__.quant_cuda.cpython-310: module references __file__
llama-text-generation-webui-1  | creating dist
llama-text-generation-webui-1  | creating 'dist/quant_cuda-0.0.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
llama-text-generation-webui-1  | removing 'build/bdist.linux-x86_64/egg' (and everything under it)
llama-text-generation-webui-1  | Processing quant_cuda-0.0.0-py3.10-linux-x86_64.egg
llama-text-generation-webui-1  | creating /opt/conda/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg
llama-text-generation-webui-1  | Extracting quant_cuda-0.0.0-py3.10-linux-x86_64.egg to /opt/conda/lib/python3.10/site-packages
llama-text-generation-webui-1  | Adding quant-cuda 0.0.0 to easy-install.pth file
llama-text-generation-webui-1  |
llama-text-generation-webui-1  | Installed /opt/conda/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg
llama-text-generation-webui-1  | Processing dependencies for quant-cuda==0.0.0
llama-text-generation-webui-1  | Finished processing dependencies for quant-cuda==0.0.0
llama-text-generation-webui-1  | Loading llama-7b...
llama-text-generation-webui-1  | Loading model ...
llama-text-generation-webui-1  | Done.
llama-text-generation-webui-1  | Loaded the model in 13.27 seconds.
llama-text-generation-webui-1  | Loading the extension "gallery"... Ok.
llama-text-generation-webui-1  | /opt/conda/lib/python3.10/site-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
llama-text-generation-webui-1  |   warnings.warn(value)
llama-text-generation-webui-1  | Running on local URL:  http://0.0.0.0:8889
llama-text-generation-webui-1  |
llama-text-generation-webui-1  | To create a public link, set `share=True` in `launch()`.
llama-text-generation-webui-1  | Loading Pygmalion-6b...
llama-text-generation-webui-1  | Can't determine model type from model name. Please specify it manually using --gptq-model-type argument
llama-text-generation-webui-1  | Traceback (most recent call last):
llama-text-generation-webui-1  |   File "/opt/conda/lib/python3.10/site-packages/gradio/routes.py", line 374, in run_predict
llama-text-generation-webui-1  |     output = await app.get_blocks().process_api(
llama-text-generation-webui-1  |   File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1017, in process_api
llama-text-generation-webui-1  |     result = await self.call_function(
llama-text-generation-webui-1  |   File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 835, in call_function
llama-text-generation-webui-1  |     prediction = await anyio.to_thread.run_sync(
llama-text-generation-webui-1  |   File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
llama-text-generation-webui-1  |     return await get_asynclib().run_sync_in_worker_thread(
llama-text-generation-webui-1  |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
llama-text-generation-webui-1  |     return await future
llama-text-generation-webui-1  |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
llama-text-generation-webui-1  |     result = context.run(func, *args)
llama-text-generation-webui-1  |   File "/app/server.py", line 63, in load_model_wrapper
llama-text-generation-webui-1  |     shared.model, shared.tokenizer = load_model(shared.model_name)
llama-text-generation-webui-1  |   File "/app/modules/models.py", line 100, in load_model
llama-text-generation-webui-1  |     model = load_quantized(model_name)
llama-text-generation-webui-1  |   File "/app/modules/GPTQ_loader.py", line 21, in load_quantized
llama-text-generation-webui-1  |     exit()
llama-text-generation-webui-1  |   File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in __call__
llama-text-generation-webui-1  |     raise SystemExit(code)
llama-text-generation-webui-1  | SystemExit: None
llama-text-generation-webui-1  | Loading GPT4Chan Float16...
llama-text-generation-webui-1  | Can't determine model type from model name. Please specify it manually using --gptq-model-type argument
[traceback identical to the Pygmalion-6b one above; SystemExit from /app/modules/GPTQ_loader.py, line 21]
llama-text-generation-webui-1  | Loading Llama Alpaca 7B Q4 Embedded...
llama-text-generation-webui-1  | Can't determine model type from model name. Please specify it manually using --gptq-model-type argument
[identical traceback; /app/modules/GPTQ_loader.py, line 21]
llama-text-generation-webui-1  | Loading Llama Original 13B...
llama-text-generation-webui-1  | Can't determine model type from model name. Please specify it manually using --gptq-model-type argument
[identical traceback; /app/modules/GPTQ_loader.py, line 21]
llama-text-generation-webui-1  | Loading Llama-13b HFv2...
llama-text-generation-webui-1  | Could not find llama-13b-4bit.pt, exiting...
[identical traceback, except the exit comes from /app/modules/GPTQ_loader.py, line 53]
llama-text-generation-webui-1  | Loading Llama-30b HFv2...
llama-text-generation-webui-1  | Could not find llama-30b-4bit.pt, exiting...
[identical traceback; /app/modules/GPTQ_loader.py, line 53]
llama-text-generation-webui-1  | Loading LlMa7-4bit Hfv2...
llama-text-generation-webui-1  | Can't determine model type from model name. Please specify it manually using --gptq-model-type argument
[identical traceback; /app/modules/GPTQ_loader.py, line 21]

System Info

Docker, Win10x64, 4090
BetaDoggo commented 1 year ago

It looks like the issue is that you're trying to load in 4-bit mode, but the webui can't determine the model type from the name, so it fails. Try adding --gptq-model-type LLaMa to your launch arguments.
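In this Docker setup, that flag belongs wherever server.py is launched, which appears to be docker/run.sh from the build log above. As an illustration only (the existing arguments are a placeholder, not the real contents of run.sh):

    python server.py <existing arguments> --gptq-model-type LLaMa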

oobabooga commented 1 year ago

Exactly as @BetaDoggo says. Add --model_type llama to your command.
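For example (flag spelling as in this comment; older checkouts spell the same option --gptq-model-type, as seen in the error message in the logs):

    python server.py --model_type llama <your other launch arguments>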