oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Intel Arc thread #3761

Closed oobabooga closed 2 months ago

oobabooga commented 11 months ago

This thread is dedicated to discussing the setup of the webui on Intel Arc GPUs.

You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all Intel Arc users.

Jacoby1218 commented 11 months ago

OK, so some notes:

simonlui commented 11 months ago

Sorry, this is going to be a somewhat long post. I've been looking into this a bit, and compared with other areas of user-facing ML, the LLM community has much more limited options for getting anything Intel working easily at full speed. For example, in the image generation space it's much easier to slot in Intel's Extension for PyTorch (IPEX), because everyone is using PyTorch directly one way or another and the extension is designed to be easy to insert into a project that already uses PyTorch. In stark contrast, backends in the LLM space do not use PyTorch directly; there is a lot of lower-level C/C++ programming, custom libraries, and custom model deployment, driven by performance considerations and by the amount of RAM needed to load these models, which until recently was all but unavailable to the average consumer. So there is no "easy" option to slot in, since nothing like PyTorch is in the picture.

That wouldn't really be a problem if there were a lower-level solution. However, the real issue is that Intel is not taking the same path as AMD when it comes to CUDA compatibility. They have been pursuing a different strategy as a hardware company for the last couple of years: they consolidated all their software and unified it under something called oneAPI, their attempt to let you write something once and deploy it everywhere in their ecosystem. That goes from higher-level pieces like Intel's extensions for PyTorch/TensorFlow, to middleware libraries like oneMKL/oneDNN, all the way down to Intel's compilers and runtime.

As a result, Intel is not providing anything like HIP to anyone (there is a community project called chipStar trying to take that approach, but it still seems too early; when I tried it, it wasn't ready to even start tackling complex projects). What Intel intends is for people to port their software directly from CUDA to SYCL, a Khronos standard that is basically like OpenCL but with C++, and they provide an automatic tool to port over CUDA code. The intention is that the output of the conversion can then, with very little effort, be modified to use their SYCL extensions with DPC++ and to pull in their libraries that interface with SYCL, and that this would then target everything Intel, from CPUs to GPUs to FPGAs to custom AI accelerators. SYCL then either gets compiled down to Level Zero, the actual API that runs on Intel's devices, or, as Codeplay announced last year, it can also compile to AMD ROCm and Nvidia CUDA. As a fallback, it compiles to OpenCL, which everyone supports.

As a result of the above, I would say it would take some serious effort to get Intel GPUs working at full speed for anything at the moment. That is not to say it is impossible, but it would take either a new software project to build a backend, or some sort of large patch to existing backends, to make it happen. I do see where Intel is coming from, and if their vision actually works, things wouldn't be as difficult to deal with, given a possible "write once, run anywhere" approach. But as it stands, it's not tested enough for people to make that effort, and it is very incompatible with the CUDA and ROCm efforts even if the APIs roughly do the same thing. For Intel GPUs, OpenCL will get users roughly halfway, but it will never be as optimized as CUDA/ROCm, and even if CLBlast could optimize its existing OpenCL code for Intel GPUs tomorrow, the extra effort needed to get that last portion of optimization is a pretty dim prospect in my opinion. I have no clue what can be done about that in a planned fashion, but that seems to be the situation at the moment.

Jacoby1218 commented 11 months ago

It appears that HF Transformers might support XPU now (https://github.com/huggingface/transformers/pull/25714), which would mean that even if nothing else works, this might. (No quantization because there's no bitsandbytes support, but that also seems to be in the works: https://github.com/TimDettmers/bitsandbytes/pull/747)
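
For anyone wanting to check whether PyTorch can actually see the Arc card through IPEX before blaming anything else, here is a minimal sketch (assuming intel-extension-for-pytorch is installed; this is not part of the webui itself):

import torch
import intel_extension_for_pytorch as ipex  # importing IPEX registers the "xpu" device

print(torch.xpu.is_available())        # True if the Arc GPU is visible to PyTorch
print(torch.xpu.get_device_name(0))    # e.g. "Intel(R) Arc(TM) A770 Graphics"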

oobabooga commented 11 months ago

I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: https://github.com/oobabooga/text-generation-webui/commit/0306b61bb09fb2b5d7b42d02e90267ff62ad3173

The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.
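
For reference, a standalone sketch of that kind of smoke test outside the webui, assuming IPEX is installed and the XPU device is visible; the model name and prompt are only for illustration:

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers torch.xpu
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/galactica-125m").to("xpu")

inputs = tokenizer("The Transformer architecture", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0]))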

simonlui commented 11 months ago

I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61

The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.

Keep in mind that Windows native does not work yet because Intel botched their release process, and I suspect most people wanting to try this are on Windows. So only Linux and WSL 2 for now. Earlier versions of the Windows PyTorch pip package also don't support ahead-of-time compilation, which makes the first pass of anything painful. See https://github.com/intel/intel-extension-for-pytorch/issues/398 and https://github.com/intel/intel-extension-for-pytorch/issues/399 for more information.

Jacoby1218 commented 11 months ago

Intel always manages to botch something when it comes to Arc, so I'm not surprised. Will test this out once I get my WSL2 install working again.

Jacoby1218 commented 11 months ago

I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61

The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.

This doesn't work: it checks whether CUDA is available and then falls back to the CPU rather than trying the extension. Also, it would be a good idea to call "source /opt/intel/oneapi/setvars.sh" from the script to auto-initialize the oneAPI environment. Otherwise, users might not get it working and wouldn't be able to figure out why.
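
A sketch of the kind of fallback logic being asked for here (not the webui's actual code, just an illustration of checking for the XPU device before giving up and using the CPU):

import torch

def pick_device():
    if torch.cuda.is_available():
        return torch.device("cuda")
    try:
        import intel_extension_for_pytorch  # noqa: F401 -- adds torch.xpu if installed
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.device("xpu")
    except ImportError:
        pass
    return torch.device("cpu")

print(pick_device())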

simonlui commented 10 months ago

For now, it seems there are unofficial Windows pip packages available from one of the webui contributors that address both issues I mentioned above for getting IPEX working optimally on native Windows. Install at your own risk, knowing they are not from Intel and not official.

ashunaveed commented 10 months ago

Intel Extension for PyTorch only supports one specific PyTorch version. If we change the one-click installer to install it, the download works, but the requirements file is then installed as usual and overwrites the supported version, so the system is unable to use the Intel GPU. Can anyone provide a workaround for this problem? We need to check which PyTorch version is compatible with the Intel Extension for PyTorch module and download only those versions.
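
A quick way to see whether the requirements step has clobbered the Intel build is to compare the installed versions; IPEX releases are pinned to a specific PyTorch version, so these should match (a diagnostic sketch, not part of the installer):

import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)  # e.g. 2.0.1a0+cxx11.abi for the XPU build
print("ipex: ", ipex.__version__)   # e.g. 2.0.110+xpu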

Daroude commented 10 months ago

I changed one_click.py so that it downloads and installs the (hopefully) correct PyTorch packages, created a requirements.txt for Intel Arc (which may or may not be correct) since there was none, and added the calls for them in one_click.py.

It downloads and installs the packages, but I am stuck at installing the extensions' requirements. As soon as this part starts, it seems to switch back to CPU (!?), installs the Nvidia packages, and uninstalls the Intel torch versions.

Update: it looks like the requirements from the various extensions subfolders request the nvidia packages as dependencies for the required packages.

simonlui commented 10 months ago

New all-in-one PyTorch for Windows packages are available, which are preferable to the other packages I linked earlier, as those had dependencies that couldn't easily be satisfied without a requirements.txt detailing them. There does seem to be a bug in the newest Windows drivers, as seen in https://github.com/intel/intel-extension-for-pytorch/issues/442; you have to revert to something older than version 4885. Version 4676 is recommended, as that is what was used to build the pip packages.

Daroude commented 10 months ago

Wouldn't it be easiest to make an option to compile llama.cpp with CLBlast?

fractal-fumbler commented 9 months ago

hello :) can webui be used with arc a770 to launch gptq models?

transformers gives me the error WARNING: No GPU has been detected by Pytorch. Falling back to CPU mode. After a clean install it gave me the error AssertionError: Torch not compiled with CUDA enabled

After another clean install I now have this error:

raise RuntimeError("GPU is required to quantize or run quantize model.")

fractal-fumbler commented 9 months ago

https://github.com/intel-analytics/BigDL/tree/main/python/llm

[bigdl-llm](https://bigdl.readthedocs.io/en/latest/doc/LLM/index.html) is a library for running LLM (large language model) on Intel XPU (from Laptop to GPU to Cloud) using INT4 with very low latency (for any PyTorch model).

can this be used with webui and intel arc gpus?
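
It isn't wired into the webui's loaders, but for reference the basic bigdl-llm pattern looks roughly like this (a sketch based on its documentation; the model path is a placeholder):

import intel_extension_for_pytorch as ipex  # noqa: F401 -- needed for the "xpu" device
from bigdl.llm.transformers import AutoModelForCausalLM

# "path/to/model" is a placeholder for a Hugging Face model id or local directory.
model = AutoModelForCausalLM.from_pretrained("path/to/model", load_in_4bit=True)
model = model.to("xpu")  # run the INT4 model on the Arc GPU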

djstraylight commented 9 months ago

Seems that Intel has broken the PyTorch extension for XPU repo and it's going to an HTTP site instead of HTTPS. Here is a workaround for one_click.py: "python -m pip install --trusted-host ec2-52-27-27-201.us-west-2.compute.amazonaws.com torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f 'http://ec2-52-27-27-201.us-west-2.compute.amazonaws.com/ipex-release.php?device=xpu&repo=us&release=stable'"

But seeing other errors related to the PyTorch version: /text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelerate/utils/imports.py:245: UserWarning: Intel Extension for PyTorch 2.0 needs to work with PyTorch 2.0.*, but PyTorch 2.1.0 is found. Please switch to the matching version and run again.

HubKing commented 9 months ago

Hello, does it currently work with Intel Arc (on Arch Linux) without much of a problem? I can run Vladmir's automatic1111 on this computer, so I think this might also run, but I am not sure.

PS: I ran the installer and it exited with the following error:

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Looking in links: https://developer.intel.com/ipex-whl-stable-xpu
ERROR: Could not find a version that satisfies the requirement torch==2.0.1a0 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1)
ERROR: No matching distribution found for torch==2.0.1a0
Command '. "/home/username/diffusion/text-generation-webui/installer_files/conda/etc/profile.d/conda.sh" && conda activate "/home/username/diffusion/text-generation-webui/installer_files/env" && conda install -y -k ninja git && python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu && python -m pip install py-cpuinfo==9.0.0' failed with exit status code '1'.

Exiting now.
Try running the start/update script again.

naptastic commented 8 months ago

As of right now (2023-11-27) Intel's instructions to install their pytorch extension do not work. In order to get the three necessary wheel files (torch 2.0.1a0, torchvision 0.15.2a0, intel_extension_for_pytorch 2.0.110+xpu) I had to download them as files from the URL provided, then install them with pip.

This is not enough to get ARC support working. The answer still seems to be "it should work, in theory, but nobody's actually done it yet".

HubKing commented 8 months ago

As of right now (2023-11-27) Intel's instructions to install their pytorch extension do not work.

Isn't that a stupid move from Intel? I mean, Intel should be doing their best to make their GPUs work with the latest AI stuff and help developers achieve it, instead of focusing on games. These days, people constantly talk about AI, not about triple-A 3D games. This kind of constant frustration with AI apps makes me think about switching to Nvidia (if they fix the damn Wayland problem).

Anyway, please let us know when it works again.

simonlui commented 8 months ago

The packages are there at https://developer.intel.com/ipex-whl-stable-xpu which you can browse, pip just isn't picking them up for whatever reason now with the URL. You need to manually install the packages or directly link the packages that are needed for install. For my Linux install, I had to do the following:

pip install https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torch-2.0.1a0%2Bcxx11.abi-cp310-cp310-linux_x86_64.whl https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torchvision-0.15.2a0%2Bcxx11.abi-cp310-cp310-linux_x86_64.whl https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/intel_extension_for_pytorch-2.0.110%2Bxpu-cp310-cp310-linux_x86_64.whl

The package versions needed for install will vary depending on what OS platform and Python version is being used on your machine.

HubKing commented 8 months ago

It says the environment is externally managed and suggests trying pacman -S python-xyz, where xyz is the package. In that case, what do I need to do?

Jacoby1218 commented 8 months ago

As of right now, there are 3 possible ways to get this to work with ARC GPUs:

  1. The Intel Extension for PyTorch, which currently doesn't work on Windows.
  2. OpenVINO with PyTorch dev versions (unsure if this would actually work; OpenVINO needs to be supported by the frontend to be used, and while OpenVINO supports LLMs, I just haven't seen it used for something like this before)
  3. The new Intel Extension for Transformers: the most promising, supports models converted with llama.cpp (though I don't know if it supports Arc GPUs yet; last I checked, support was forthcoming)

simonlui commented 8 months ago

1. The Intel Extension for PyTorch, which currently doesn't work on Windows.

As I posted in https://github.com/oobabooga/text-generation-webui/issues/3761#issuecomment-1771257122, Windows does work with the Intel Extension for PyTorch, but you need to install a third-party package since Intel does not provide one at this time. The latest Windows drivers now work too. Intel has stated in the GitHub issue tracker that they will provide Windows packaging soon. IPEX is also due for an update soon.

Jacoby1218 commented 8 months ago

I was under the impression there were still driver issues, but if it works now that's great.

ghost commented 7 months ago

I'm not sure if this is the right place to post this. I receive the below error after installing OobaBooga using the default Arc install option on Windows. The install seemed to go well but running it results in the below DLL load error. Other threads that mentioned this loading error suggested it might be a PATH issue. I tried adding a few paths to the OS environment but couldn't resolve it. Any suggestions?

It's an Arc A770 on Windows 10, Intel® Graphics Driver 31.0.101.5081/31.0.101.5122 (WHQL Certified). I also tried rolling back to driver 4676 and doing a clean install, with the same results. Some of the paths I added were those listed here. I'm also not seeing any of the DLLs listed at that link in those directories. Instead, I have intel-ext-pt-gpu.dll and intel-ext-pt-python.dll in "%PYTHON_ENV_DIR%\lib\site-packages\intel_extension_for_pytorch\bin" and no DLLs in "%PYTHON_ENV_DIR%\lib\site-packages\torch\lib", although backend_with_compiler.dll is there.

Traceback (most recent call last) ─────────────────────────────────────────┐
│ C:\text-generation-webui\server.py:6 in <module>                                                                 │
│                                                                                                                     │
│     5                                                                                                               │
│ >   6 import accelerate  # This early import makes Intel GPUs happy                                                 │
│     7                                                                                                               │
│                                                                                                                     │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\__init__.py:3 in <module>              │
│                                                                                                                     │
│    2                                                                                                                │
│ >  3 from .accelerator import Accelerator                                                                           │
│    4 from .big_modeling import (                                                                                    │
│                                                                                                                     │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\accelerator.py:32 in <module>          │
│                                                                                                                     │
│     31                                                                                                              │
│ >   32 import torch                                                                                                 │
│     33 import torch.utils.hooks as hooks                                                                            │
│                                                                                                                     │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\__init__.py:139 in <module>                 │
│                                                                                                                     │
│    138                 err.strerror += f' Error loading "{dll}" or one of its dependencies.'                        │
│ >  139                 raise err                                                                                    │
│    140                                                                                                              │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
OSError: [WinError 126] The specified module could not be found. Error loading
"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its
dependencies.
Press any key to continue . . .
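
One way to narrow down a WinError 126 like this is to add the IPEX bin directory to the DLL search path by hand before importing torch; since Python 3.8, PATH alone isn't used to resolve dependent DLLs. This is only a diagnostic sketch, not a verified fix, and the path is the one from the post above (adjust it to your install):

import os

os.add_dll_directory(r"C:\text-generation-webui\installer_files\env\Lib\site-packages\intel_extension_for_pytorch\bin")
import torch                                # noqa: E402
import intel_extension_for_pytorch as ipex  # noqa: E402
print(torch.__version__, ipex.__version__)
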
HubKing commented 7 months ago

I updated the code and run it again (did not do anything else). This time, it passed the previous crash "No matching distribution found for torch==2.0.1a0", but after downloading a lot of stuff, it crashed with the following. If I run the script again, I get the same output as below again.

*******************************************************************
* WARNING: You haven't downloaded any model yet.
* Once the web UI launches, head over to the "Model" tab and download one.
*******************************************************************

╭───────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /home/username/diffusion/text-generation-webui/server.py:6 in <module>                                   │
│                                                                                                     │
│     5                                                                                               │
│ ❱   6 import accelerate  # This early import makes Intel GPUs happy                                 │
│     7                                                                                               │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelera │
│ te/__init__.py:3 in <module>                                                                        │
│                                                                                                     │
│    2                                                                                                │
│ ❱  3 from .accelerator import Accelerator                                                           │
│    4 from .big_modeling import (                                                                    │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelera │
│ te/accelerator.py:32 in <module>                                                                    │
│                                                                                                     │
│     31                                                                                              │
│ ❱   32 import torch                                                                                 │
│     33 import torch.utils.hooks as hooks                                                            │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:234 in <module>                                                                           │
│                                                                                                     │
│    233     if USE_GLOBAL_DEPS:                                                                      │
│ ❱  234         _load_global_deps()                                                                  │
│    235     from torch._C import *  # noqa: F403                                                     │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:193 in _load_global_deps                                                                  │
│                                                                                                     │
│    192         if not is_cuda_lib_err:                                                              │
│ ❱  193             raise err                                                                        │
│    194         for lib_folder, lib_name in cuda_libs.items():                                       │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:174 in _load_global_deps                                                                  │
│                                                                                                     │
│    173     try:                                                                                     │
│ ❱  174         ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)                                       │
│    175     except OSError as err:                                                                   │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/ctypes/__init__.py:376 │
│ in __init__                                                                                         │
│                                                                                                     │
│   375         if handle is None:                                                                    │
│ ❱ 376             self._handle = _dlopen(self._name, mode)                                          │
│   377         else:                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory

Jacoby1218 commented 7 months ago

@HubKing Run "source /opt/intel/oneapi/setvars.sh" and try again. If you don't have it, make sure to install the oneAPI Basekit.

HubKing commented 7 months ago

Thanks. intel-oneapi-basekit 2024.0.0.49564-2 had been already installed and running that command solved the problem. But why did this problem happen in the first place? The user of that package is supposed to add that shell script manually to the environment?

Jacoby1218 commented 7 months ago

Thanks. intel-oneapi-basekit 2024.0.0.49564-2 had been already installed and running that command solved the problem. But why did this problem happen in the first place? The user of that package is supposed to add that shell script manually to the environment?

Yes, exactly that.
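
As a quick sanity check, the same missing library can be probed directly from Python; if the oneAPI environment isn't sourced in the shell that launches the webui, this raises the exact OSError shown above (just a diagnostic sketch):

import ctypes

ctypes.CDLL("libmkl_intel_lp64.so.2")  # raises OSError if setvars.sh has not been sourced
print("oneAPI MKL runtime found")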

djstraylight commented 7 months ago

What models run on an Intel Arc GPU? It seems like .gguf models are running on the CPU.

ghost commented 7 months ago

@djstraylight

For me at least, GGUFs default to loading via llama.cpp, which is in the process of implementing Arc support.

djstraylight commented 7 months ago

@djstraylight

For me at least, GGUFs default to loading via llama.cpp, which is in the process of implementing Arc support.

Thanks for the info. text-generation-webui seems to be using the latest PyTorch extension from Intel, but which loaders actually implement it? It seems like just the basic transformers loader, but I'll have to dig into the code.

Nuullll commented 7 months ago

https://github.com/oobabooga/text-generation-webui/pull/5191 would fix most of the env issues for IPEX.

idelacio commented 7 months ago

It builds now but on starting I get the attached errors SadArcLogs.txt

The Nvidia version runs just fine. (same version, rebuilt, both builds tested from C drive (logs are D drive build but same errors))

Running Windows Server 2019. Dual-card setup: Arc A770 16GB in the primary PCIe slot, 3060 12GB in the secondary.

Sawyer73 commented 7 months ago

It builds now but on starting I get the attached errors SadArcLogs.txt

The Nvidia version runs just fine. (same version, rebuilt, both builds tested from C drive (logs are D drive build but same errors))

Running Windows Server 2019. Dual-card setup: Arc A770 16GB in the primary PCIe slot, 3060 12GB in the secondary.

Same here. Adding the 'share' flag does remove the localhost error message, but when I try to get in through localhost or even a gradio link, it loads a blank screen. Basically the links work, but there's nothing on them.

idelacio commented 7 months ago

It now builds and the interface loads from the main branch version.

Not sure how to run models on the card though: AWQ and GPTQ don't work at all and error out, and GGUF only runs on the CPU.

kcyarn commented 7 months ago

I'm running an Intel Arc A770 as a non-display GPU on Ubuntu 23.10. (Intel i7-13700k handles the display.) Selecting the Intel GPU option during oobabooga's first run did not load models to the GPU. In case anyone else experiences this problem, here's what worked for me.

This assumes the following in Ubuntu:

  1. Graphics drivers for Intel Arc installed (see Intel for directions).
  2. Intel OneApi installed.
  3. Username is in the renderer group.
  4. Hangcheck timeout disabled.
  5. OneApi is initialized.

Intel suggests several different ways to initialize OneApi. Per their directions, I added the following line to .bashrc and rebooted.

source /opt/intel/oneapi/setvars.sh

This eliminates the error OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory.

The Intel extension for pytorch was correctly installed along with all of the other dependencies. No issues there, but it still wasn't loading anything in the GPU. To fix this, I needed to recompile llama-cpp-python.

I'm leaving the below for now because it did eliminate some errors. However, it's a mirage. It's not actually using the GPU.

cd text-generation-webui
./cmd_linux.sh
pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON" pip install llama-cpp-python

For the cmake arguments, I used llama.cpp's Intel OneMKL arguments.

And now loading llama2-7b (gguf) with n-gpu-layers set to its maximum value results in:

llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU

djstraylight commented 7 months ago

And now loading llama2-7b (gguf) with n-gpu-layers set to its maximum value results in:

llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU

Did you use intel-gpu-top to verify that it is actually using the GPU?

kcyarn commented 7 months ago

And now loading llama2-7b (gguf) with n-gpu-layers set to its maximum value results in:

llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU

Did you use intel-gpu-top to verify that it is actually using the GPU?

I'm getting some really odd intel-gpu-top results. It blips when the model loads and then does nothing, leading me to think this is another mirage.

In comparison, in llama.cpp, Blitter hits 80% with 30 layers on the same model. But that's compiled with clblast and needs platform and device environment variables.

djstraylight commented 7 months ago

I found the same thing. Using -DLLAMA_BLAS_VENDOR=Intel10_64lp doesn't actually offload the processing to the Intel GPU.

I compiled with clblast and that actually was using my Arc GPU, but the LLM was spitting out gibberish. Still some bug hunting needed.

kcyarn commented 7 months ago

I found the same thing. Using -DLLAMA_BLAS_VENDOR=Intel10_64lp doesn't actually offload the processing to the Intel GPU.

I compiled with clblast and that actually was using my Arc GPU, but the LLM was spitting out gibberish. Still some bug hunting needed.

So after spending a few hours experimenting with llama.cpp and llama-cpp-python, I got them both running on the gpu last night. I got oobabooga running on the Intel arc gpu a few minutes ago.

This is using llama-2-7b-chat.Q5_K_M.gguf with llama.cpp and 30 n-gpu-layers.

intel_gpu_top screenshot while running oobabooga

Oobabooga output using the Intel Arc A770 GPU

No gibberish and it corrected the grammar error in the prompt. :)

I'm not sure how user-friendly we'll be able to make running this nor have I stress tested this beyond a few pithy prompts. For reference, I'm using Ubuntu 23.10 (mantic). To compile with clblast, I needed libclblast-dev >= 1.6.1-1 and the most recent stable Intel drivers. I'm happy to dig into the dependencies more, if needed.

(The below assumes you've run ./start_linux.sh for the first time.)

Step 1

Open 2 terminals.

In the first, run

clinfo -l

In the second, run

cd text-generation-webui
./cmd_linux.sh
clinfo -l

Here's the output from my system. As you can see, conda doesn't know a GPU exists.

Ubuntu output:

Platform #0: Intel(R) FPGA Emulation Platform for OpenCL(TM)
 `-- Device #0: Intel(R) FPGA Emulation Device
Platform #1: Intel(R) OpenCL
 `-- Device #0: 13th Gen Intel(R) Core(TM) i7-13700K
Platform #2: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics
Platform #3: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) UHD Graphics 770

Inside conda:

Platform #0: Intel(R) OpenCL
 `-- Device #0: 13th Gen Intel(R) Core(TM) i7-13700K 

Note: installing ocl-icd-system in conda (the semi-official fix) did not work.

Step 2

Conda needs your system's OpenCL vendor .icd files. On Ubuntu, these are at /etc/OpenCL/vendors/.

In terminal, cd into the text-generation-webui directory. (Just the basic terminal, not cmd_linux.sh)

Run

rm -r ./installer_files/env/etc/OpenCL/vendors/
mkdir ./installer_files/env/etc/OpenCL/vendors/
ln -s /etc/OpenCL/vendors/*.icd ./installer_files/env/etc/OpenCL/vendors/

This deletes conda's OpenCL vendors directory, recreates it, and then creates symlinks to Ubuntu's icd files.

./cmd_linux.sh

Recheck conda's clinfo.

clinfo -l

My output is now:

Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics
Platform #1: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) UHD Graphics 770
Platform #2: Intel(R) OpenCL
 `-- Device #0: 13th Gen Intel(R) Core(TM) i7-13700K
Platform #3: Intel(R) FPGA Emulation Platform for OpenCL(TM)
 `-- Device #0: Intel(R) FPGA Emulation Device

The platform numbers are different from what they are in Ubuntu, which changes llama.cpp's GGML_OPENCL_PLATFORM environment variable. (For now, just paste the output somewhere. You'll need it in a minute.)

Step 3

Recompile llama-cpp-python in the ./cmd_linux.sh terminal.

pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_CLBLAST=ON" FORCE_CMAKE=1 pip install --no-cache-dir llama-cpp-python

Step 4

In terminal (not ./cmd_linux.sh), cd into the text-generation-webui directory if you're not still there.

Go to conda's clinfo -l output and note the platform number for your graphics card and the card name beside its device. You don't need the full name, just the letters and number.

I'm using this bit:

Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics

Edit your platform number and device name. Then run the exports in the terminal.

export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE=A770
./start_linux.sh

It worked.

Admittedly, it's not as snappy as running llama2-7b in BigDL on the same GPU, but it's a massive speed improvement over the cpu.

On my system, this only works if I use the exports to tell it what to use. I don't know if you'll need to do that on a system that only has one display option. (I'm using the cpu for display.)

Oobabooga was a fresh download.

kcyarn commented 7 months ago

Draft Guide for Running Oobabooga on Intel Arc

More eyes and testers are needed before considering submission to the main repository.

Installation Notes

Although editing conda's OpenCL vendor files is a viable option, swapping to a standard python3 install and using a venv resulted in improved performance in tokens/s by approximately 71% across all tested models. It also eliminates possible issues with older conda libraries and bleeding-edge ones needed for Intel Arc. For now, skipping conda and its CDTs appears to be the most reliable option.

Working Model Loaders

The latest Intel extension for transformers added INT4 inference support for Arc. Hugging Face transformers committed XPU support for the trainer in September '23. If any of the other model loaders use transformers, they may run with little effort. (They may also require a fairly major fork. In which case, adding a BigDL model loader is probably a better use of energy. That's just my opinion. My BigDL experiments are still in Jupyter notebooks, but it's been a good experience on both the Intel GPU and the CPU.)
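
For reference, the Intel Extension for Transformers INT4 path looks roughly like this (a sketch based on its README at the time, not something the webui's loaders call today; the model name is just one of the models tested below):

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)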

Note: Loaders are hardcoded in modules/loaders.py. Without refactoring this to be more modular like extensions or [shudder] monkeypatching, we just need to remember which ones work with our individual system. Making it more modular and customizable for different combinations of CPUs and GPUs is a much broader discussion than getting this working on the Intel Arc. It would also need a lot of buy-in and commitment from the community.

Models Tested

  - transformers: llama2-7b-chat-hf, mistralai_Mistral-7B-Instruct-v0.2
  - llama.cpp: llama-2-7b-chat.Q5_K_M.gguf, mistral-7b-instruct-v0.2.Q5_K_M.gguf

What Isn't Tested

  - Most models
  - Training
  - Parameters
  - Extensions
  - Regular use beyond "does it load and run a few simple prompts"

Note: coqui_tts, silero_tts, whisper_stt, superbooga, and superboogav2 are all breaking installs. It may be possible to install their requirements without any dependencies and then pick up the additional dependencies during debugging. TTS, in particular, upgrades torch to the wrong version for the Intel extension.

Install Notes

  - Latest Intel Arc drivers installed. See the Intel client GPU installation docs: https://dgpu-docs.intel.com/driver/client/overview.html
  - Intel oneAPI Basekit installed: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html
  - Install opencl-headers ocl-icd libclblast-dev python3 python3-pip python3-venv libgl1 libglib2.0-0 libgomp1 libjemalloc-dev (note: libclblast-dev >= 1.6)
  - Your username is part of the renderer group.
  - You have hangcheck disabled in grub.

The last two items are just standard things I do with a fresh install or new graphics card. They may no longer be necessary. If you've already installed these, check for updates. Intel kicked off 2024 with a lot of updates.

Test Machine Details

  - Ubuntu 23.10, 6.5.0.14.16 generic Linux kernel
  - i7-13700K CPU (runs the display)
  - Intel Arc A770 (non-display)

Bash Scripts

Below are 2 bash scripts: install_arch.sh and run_arch.sh. They need to be saved or symlinked to the text-generation-webui directory.

Getting Started

  1. Download or clone a fresh copy of Oobabooga.

  2. Save the below scripts into text-generation-webui. These should be in the same folder as one_click.py, cmd_linux.sh, etc.

  3. Make them executable, then run the installer.

    cd text-generation-webui
    chmod +x install_arch.sh run_arch.sh
    ./install_arch.sh
  4. Check clinfo for your hardware information.

    clinfo -l
  5. In run_arch.sh, find GGML_OPENCL_PLATFORM and change it to your platform number. Then change the GGML_OPENCL_DEVICE to your device name. Save the file.

  6. Start the server with run_arch.sh. This uses any flags you've saved in CMD_FLAGS.txt. You can also use flags like --listen --api with the script.

    ./run_arch.sh

    Both the scripts below were uploaded to github. This is just a starting point. Changes welcome. Once it's right in bash, we can decide whether to integrate it with oobabooga's start_linux.sh, requirements files, and one_click.py.

install_arch.sh

#!/bin/bash

# Check if the virtual environment already exists
if [[ ! -d "venv" ]]; then
    # Create the virtual environment
    python -m venv venv
fi

# Activate the virtual environment
source venv/bin/activate

# Intel extension for transformers recently added Arc support.
# See https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/docs/notebooks/build_chatbot_on_xpu.ipynb for additional notes on the dependencies.
# Working model loaders:
#  - llama.cpp
#  - transformers

pip install intel-extension-for-transformers

# Install xpu intel pytorch, not cpu.

pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

# Installing these from requirements_cpu_only.txt causes dependency conflicts with the Intel PyTorch build.

# Install a few of the dependencies for the below.
pip install coloredlogs datasets sentencepiece

pip install --no-deps peft==0.7.* optimum==1.16.* optimum-intel accelerate==0.25.*

# Skip llama-cpp-python and everything already installed above without its deps.

grep -v -e peft -e optimum -e accelerate -e llama-cpp-python requirements_cpu_only.txt > temp_requirements.txt

pip install -r temp_requirements.txt

# Install the cpuinfo dependency installed by one_click
pip install py-cpuinfo==9.0.0

# Use the correct cmake args for llama-cpp

export CMAKE_ARGS="-DLLAMA_CLBLAST=ON"
export FORCE_CMAKE=1

pip install --no-cache-dir llama-cpp-python

# List of extensions to exclude:
# coqui_tts causes torch dependency issues with Intel GPUs.
# whisper_stt and silero_tts both force PyTorch updates through transitive
# dependencies; they may work if installed without their dependencies.
cd extensions

extensions=()  # Create an empty array to store folder names
exclude_extensions=(coqui_tts silero_tts whisper_stt superbooga superboogav2)

for folder in */; do
    extensions+=($folder)
done

echo "${extensions[*]}"

install_extensions=()

for ext in "${extensions[@]}"; do
    should_exclude=false

    for exclude_ext in "${exclude_extensions[@]}"; do
        if [[ "$ext" == *"$exclude_ext"* ]]; then
            should_exclude=true
            break
        fi
    done

    if [ "$should_exclude" = false ]; then
        install_extensions+=("$ext")
    fi
done

# Print the install_extensions
# echo "${install_extensions[@]}"

for extension in ${install_extensions[@]}; do
    cd "$extension"
    echo -e "\n\n$extension\n\n"
    # Install dependencies from requirements.txt
    if [ -e "requirements.txt" ]; then
        echo "Installing requirements in $dir"
        pip install -r requirements.txt
    else
        echo "No requirements.txt found in $dir"
    fi
    cd ..
done
# Leave the extension directory.
cd ..

# Delete the temp_requirements.txt file.

rm temp_requirements.txt

run_arch.sh

#!/bin/bash
# Uncomment if oneapi is not in your .bashrc
# source /opt/intel/oneapi/setvars.sh
# Activate virtual environment built with install_arch.sh. (Not conda!)
source venv/bin/activate

# Change these values to match your card in clinfo -l
# Needed by llama.cpp

export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770

# Use sudo intel_gpu_top to view your card.

# Capture command-line arguments
flags_from_cmdline=$@

# Read flags from CMD_FLAGS.txt
flags_from_file=$(grep -v '^#' CMD_FLAGS.txt | grep -v '^$')
# Combine flags from both sources
all_flags="$flags_from_file $flags_from_cmdline"

# Run the Python script with the combined flags
python server.py $all_flags

djstraylight commented 7 months ago

@kcyarn Great work on getting XPU/OpenCL more integrated with text-generation-webui!

thejacer commented 6 months ago

Draft Guide for Running Oobabooga on Intel Arc

More eyes and testers are needed before considering submission to the main repository.

Tried this in WSL running Ubuntu 22.04, here are some notes:

  1. libclblast-dev >= 1.6: this package is only available from the default repos in 1.6+ on Ubuntu 23.10 (it might be available on other flavors, I don't know).
  2. I was able to grab the 1.6 .deb from the repos, plus the libclblast1 package listed as a dependency, and install them.
  3. After following your instructions on a new Ubuntu install, "python -m venv venv" wouldn't work; I had to change it to "python3 -m venv venv".
  4. Despite no errors other than what I've outlined here, I still get 0 platforms from clinfo.
  5. Again, despite no errors other than what's above, I get "OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory" when I run run_arch.sh.

Sorry if this isn't helpful, I've never run WSL before so I'm not sure what the limitations are.

kcyarn commented 6 months ago

It sounds like either the GPU isn't passing through to WSL2 or there's a missing dependency.

Which version of Ubuntu are you using on WSL2? I'm using the most recent release, not the LTS, because the newer kernels work better with this card. You may want to try upgrading the release.

Have you tried this Intel guide to get the card running in WSL2? https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html

It'll be a few days before I can run any WSL2 tests.



thejacer commented 6 months ago

It sounds like either the GPU isn't passing through to WSL2 or there's a missing dependency. Which version of Ubuntu are you using on WSL2? I'm using the most recent release, not the LTS, because the newer kernels work better with this card. You may want to try upgrading the release. Have you tried this Intel guide to get the card running in WSL2? https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html It'll be a few days before I can run any WSL2 tests.

____
From: thejacer
Sent: Sunday, January 28, 2024 5:30:34 AM
Subject: Re: [oobabooga/text-generation-webui] Intel Arc thread (Issue #3761)

Draft Guide for Running Oobabooga on Intel Arc

More eyes and testers are needed before considering submission to the main repository.

Installation Notes

Although editing conda's OpenCL vendor files is a viable option, swapping to a standard python3 install and using a venv improved performance in tokens/s by approximately 71% across all tested models. It also eliminates possible issues between older conda libraries and the bleeding-edge ones needed for Intel Arc. For now, skipping conda and its CDTs appears to be the most reliable option.

Working Model Loaders

- llama.cpp
- transformers

The latest Intel extension for transformers added INT4 inference support for Arc. Hugging Face transformers committed XPU support for the trainer in September '23. If any of the other model loaders use transformers, they may run with little effort. (They may also require a fairly major fork, in which case adding a BigDL (https://github.com/intel-analytics/BigDL) model loader is probably a better use of energy. That's just my opinion. My BigDL experiments are still in Jupyter notebooks, but it's been a good experience on both the Intel GPU and the CPU.)

Note: Loaders are hardcoded in modules/loaders.py. Without refactoring this to be more modular like extensions, or [shudder] monkeypatching, we just need to remember which ones work with our individual system. Making it more modular and customizable for different combinations of CPUs and GPUs is a much broader discussion than getting this working on the Intel Arc. It would also need a lot of buy-in and commitment from the community.

Models Tested

transformers:
- llama2-7b-chat-hf
- mistralai_Mistral-7B-Instruct-v0.2

llama.cpp:
- llama-2-7b-chat.Q5_K_M.gguf
- mistral-7b-instruct-v0.2.Q5_K_M.gguf

What Isn't Tested

- Most models
- Training
- Parameters
- Extensions
- Regular use beyond "does it load and run a few simple prompts"

Note: coqui_tts, silero_tts, whisper_stt, superbooga, and superboogav2 are all breaking installs. It may be possible to install their requirements without any dependencies and then pick up the additional dependencies during debugging. TTS, in particular, upgrades torch to the wrong version for the Intel extension.

Install Notes

- Latest Intel Arc drivers installed. See the Intel client GPU installation docs: https://dgpu-docs.intel.com/driver/client/overview.html
- Intel oneAPI Base Toolkit installed: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html
- Install opencl-headers ocl-icd libclblast-dev python3 python3-pip python3-venv libgl1 libglib2.0-0 libgomp1 libjemalloc-dev (note: libclblast-dev >= 1.6)
- Your username is part of the render group.
- You have hangcheck disabled in grub.

The last two items are just standard things I do with a fresh install or new graphics card. They may no longer be necessary. If you've already installed these, check for updates. Intel kicked off 2024 with a lot of updates.

Test Machine Details

- Ubuntu 23.10, 6.5.0.14.16 generic Linux kernel
- i7-13700K CPU (runs the display)
- Intel Arc A770 (non-display)

Bash Scripts

Below are 2 bash scripts: install_arch.sh and run_arch.sh. They need to be saved or symlinked to the text-generation-webui directory.

Getting Started

1. Download or clone a fresh copy of Oobabooga.
2. Save the below scripts into text-generation-webui. These should be in the same folder as one_click.py, cmd_linux.sh, etc.
3. Make them executable, then run the install script:
   cd text-generation-webui
   ./install_arch.sh
4. Check clinfo for your hardware information:
   clinfo -l
5. In run_arc.sh, find GGML_OPENCL_PLATFORM and change it to your platform number. Then change GGML_OPENCL_DEVICE to your device name. Save the file.
6. Start the server with run_arch.sh. This uses any flags you've saved in CMD_FLAGS.txt. You can also pass flags like --listen --api to the script:
   ./run_arch.sh

Both scripts were uploaded to GitHub: https://github.com/kcyarn/oobabooga_intel_arc. This is just a starting point. Changes welcome. Once it's right in bash, we can decide whether to integrate it with oobabooga's start_linux.sh, requirements files, and one_click.py.

install_arch.sh

#!/bin/bash

# Check if the virtual environment already exists
if [[ ! -d "venv" ]]; then
    # Create the virtual environment
    python -m venv venv
fi

# Activate the virtual environment
source venv/bin/activate

# Intel extension for transformers recently added Arc support.
# See https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/docs/notebooks/build_chatbot_on_xpu.ipynb for additional notes on the dependencies.
# Working model loaders:
# - llama.cpp
# - transformers
pip install intel-extension-for-transformers

# Install xpu intel pytorch, not cpu.
pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

# Installing these from requirements_cpu_only.txt causes dependency conflicts with the Intel pytorch build.
# Install a few of the dependencies for the packages below.
pip install coloredlogs datasets sentencepiece
pip install --no-deps peft==0.7.* optimum==1.16.* optimum-intel accelerate==0.25.*

# Skip llama-cpp-python and everything installed above without its dependencies.
grep -v -e peft -e optimum -e accelerate -e llama-cpp-python requirements_cpu_only.txt > temp_requirements.txt
pip install -r temp_requirements.txt

# Install the cpuinfo dependency normally installed by one_click.py
pip install py-cpuinfo==9.0.0

# Use the correct cmake args for llama-cpp
export CMAKE_ARGS="-DLLAMA_CLBLAST=ON"
export FORCE_CMAKE=1
pip install --no-cache-dir llama-cpp-python

# Install the requirements for every extension except the excluded ones.
# Exclude coqui_tts because it causes torch dependency issues with Intel GPUs.
# whisper_stt and silero_tts both force pytorch updates as a dependency-of-a-dependency situation. It may be possible to use them without dependency installation.
cd extensions

# Create an empty array to store folder names
extensions=()
# List of extensions to exclude
exclude_extensions=(coqui_tts silero_tts whisper_stt superbooga superboogav2)

for folder in */; do
    extensions+=("$folder")
done
echo "${extensions[@]}"

install_extensions=()
for ext in "${extensions[@]}"; do
    should_exclude=false
    for exclude_ext in "${exclude_extensions[@]}"; do
        if [[ "$ext" == "$exclude_ext"* ]]; then
            should_exclude=true
            break
        fi
    done
    if [ "$should_exclude" = false ]; then
        install_extensions+=("$ext")
    fi
done

# Print the install_extensions
# echo "${install_extensions[@]}"

for extension in "${install_extensions[@]}"; do
    cd "$extension"
    echo -e "\n\n$extension\n\n"
    # Install dependencies from requirements.txt
    if [ -e "requirements.txt" ]; then
        echo "Installing requirements in $extension"
        pip install -r requirements.txt
    else
        echo "No requirements.txt found in $extension"
    fi
    cd ..
done

# Leave the extensions directory.
cd ..

# Delete the temp_requirements.txt file.
rm temp_requirements.txt

run_arch.sh

#!/bin/bash

# Uncomment if oneapi is not in your .bashrc
# source /opt/intel/oneapi/setvars.sh

# Activate the virtual environment built with install_arch.sh. (Not conda!)
source venv/bin/activate

# Change these values to match your card in clinfo -l
# Needed by llama.cpp
export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770

# Use sudo intel_gpu_top to view your card.

# Capture command-line arguments
flags_from_cmdline=$@

# Read flags from CMD_FLAGS.txt
flags_from_file=$(grep -v '^#' CMD_FLAGS.txt | grep -v '^$')

# Combine flags from both sources
all_flags="$flags_from_file $flags_from_cmdline"

# Run the Python script with the combined flags
python server.py $all_flags

Tried this in WSL running Ubuntu 22.04, here are some notes:

1. libclblast-dev >= 1.6 - this package is only available from the default repos in 1.6+ on Ubuntu 23.10 (it might be available on other flavors, I don't know).
2. I was able to grab the 1.6 .deb from the repos, plus a libclblast1 package listed as a dependency, and install them.
3. After following your instructions on a new Ubuntu, "python -m venv venv" wouldn't work; I had to change it to "python3 -m venv venv".
4. Despite no errors other than what I've outlined here, I still get 0 platforms from clinfo.
5. Again, despite no errors other than what's above, I get "OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory" when I run run_arch.sh.

Sorry if this isn't helpful, I've never run WSL before so I'm not sure what the limitations are.
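A quick way to sanity-check the environment that install_arch.sh builds, before involving the webui at all, is to confirm that the XPU build of PyTorch can actually see the card. This is only a sketch: it assumes oneAPI is installed at the default /opt/intel/oneapi path and that the package versions above installed cleanly, and the attributes queried are the ones exposed by the 2.1.x IPEX XPU builds.

# Activate the same environment the scripts use.
source /opt/intel/oneapi/setvars.sh
source venv/bin/activate

# Ask PyTorch + IPEX whether an XPU device is visible.
python - <<'EOF'
import torch
import intel_extension_for_pytorch as ipex  # registers the xpu device type

print("torch:", torch.__version__, "| ipex:", ipex.__version__)
print("xpu available:", torch.xpu.is_available())
print("xpu device count:", torch.xpu.device_count())
EOF

If this prints False or 0 devices inside WSL2, the problem sits below the webui (driver, runtime, or GPU passthrough) rather than in the loaders.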

thejacer commented 6 months ago

Sorry, work required me to make a short (no) notice trip out of town, and I can't experiment remotely because it might shut down my system. I'll be back in town in a day or two and able to start working on it again. Regarding the WSL version, I was using WSL 1 just because the WSL instructions for oobabooga said to use WSL 1 for Windows 10 and WSL 2 for Windows 11.

thejacer commented 6 months ago

I've ditched my old WSL and restarted with Ubuntu 23.10 using WSL2. However, clinfo -l gives:

Platform #0: Intel(R) OpenCL Graphics
 -- Device #0: Intel(R) Graphics [0x56a0]

clpeak indicates 512 compute units etc.

but Oobabooga fails to find my device.

EDIT: I activated the venv, and from within it I was able to run clinfo -l with the same results as above; clpeak also sees the GPU with 512 compute units. I honestly don't understand it, because intel_gpu_top still says there's no GPU installed.
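One possible explanation for the intel_gpu_top result: that tool reads the native i915 DRM interfaces, while WSL2 normally exposes the GPU through the /dev/dxg paravirtual device rather than a /dev/dri render node, so intel_gpu_top failing inside WSL2 doesn't by itself mean the compute stack is broken. A rough check (the paths are the usual WSL2 defaults and may differ on other setups):

# See how the GPU is exposed inside this WSL2 instance.
ls -l /dev/dxg 2>/dev/null || echo "no /dev/dxg: WSL2 GPU paravirtualization is not active"
ls -l /dev/dri 2>/dev/null || echo "no /dev/dri render nodes (common under WSL2)"

# OpenCL visibility is the test that actually matters for llama.cpp here.
clinfo -l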

kcyarn commented 6 months ago

Please double check that you've got all the drivers installed in WSL2. See https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-0/configure-wsl-2-for-gpu-workflows.html

Then run clinfo and change the script to use your numbers.

I rarely use Windows 11, but I do have it installed. Windows 10 is in a virtual machine. If I have time, I'll see if I can get it running.
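If it helps, the driver check above can be reduced to a couple of commands. The package names below are the usual ones from Intel's client GPU repository; treat them as a guess if your setup pulled the runtime from somewhere else.

# Confirm the OpenCL and Level Zero user-space runtimes are installed.
dpkg -l | grep -E "intel-opencl-icd|intel-level-zero-gpu|level-zero" || echo "Intel compute runtime packages not found"

# Confirm the OpenCL ICD loader can enumerate the Intel platform.
clinfo -l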




kcyarn commented 6 months ago

Just a thought. Run clinfo without -l and check the entire output for the graphics card. That's probably easier than double-checking the entire install.
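If the full clinfo dump is unwieldy, a filter along these lines (the pattern is only a suggestion) pulls out the fields run_arc.sh cares about:

# Show just the platform/device identification and capacity fields.
clinfo | grep -E "Platform Name|Device Name|Device Type|Max compute units|Global memory size"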




thejacer commented 6 months ago

Platform Name                                   Intel(R) OpenCL Graphics
  Number of devices                             1
  Device Name                                   Intel(R) Graphics [0x56a0]
  Device Vendor                                 Intel(R) Corporation
  Device Vendor ID                              0x8086
  Device Version                                OpenCL 3.0 NEO
  Device UUID                                   8680a056-0800-0000-0300-000000000000
  Driver UUID                                   32332e33-352e-3237-3139-312e34320000
  Valid Device LUID                             No
  Device LUID                                   4005-9721ff7f0000
  Device Node Mask                              0
  Device Numeric Version                        0xc00000 (3.0.0)
  Driver Version                                23.35.27191.42
  Device OpenCL C Version                       OpenCL C 1.2
  Device OpenCL C all versions                  OpenCL C 0x400000 (1.0.0) OpenCL C 0x401000 (1.1.0) OpenCL C 0x402000 (1.2.0) OpenCL C 0xc00000 (3.0.0)
  Device OpenCL C features
    __opencl_c_int64                            0xc00000 (3.0.0)
    __opencl_c_3d_image_writes                  0xc00000 (3.0.0)
    __opencl_c_images                           0xc00000 (3.0.0)
    __opencl_c_read_write_images                0xc00000 (3.0.0)
    __opencl_c_atomic_order_acq_rel             0xc00000 (3.0.0)
    __opencl_c_atomic_order_seq_cst             0xc00000 (3.0.0)
    __opencl_c_atomic_scope_all_devices         0xc00000 (3.0.0)
    __opencl_c_atomic_scope_device              0xc00000 (3.0.0)
    __opencl_c_generic_address_space            0xc00000 (3.0.0)
    __opencl_c_program_scope_global_variables   0xc00000 (3.0.0)
    __opencl_c_work_group_collective_functions  0xc00000 (3.0.0)
    __opencl_c_subgroups                        0xc00000 (3.0.0)
    __opencl_c_ext_fp32_global_atomic_add       0xc00000 (3.0.0)
    __opencl_c_ext_fp32_local_atomic_add        0xc00000 (3.0.0)
    __opencl_c_ext_fp32_global_atomic_min_max   0xc00000 (3.0.0)
    __opencl_c_ext_fp32_local_atomic_min_max    0xc00000 (3.0.0)
    __opencl_c_ext_fp16_global_atomic_load_store 0xc00000 (3.0.0)
    __opencl_c_ext_fp16_local_atomic_load_store 0xc00000 (3.0.0)
    __opencl_c_ext_fp16_global_atomic_min_max   0xc00000 (3.0.0)
    __opencl_c_ext_fp16_local_atomic_min_max    0xc00000 (3.0.0)
    __opencl_c_integer_dot_product_input_4x8bit 0xc00000 (3.0.0)
    __opencl_c_integer_dot_product_input_4x8bit_packed 0xc00000 (3.0.0)
  Latest conformance test passed                v2023-05-16-00
  Device Type                                   GPU
  Device PCI bus info (KHR)                     PCI-E, 0000:03:00.0
  Device Profile                                FULL_PROFILE
  Device Available                              Yes
  Compiler Available                            Yes
  Linker Available                              Yes
  Max compute units                             512
  Max clock frequency                           2400MHz
  Device IP (Intel)                             0x30dc008 (12.220.8)
  Device ID (Intel)                             22176
  Slices (Intel)                                8
  Sub-slices per slice (Intel)                  8
  EUs per sub-slice (Intel)                     8
  Threads per EU (Intel)                        8
  Feature capabilities (Intel)                  DP4A, DPAS
  Device Partition (core)
    Max number of sub-devices                   0
    Supported partition types                   None
    Supported affinity domains                  (n/a)
  Max work item dimensions                      3
  Max work item sizes                           1024x1024x1024
  Max work group size                           1024
  Preferred work group size multiple (device)   64
  Preferred work group size multiple (kernel)   64
  Max sub-groups per work group                 128
  Sub-group sizes (Intel)                       8, 16, 32
  Preferred / native vector sizes
    char                                        16 / 16
    short                                       8 / 8
    int                                         4 / 4
    long                                        1 / 1
    half                                        8 / 8 (cl_khr_fp16)
    float                                       1 / 1
    double                                      0 / 0 (n/a)
  Half-precision Floating-point support         (cl_khr_fp16)
    Denormals                                   Yes
    Infinity and NANs                           Yes
    Round to nearest                            Yes
    Round to zero                               Yes
    Round to infinity                           Yes
    IEEE754-2008 fused multiply-add             Yes
    Support is emulated in software             No
  Single-precision Floating-point support       (core)
    Denormals                                   Yes
    Infinity and NANs                           Yes
    Round to nearest                            Yes
    Round to zero                               Yes
    Round to infinity                           Yes
    IEEE754-2008 fused multiply-add             Yes
    Support is emulated in software             No
    Correctly-rounded divide and sqrt operations Yes
  Double-precision Floating-point support       (n/a)
  Address bits                                  64, Little-Endian
  External memory handle types                  DMA buffer
  Global memory size                            16704737280 (15.56GiB)

clinfo definitely sees my GPU and has it correctly at 16GB vram.

Change these values to match your card in clinfo -l
Needed by llama.cpp

export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE="Intel(R) Graphics [0x56a0]"

Use sudo intel_gpu_top to view your card.

That is the current setup in run_arch.sh, but intel_gpu_top is still not finding my GPU.

thejacer@DESKTOP-9DLUMOO:~/text-generation-webui$ glxinfo | grep OpenGL
DRI3 not available
failed to load driver: zink
OpenGL vendor string: Mesa
OpenGL renderer string: llvmpipe (LLVM 15.0.7, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 24.0.0-devel (git-3ca1f35cbf)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 24.0.0-devel (git-3ca1f35cbf)
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 24.0.0-devel (git-3ca1f35cbf)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

I've seen some comments online that the OpenGL renderer string shouldn't be llvmpipe if I'm using the GPU, but I haven't figured out how to change that yet.
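For what it's worth, the llvmpipe renderer string only means Mesa's OpenGL path fell back to software. llama.cpp's CLBlast build goes through the OpenCL compute runtime, not Mesa, so OpenCL can still work while glxinfo reports llvmpipe. If you want to check the Mesa side anyway, under WSL2 the d3d12 gallium driver is the one that should pick up the Arc card; a hedged check, assuming the usual /usr/lib/wsl/lib mount:

# The Windows host maps its GPU user-space libraries into WSL2 here.
ls /usr/lib/wsl/lib/ 2>/dev/null | grep -iE "d3d12|dxcore" || echo "WSL GPU libraries not visible"

# Brief glxinfo output: which renderer Mesa actually selected.
glxinfo -B | grep -E "vendor|renderer"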

kcyarn commented 6 months ago

That's good news.

Have you edited run_arc.sh to use your opencl values?

Instead of


export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770

It needs to be something like


export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE="Intel(R)"

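Two small notes on the snippet above: the value has to stay quoted (unquoted parentheses are a shell syntax error), and llama.cpp should accept either the platform/device index or a name fragment for these variables, so the full device string from clinfo also works. Below is a sketch of the relevant run_arc.sh lines plus a quick end-to-end test with one of the models from the guide; the flag names are the standard webui ones as of early 2024.

# In run_arc.sh: values taken from the clinfo output earlier in the thread.
export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE="Intel(R) Graphics [0x56a0]"

# Then launch with a small GGUF and some layers offloaded to the Arc card.
./run_arch.sh --model mistral-7b-instruct-v0.2.Q5_K_M.gguf --loader llama.cpp --n-gpu-layers 20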