Open tsantra opened 1 year ago
Yeah, I have reproduced this error. On your machine, inference works fine, but fine-tuning segfaults at rope. I still have no idea about the root cause. By the way, below are our machine's Linux version and driver version:
Not sure whether this error is caused by the version mismatch above.
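To compare the two machines side by side, something like the following could collect the kernel and driver details into one report. This is only a sketch: the helper names `run_if_present` and `collect_env` and the report layout are made up for illustration. The commands themselves (`uname -r`, `xpu-smi discovery`, `sycl-ls`) are the ones already used in this thread, and a tool that is not on `PATH` is reported as missing instead of aborting the script.

```bash
#!/usr/bin/env bash
# Sketch: gather version info relevant to this issue into one report.
# run_if_present and collect_env are hypothetical helper names,
# not part of any Intel tool.

run_if_present() {
  # Print a header, then run the command only if its binary is on PATH.
  echo "== $* =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || echo "(command failed with exit code $?)"
  else
    echo "(not installed: $1)"
  fi
}

collect_env() {
  run_if_present uname -r            # running kernel release
  run_if_present xpu-smi discovery   # GPU devices and driver info
  run_if_present sycl-ls             # SYCL backends visible to oneAPI
}

collect_env
```

Redirecting the output to a file on each machine and diffing the two files should make any kernel or driver difference easy to spot.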
@qiuxin2012 @yangw1234 any suggestions?
Any suggestions on what to do next? Thank you!
We have verified locally that kernel 6.2.0-35-generic with Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26690]
can run QLoRA on our local A770 machine, so this does not seem to be a version issue. I have also checked that the Python package versions are all the same.
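For anyone who wants to repeat the package comparison, one possible approach is to dump `pip freeze` on each machine and list only the lines that differ. The sketch below assumes both environments use pip; `diff_pkgs` and the file names are placeholders invented for this example.

```bash
#!/usr/bin/env bash
# Sketch: show packages whose pinned versions differ between two
# `pip freeze` dumps. diff_pkgs is a hypothetical helper name.

diff_pkgs() {
  # $1, $2: files produced by `pip freeze` on each machine.
  a_sorted="$(mktemp)"
  b_sorted="$(mktemp)"
  sort "$1" > "$a_sorted"
  sort "$2" > "$b_sorted"
  # comm -3 drops lines common to both files; whatever remains differs.
  # Lines unique to the second file are indented with a tab.
  comm -3 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}

# Example usage (file names are placeholders):
#   pip freeze > ours.txt      # on the working machine
#   pip freeze > theirs.txt    # on the failing machine
#   diff_pkgs ours.txt theirs.txt
```

Empty output means the two dumps list exactly the same packages and versions.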
However, I found that on your machine, the output of `sudo xpu-smi stats -d 0` is a little strange:
On ours, it is:
Not sure whether this issue is caused by some installation error or anything else.
Below are the steps we used to set up our Arc environment on Ubuntu 22.04.3; maybe you can refer to them.
Commands on Ubuntu 22.04.3:
```bash
# install arc driver
sudo apt-get install -y gpg-agent wget
wget -qO - https://repositories.intel.com/graphics/intel-graphics.key | \
sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo 'deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/graphics/ubuntu jammy arc' | \
sudo tee /etc/apt/sources.list.d/intel.gpu.jammy.list
# downgrade kernel
sudo apt-get update && sudo apt-get install -y --install-suggests linux-image-5.19.0-41-generic
sudo sed -i "s/GRUB_DEFAULT=.*/GRUB_DEFAULT=\"1> $(echo $(($(awk -F\' '/menuentry / {print $2}' /boot/grub/grub.cfg \
| grep -no '5.19.0-41' | sed 's/:/\n/g' | head -n 1)-2)))\"/" /etc/default/grub
sudo update-grub
sudo reboot
# The 5.19 kernel doesn't include an Arc graphics driver, so the machine may not start the desktop correctly, but you can still log in over ssh.
# Alternatively, select the 5.19 recovery mode entry in GRUB, then choose resume to continue the normal boot.
# remove latest kernel
sudo apt purge linux-image-6.2.0-*
sudo apt autoremove
sudo reboot
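# Optional sanity check (a sketch, not part of the original steps): confirm the
# machine really booted into the downgraded kernel before installing the DKMS
# drivers, because linux-headers-$(uname -r) below targets the running kernel.
# check_kernel is a hypothetical helper name.
check_kernel() {
  # $1: expected kernel release; $2: actual release (defaults to the running kernel)
  expected="$1"
  actual="${2:-$(uname -r)}"
  if [ "$actual" = "$expected" ]; then
    echo "kernel OK: $actual"
  else
    echo "unexpected kernel: $actual (expected $expected)"
    return 1
  fi
}
check_kernel 5.19.0-41-generic || echo "fix the GRUB default and reboot before continuing"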
# install drivers
sudo apt-get update
sudo apt-get -y install \
gawk \
dkms \
linux-headers-$(uname -r) \
libc6-dev
sudo apt-get install -y intel-platform-vsec-dkms intel-platform-cse-dkms intel-i915-dkms intel-fw-gpu
sudo apt-get install -y gawk libc6-dev udev \
intel-opencl-icd intel-level-zero-gpu level-zero \
intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo
sudo reboot
# Configuring permissions
sudo gpasswd -a ${USER} render
newgrp render
# Verify the device is working with i915 driver
sudo apt-get install -y hwinfo
hwinfo --display
# install oneAPI
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install intel-basekit
```
Model: llama-2-7b-hf Ubuntu: 22.04
xpu-smi discovery:
uname -r
Steps followed:
sycl-ls info:
lscpu