joeknock90 / Single-GPU-Passthrough

libvirt hangs "Input/output error" #56

Closed: apprehensions closed this issue 3 years ago

apprehensions commented 3 years ago

I'm using the same script as on my previous system and nothing has changed. I can execute the script on its own, but qemu cannot? I could live with that, but I would prefer qemu to run it for me. Instead it just freezes completely, and I cannot restart it or do anything with it. Permissions are applied. I think this might be an issue with kvm, or I'm just stupid.

start script:

set -x

systemctl stop getty@tty1

echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind

echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
sleep 2

modprobe -r nvidia_drm
modprobe -r nvidia_modeset
modprobe -r drm_kms_helper
modprobe -r drm
modprobe -r nvidia_uvm
modprobe -r nvidia

virsh nodedev-detach pci_0000_01_00_0
virsh nodedev-detach pci_0000_01_00_1

modprobe vfio
modprobe vfio-pci
modprobe vfio_iommu_type1

stop script:

set -x

modprobe -r vfio_iommu_type1
modprobe -r vfio-pci
modprobe -r vfio

virsh nodedev-reattach pci_0000_01_00_1
virsh nodedev-reattach pci_0000_01_00_0

echo 1 > /sys/class/vtconsole/vtcon0/bind
echo "efi-framebuffer.0" > /sys/bus/platform/drivers/efi-framebuffer/bind
modprobe nvidia
modprobe nvidia_uvm
modprobe drm
modprobe drm_kms_helper
modprobe nvidia_modeset
modprobe nvidia_drm

nvidia-xconfig --query-gpu-info > /dev/null 2>&1
sudo systemctl restart getty@tty1

xml: https://pastebin.com/0hM2eXEf

apprehensions commented 3 years ago

Checking the logs, it seems that my GPU is being used?

qemu-system-x86_64: vfio_region_write(0000:01:00.0:region1+0x1fd308, 0xff0c0c0cff0e0e0e,8) failed: Device or resource busy
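
A "Device or resource busy" from vfio may mean that something on the host still holds the card. One quick check, sketched here, is to read from sysfs which kernel driver each PCI function is currently bound to. The helper name `bound_driver` and its optional sysfs-root argument are made up for illustration; the addresses are the ones from the scripts above.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI function, or "none".
# $1: PCI address such as 0000:01:00.0
# $2: sysfs root (hypothetical parameter, defaults to /sys; handy for testing)
bound_driver() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    if [ -L "$dev/driver" ]; then
        # The driver entry is a symlink whose last path component is the driver name.
        basename "$(readlink "$dev/driver")"
    else
        echo none
    fi
}

# Both functions of the card (GPU and its audio device) from this thread.
for fn in 0000:01:00.0 0000:01:00.1; do
    echo "$fn -> $(bound_driver "$fn")"
done
```

If either function still reports nvidia or snd_hda_intel once the start script has finished, the detach did not take effect for that function.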

joeknock90 commented 3 years ago

I updated the start and stop scripts recently. Try working with those.

apprehensions commented 3 years ago

Nothing has changed. Something else I realized is that qemu doesn't execute any script in the directory at all.
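
For context on why scripts in a directory might never run: libvirt itself only invokes a single hook file, /etc/libvirt/hooks/qemu, passing the guest name, the operation, and the sub-operation as arguments. Per-guest directories are a convention implemented by that dispatcher file. The sketch below assumes a qemu.d/<guest>/<operation>/<sub-operation>/ layout, which this thread does not confirm; `run_hooks` and `HOOK_BASE` are hypothetical names. Depending on the libvirt version, libvirtd may also need a restart after the hook file is first installed.

```shell
#!/bin/sh
# Sketch of a libvirt qemu hook dispatcher.
# libvirt calls the hook as: qemu <guest> <operation> <sub-operation> <extra>
# HOOK_BASE is a hypothetical override so the function can be tried outside /etc.
run_hooks() {
    guest="$1"; op="$2"; subop="$3"
    dir="${HOOK_BASE:-/etc/libvirt/hooks}/qemu.d/$guest/$op/$subop"
    [ -d "$dir" ] || return 0              # nothing configured for this event
    for script in "$dir"/*; do
        [ -e "$script" ] || continue       # empty directory: glob did not match
        if [ -f "$script" ] && [ -x "$script" ]; then
            "$script" "$@" || return 1     # stop at the first failing script
        else
            echo "skipping non-executable: $script" >&2
        fi
    done
}

# When installed as /etc/libvirt/hooks/qemu, forward libvirt's arguments.
if [ "$#" -ge 3 ]; then
    run_hooks "$@"
fi
```

With this layout, a start script for a guest named win10 would live under qemu.d/win10/prepare/begin/, and the directory name has to match the guest name exactly.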

apprehensions commented 3 years ago

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 101, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 57, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1329, in startup
    self._backend.create()
  File "/usr/lib/python3.9/site-packages/libvirt.py", line 1353, in create
    raise libvirtError('virDomainCreate() failed')
libvirt.libvirtError: Cannot recv data: Connection reset by peer

joeknock90 commented 3 years ago

I've removed the unloading of the nvidia modules manually in the start script, which it looks like you've still got there in yours. Try taking out the modprobe -r statements in the start script.

Are both scripts set to executable?
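
One way to answer the executable question for every hook file at once, as a minimal sketch. The default path /etc/libvirt/hooks is the usual libvirt location but is an assumption about this setup, and `non_executable_hooks` is a made-up helper name.

```shell
#!/bin/sh
# List hook files that lack the owner execute bit.
# $1: hooks directory (assumed default: /etc/libvirt/hooks)
non_executable_hooks() {
    # -perm -100 matches files with the owner execute bit set; "!" inverts it.
    find "${1:-/etc/libvirt/hooks}" -type f ! -perm -100 2>/dev/null
}

# Usage (as root):
#   non_executable_hooks
#   chmod +x <each file it prints>
```

An empty result means every regular file under the hooks directory has its owner execute bit set.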

apprehensions commented 3 years ago

I have no idea what I did, but it works now. I might diagnose this later when I reinstall my Linux distribution. Also, I don't get why you removed the unloading of the nvidia modules; that will just cause errors, iirc.

joeknock90 commented 3 years ago

It used to cause errors. However, virsh nodedev-detach now seems to do what it was originally intended to do, which is to bind the PCI device to vfio properly.

Through some testing, I and quite a few other people have found that you run into fewer issues by letting virsh nodedev-detach and nodedev-reattach handle binding and unbinding.
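
To illustrate the idea of leaving the driver handling to libvirt, here is a rough sketch of the detach step with no manual modprobe -r calls. It is not the repository's actual script. `detach_all` and the `VIRSH` override are hypothetical names, the latter added only so the function can be dry-run without touching hardware.

```shell
#!/bin/sh
# Sketch: hand each GPU function to vfio-pci via libvirt, no manual modprobe -r.
# VIRSH is a hypothetical override (defaults to the real virsh) for dry runs.
detach_all() {
    for dev in "$@"; do
        ${VIRSH:-virsh} nodedev-detach "$dev" || {
            echo "detach failed for $dev" >&2
            return 1
        }
    done
}

# Example for the card in this thread (run as root, after the console and
# framebuffer have been released):
#   detach_all pci_0000_01_00_0 pci_0000_01_00_1
```

Stopping at the first failure keeps the VM from starting with only one of the two functions detached.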

qt1 commented 2 years ago

Hi, joeknock90

virsh nodedev-dettach pci_0000_05_00_0

It just hangs. I have no clue what is wrong. There are no diagnostics and no debug mode.

How would you recommend solving?

Thanks
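
The thread never identifies the cause of this hang. One possibility worth ruling out (an assumption, not something confirmed here) is that a process still has the GPU device nodes open, for example a display manager or an X session, so the driver cannot be unbound. The sketch below scans /proc for such processes; `gpu_holders` and its proc-root argument are made-up names, and the /dev/nvidia* and /dev/dri/* patterns are guesses at the relevant device nodes.

```shell
#!/bin/sh
# List PIDs that hold GPU device nodes open, by scanning /proc/<pid>/fd.
# $1: proc root (hypothetical parameter, defaults to /proc).
# Run as root, otherwise other users' processes are not visible.
gpu_holders() {
    proc="${1:-/proc}"
    for fd in "$proc"/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /dev/nvidia*|/dev/dri/*)
                pid=${fd#"$proc"/}     # strip the proc root prefix
                pid=${pid%%/*}         # keep only the PID component
                echo "$pid $target"
                ;;
        esac
    done | sort -u
}

gpu_holders

# Bound the detach so it cannot hang forever (timeout is from GNU coreutils):
#   timeout 30 virsh nodedev-detach pci_0000_05_00_0 || echo "detach did not finish"
```

If the list is not empty, stopping those processes before the detach is a reasonable first experiment. Kernel messages from dmesg right after a timed-out detach may give further hints.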

slimcdk commented 2 years ago

Also stuck here with virsh nodedev-dettach hanging forever.

UmutAlihan commented 1 year ago

I am having the same forever hanging issue ://

archenroot commented 10 months ago

Same here