zipurman / oVIRT_Simple_Backup

A REST API backup using PHP for oVirt 4.2.x
GNU General Public License v3.0

scsi remove when detach #49

Open nevesigor opened 5 years ago

nevesigor commented 5 years ago

Expected behaviour

The attached disks should be removed from the kernel's SCSI layer before the engine is asked to detach them.

I think we need something like exec("echo 1 > /sys/block/sdb/device/delete"). I checked the comm/disk_detatch.php code, which is where it should go, but I could not find the drive letter variable there (see the Notes section).
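For illustration only, a minimal Python sketch of that removal step. The project itself is PHP, and `scsi_delete_path`, `scsi_remove`, and the `sysfs_root` parameter are hypothetical names, not code from this repository. It assumes the caller runs as root and that the device name is a whole disk such as `sdb`. Whether this write is enough to avoid the I/O errors is not verified here.

```python
import re
from pathlib import Path

# Hypothetical helpers; not part of oVIRT_Simple_Backup.
_DEVICE_NAME = re.compile(r"sd[a-z]+")


def scsi_delete_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Return the sysfs 'delete' node for a whole SCSI disk such as 'sdb'."""
    if not _DEVICE_NAME.fullmatch(device):
        raise ValueError(f"not a whole SCSI disk name: {device!r}")
    return Path(sysfs_root) / "block" / device / "device" / "delete"


def scsi_remove(device: str, sysfs_root: str = "/sys") -> bool:
    """Ask the kernel to drop the device; return False if it is already gone."""
    node = scsi_delete_path(device, sysfs_root)
    if not node.exists():
        return False
    # Equivalent to: echo 1 > /sys/block/<device>/device/delete (needs root).
    node.write_text("1\n")
    return True
```

The `sysfs_root` argument exists only so the function can be exercised against a temporary directory instead of the real `/sys`.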

Actual behaviour

The disks are removed directly in the engine without being removed from the Linux kernel first, which results in errors like these:
[ 3819.288384] Buffer I/O error on dev dm-1, logical block 0, async page read
[ 3819.290094] Buffer I/O error on dev dm-2, logical block 0, async page read
[ 3819.291579] Buffer I/O error on dev dm-3, logical block 0, async page read
[ 3819.293523] Buffer I/O error on dev dm-4, logical block 0, async page read
[ 3819.293937] Buffer I/O error on dev dm-5, logical block 0, async page read
[ 3819.294655] blk_update_request: I/O error, dev sdf, sector 0
[ 3819.294896] Buffer I/O error on dev sdf, logical block 0, async page read

Notes

I think all the backup logic rests on the assumption that the drives are always sequential (sdc, sdd, sde, ...). If for any reason that does not hold, the backup logic breaks. I got so many errors that the SimpleBackup VM sometimes even hit a kernel panic.

We need a way to identify directly which disk is which, both outside and inside the SimpleBackup VM. This would also help in the future:

zipurman commented 5 years ago

The "echo 1 > /sys/block/sdb/device/delete" doesn't work; I tried that way back when I wrote this.

If you are using virtio_scsi on the SimpleBackup VM, try changing it to plain virtio, which works much better IMO.

Some of these are great ideas, but they are too far out of scope for my needs. I wrote this project to solve my backup issues, which it has. I shared it to help anyone wanting to use the code. I am happy to fix simple bugs, but I don't have the time to code changes beyond that.

nevesigor commented 5 years ago

virtio does not support discard, which I need for SSD storage. virtio_scsi does.

nevesigor commented 5 years ago

All my machines are virtio-scsi, so I have no way to use virtio. Disks are attached to the SimpleBackup machine with their native backend: if they are virtio-scsi on the machine I'm backing up, they will be virtio-scsi on the SimpleBackup machine.

I changed the SimpleBackup machine's OS disk to virtio (vda), and look what happened when I started a backup:

[2019-03-13:12:24:30] Disk Dat Write dev-lnx-01_Disk1 - 8e5707f3-8964-4530-8899-6859e0adb6e7 - true - virtio_scsi - 17179869184 - vdb
[2019-03-13:12:24:30] Disk Dat Write dev-lnx-01_Disk2 - 62bc835a-2e45-45c9-9433-aab30e8db2f2 - false - virtio_scsi - 16106127360 - vdz

They were identified as vdb and vdz, but "fdisk -l" says:
Disk /dev/sda: 16 GiB, 17179869184 bytes, 33554432 sectors
....
Disk /dev/sdb: 15 GiB, 16106127360 bytes, 31457280 sectors
....

I just get dots ("....") in simplebackup.log. I imagine it is trying to "dd" from a device that does not even exist, so it never ends!
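A guard of roughly this shape could turn the endless dots into an immediate error. This is a hedged Python sketch, not the project's PHP code; `is_block_device` and `require_block_device` are hypothetical names, and the assumption is that the copy step is handed a path such as /dev/vdb before it starts.

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """True only if path exists and is a block device node."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path, permission problem, and so on.
        return False
    return stat.S_ISBLK(mode)


def require_block_device(path: str) -> str:
    """Return path unchanged, or raise before any long-running copy starts."""
    if not is_block_device(path):
        raise FileNotFoundError(f"{path} is not an attached block device")
    return path
```

Calling `require_block_device("/dev/vdz")` on a VM where only /dev/sda and /dev/sdb exist would raise straight away instead of leaving the copy waiting on a nonexistent source.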

zipurman commented 5 years ago

Are you running the simpleBackup on Debian? Are you running the latest code from my repo and is it unaltered? What type of compression are you using if any?

nevesigor commented 5 years ago

Yes, it's the latest code and it's not changed at all. I'm using NFS as you do and LZO as compression.

zipurman commented 5 years ago

Can you send me the output of the following commands (unaltered other than password data):

cat /var/www/html/config.php

fdisk -l

df -h

uname -a

cat /etc/debian_version

I don't use virtio-scsi, but I do have a testing system where I can spin up the same setup and see what happens.

nevesigor commented 5 years ago

Before backup, right after boot:

<?php                                                                                                                                                                      
$settings = array(                                                                                                                                                         
"vms_to_backup" => array("", ),                                                                                                                                            
"label" => "BU_",                                                                                                                                                          
"uuid_backup_engine" => "a63cb30c-1bd5-45f8-9c23-0673893a708c",                                                                                                            
"ovirt_url" => "ovirt-engine.domain.tld",                                                                                                                                
"ovirt_user" => "admin@internal",                                                                                                                                          
"ovirt_pass" => "changed",                                                                                                            
"mount_backups" => "/mnt/backups",                                                                                                                                         
"drive_type" => "sd",                                                                                                                                                      
"drive_interface" => "virtio_scsi",
"backup_log" => "/var/log/simplebackup.log",
"email" => "it@domain.tld",
"emailfrom" => "it@domain.tld",
"retention" => 7,
"firstbackupdisk" => "b",
"storage_domain" => "md10_NVMe",
"cluster" => "Production",
"mount_migrate" => "",
"xen_ip" => "",
"xen_migrate_uuid" => "",
"xen_migrate_ip" => "",
"restore_console" => "vnc",
"restore_os" => "rhel_7x64",
"restore_vm_type" => "server",
"restore_cpu_sockets" => "2",
"restore_cpu_cores" => "2",
"restore_cpu_threads" => "1",
"tz" => "Europe/Lisbon",
"compress" => "2",
"withoutmemory" => "0",
);
============================================================
Disk /dev/vda: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xeb48f487

Device     Boot  Start      End  Sectors  Size Id Type
/dev/vda1  *      2048   976895   974848  476M 83 Linux
/dev/vda2       976896 31455231 30478336 14.5G 8e Linux LVM

Disk /dev/mapper/debian-root: 14.5 GiB, 15602810880 bytes, 30474240 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
============================================================
Filesystem                              Size  Used Avail Use% Mounted on
udev                                    3.9G     0  3.9G   0% /dev
tmpfs                                   799M  8.5M  790M   2% /run
/dev/mapper/debian-root                  15G  1.2G   13G   9% /
tmpfs                                   3.9G     0  3.9G   0% /dev/shm
tmpfs                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                   3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                               453M   37M  404M   9% /boot
172.24.193.235:/backups/datavmbackups1  2.0T   34G  1.9T   2% /mnt/backups
tmpfs                                   799M     0  799M   0% /run/user/0
============================================================
Linux ovirt-backup.domain.tld 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux
============================================================
9.8
============================================================

nevesigor commented 5 years ago

[2019-03-15:14:26:34] Disk Dat Write dev-lnx-01_Disk1 - 8e5707f3-8964-4530-8899-6859e0adb6e7 - true - virtio_scsi - 17179869184 - vdb
[2019-03-15:14:26:34] Disk Dat Write dev-lnx-01_Disk2 - 62bc835a-2e45-45c9-9433-aab30e8db2f2 - false - virtio_scsi - 16106127360 - vdz

After starting the backup process, this is the state, and I only get '....' in simplebackup.log. I guess it's trying to back up the wrong disks (vdb and vdz):

=======================================================================
Disk /dev/vda: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xeb48f487

Device     Boot  Start      End  Sectors  Size Id Type
/dev/vda1  *      2048   976895   974848  476M 83 Linux
/dev/vda2       976896 31455231 30478336 14.5G 8e Linux LVM

Disk /dev/mapper/debian-root: 14.5 GiB, 15602810880 bytes, 30474240 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sda: 16 GiB, 17179869184 bytes, 33554432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00005c88

Device     Boot   Start      End  Sectors Size Id Type
/dev/sda1  *       2048  2099199  2097152   1G 83 Linux
/dev/sda2       2099200 33554431 31455232  15G 8e Linux LVM

Disk /dev/mapper/cl-swap: 1.6 GiB, 1719664640 bytes, 3358720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/cl-root: 13.4 GiB, 14382268416 bytes, 28090368 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdb: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/db-postgresql: 15 GiB, 16101933056 bytes, 31449088 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
=======================================================================
Filesystem                              Size  Used Avail Use% Mounted on
udev                                    3.9G     0  3.9G   0% /dev
tmpfs                                   799M  8.5M  790M   2% /run
/dev/mapper/debian-root                  15G  1.2G   13G   9% /
tmpfs                                   3.9G     0  3.9G   0% /dev/shm
tmpfs                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                   3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                               453M   37M  404M   9% /boot
172.24.193.235:/backups/datavmbackups1  2.0T   30G  1.9T   2% /mnt/backups
tmpfs                                   799M     0  799M   0% /run/user/0
=======================================================================

nevesigor commented 5 years ago

We should use something like "lsblk -nS" to list the SCSI disks, then get each disk's serial (which matches the disk ID in oVirt) from udev: udevadm info --query=all --name=/dev/sda | grep ID_SCSI_SERIAL

This way, no matter what the drive letter is or how many drives we have, we ALWAYS back up from the right device.
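As a rough sketch of that idea in Python (hypothetical helper names, not code from this repository): the first function parses the text printed by udevadm info, accepting both the "E: KEY=VALUE" lines of --query=all and plain "KEY=VALUE" lines, and the second looks up the device whose serial equals an oVirt disk ID. It assumes, as stated above, that ID_SCSI_SERIAL carries the full oVirt disk ID. Running udevadm itself is left to the caller.

```python
import re
from typing import Dict, Optional

# udevadm --query=all prefixes each record with a letter, e.g. "E: KEY=VALUE".
_RECORD_PREFIX = re.compile(r"^[A-Z]: ")


def parse_udev_properties(text: str) -> Dict[str, str]:
    """Collect KEY=VALUE properties from udevadm info output."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if _RECORD_PREFIX.match(line):
            if not line.startswith("E: "):
                continue  # skip non-property records such as P:, N:, S:
            line = line[3:]
        key, sep, value = line.partition("=")
        if sep and key:
            props[key] = value
    return props


def device_for_disk_id(disk_id: str,
                       serial_by_device: Dict[str, str]) -> Optional[str]:
    """Return the device whose serial equals disk_id, or None if absent."""
    wanted = disk_id.strip().lower()
    for device, serial in serial_by_device.items():
        if serial.strip().lower() == wanted:
            return device
    return None
```

A caller would build `serial_by_device` by running udevadm once per disk reported by lsblk and reading `ID_SCSI_SERIAL` from the parsed properties. Returning None when nothing matches lets the backup stop instead of guessing a drive letter.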

zipurman commented 5 years ago

This isn't an issue for anyone else that I am aware of. Something in your setup is causing this issue. I will test when I get a chance to see if this is a new bug, but if so, nobody else has reported it.

Debian, if configured as required, should mount the first disk as /dev/sda. Subsequent disks will mount at /dev/sdb, /dev/sdc, etc. Then when unmounted after backups, those devs should be released and should be reused automatically on next mount. If something is delaying their unmounting or is causing them to not-release the devs, then issues will occur. A reboot of the SimpleBackupVM should fix the issue but if the cause persists, then the issue will reoccur.

I agree that using UUIDs would be better, but if the OS is locking devs without releasing them, you'll also be running into resource problems over the course of time. Something is broken in the VM or in oVirt that is causing the disks to be locked in some way IMO.

nevesigor commented 5 years ago

Debian has the right devices (sda + sdb), but the code thinks they are vdb + vdz. That's why the backup is not working.

The SCSI I/O errors are a different question. The device is detached in oVirt without being removed in the Linux kernel. This is not a problem with virtio and does not happen if you use virtio disks, but all my machines are virtio-scsi because they are on SSD and I need "discard". If you could set up a machine with virtio-scsi and back it up, I think you will notice the problem.

The same thing happens on the Linux kernel with physical disks in an mdadm RAID: mdadm locks the devs, and when you replace a failed drive it typically appears under a different drive letter. It's the same problem and has nothing to do with my setup; it's just a matter of kernel + SCSI disks.

zipurman commented 5 years ago

It appears that if the simpleBackup VM does not have a disk mounted in the range it is looking for, IE, sdb,sdc,sdd,sde....sdz it will return the last disk it tries which is sdz. If it finds a match before that, it will return the correct disk.
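To make that fall-through concrete, here is a hedged Python sketch of a scan that reports "nothing found" rather than the last letter tried. It is not the project's code; `find_attached_device` and the injected `exists` predicate are hypothetical, and `prefix` and `first_letter` stand in for the "drive_type" and "firstbackupdisk" settings.

```python
import string
from typing import Callable, Optional


def find_attached_device(prefix: str, first_letter: str,
                         exists: Callable[[str], bool]) -> Optional[str]:
    """Scan <prefix><letter> from first_letter to 'z'.

    Returns the first name for which exists(name) is true, or None when the
    whole range is empty (instead of falling through to the last candidate).
    """
    letters = string.ascii_lowercase
    if len(first_letter) != 1 or first_letter not in letters:
        raise ValueError(f"first_letter must be a single a-z letter: {first_letter!r}")
    for letter in letters[letters.index(first_letter):]:
        name = f"{prefix}{letter}"
        if exists(name):
            return name
    return None
```

With only sda and sdb present, scanning with prefix "vd" from "b" returns None here, whereas the log earlier in this thread shows vdz being reported in that situation.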

simpleBackup maps disks using either virtio or virtio-scsi. It isn't coded to allow for both at the same time. I see in your fdisk output above that you have both /dev/sda and /dev/vda ... that is the issue.

This is what I would suggest. If you want to use virtio-scsi do this.

  1. Reboot the oVirtSimpleBackupVM
  2. Confirm only the one OS disk is attached
  3. Do a "fdisk -l" and as long as you see /dev/sda as the disk, proceed. If not, stop and let me know.
  4. Go to the settings tab of the simpleBackup UI and click save
  5. Make sure the disk type shows as "sd". If not, stop and let me know. If it is "sd" then proceed and try a backup.
  6. Watch when the backup starts ... it should mount the disk to be backed up as /dev/sdb. If it mounts as /dev/vda then something weird is going on that we will have to look into.

Let me know.
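The "fdisk -l" checks in the steps above could be automated with something like the following Python sketch. The names `disk_prefixes` and `has_mixed_buses` are hypothetical and not part of the project. It only looks at the "Disk /dev/sdX:" and "Disk /dev/vdX:" header lines of the fdisk output and ignores device-mapper entries.

```python
import re
from typing import Dict, List

# Matches header lines such as "Disk /dev/sda: 16 GiB, ..." or "Disk /dev/vda: ...".
_DISK_LINE = re.compile(r"^Disk /dev/((sd|vd)[a-z]+):", re.MULTILINE)


def disk_prefixes(fdisk_output: str) -> Dict[str, List[str]]:
    """Group whole-disk names found in 'fdisk -l' output by prefix (sd or vd)."""
    found: Dict[str, List[str]] = {}
    for name, prefix in _DISK_LINE.findall(fdisk_output):
        found.setdefault(prefix, []).append(name)
    return found


def has_mixed_buses(fdisk_output: str) -> bool:
    """True when both sd* and vd* disks are attached at the same time."""
    return len(disk_prefixes(fdisk_output)) > 1
```

Run against the second fdisk listing posted earlier in this thread, `disk_prefixes` groups vda under "vd" and sda and sdb under "sd", so `has_mixed_buses` returns True.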

nevesigor commented 5 years ago

Yes, that way works, but I only manage to do one backup because it starts to give I/O errors at the end, since the disk is not SCSI-removed from the kernel.

This is the first scenario I reported.

zipurman commented 5 years ago

I know I had this issue in the beginning, but I'm not sure what I did to fix it. I will set up an environment to see if I can fix it easily. The OS, if unable to release the disks from the SCSI resources, will blow up over time as it cannot just keep connecting unlimited disks.

So after the first backup, it mounts the next disk at /dev/sdc skipping sdb?

zipurman commented 5 years ago

Just curious, which version of oVirt are you using? In my notes, this stopped happening in 4.2. I am not saying this is the issue ... just want to make sure it's not ;)

nevesigor commented 5 years ago

I'm on 4.3.

Yes, that's what happens: the kernel keeps the disks attached (sdb, sdc, ...) even after they are detached on the oVirt side, so obviously the next backup will not work correctly.

I think this is a mix of assuming the disk ordering will be right, not matching the SCSI serial, and not removing the SCSI disks before detaching them from the SimpleBackup machine.

zipurman commented 5 years ago

Okay, so here is what I have tested:

  1. Created a new simpleBackupVM in my setup using a virtio-scsi disk. (All my disk images reside on an iSCSI SAN)
  2. Using the new VM in step 1, I backed up another VM and it used /dev/sdb
  3. After the backup in step 2 was completed, I ran another backup of another VM which also mounted on /dev/sdb without issue and the backup succeeded.

Just because your VMs use virtio-scsi doesn't mean that simpleBackup needs to use it in order to image their disks. Something else for you to try would be to install simpleBackup from scratch on a virtio disk VM. Then, using that, see if you can back up your other VMs.

I am not sure what is causing the SCSI lock in your setup, but something is. I didn't do anything special to get it to work; I just installed a new Debian 9 and then ran the installer script.

To be fair, I am using all virtio VMs ATM, so I am in the process of installing a couple of virtio-scsi VMs to see if that replicates the issue ... but I still think having the simpleBackup VM as virtio may solve your issue?

zipurman commented 5 years ago

Okay, so if I run the simpleBackupVM on virtio-scsi and then I have 2 VMs that also use virtio-scsi, I am able to back them both up without issue.

I did try backing up the disks using virtio on the simpleBackupVM when the target VMs are using virtio-scsi and it !! DOES FAIL !! with the issue you are seeing. I will get that corrected ASAP!

zipurman commented 5 years ago

Scratch the above where I said it "DOES FAIL".

When I set up simpleBackup to use virtio, set the disk type to "vd", and set the first disk to "b", it backs up the VMs (virtio or virtio-scsi) by mounting the disks as virtio and runs through the backups fine.

The error I reported above happened because I had the first disk set to "c" from a previous session, which was causing other issues.

So again, try a fresh install of the script on a virtio disk. Then try backing up your VMs and see what happens.

nevesigor commented 5 years ago

I can't attach the disks as anything other than virtio-scsi. They are natively virtio-scsi, so they are automatically attached to the SimpleBackup machine the same way.

If the SimpleBackup machine is virtio-scsi and the VMs' disks are (already) virtio-scsi, disk identification works normally. My major concern is the I/O errors I'm getting because the disks are not removed on detach. Tomorrow I will add some quick-and-dirty exec() calls to the PHP code to do the "scsi delete", just to check whether it really solves the issue. Maybe this is happening for some other reason that I'm not seeing.

zipurman commented 5 years ago

k, let me know. I remember trying scsi deletes and device deletes back when I started the script and had no luck with that.

In my test environment, I now have multiple virtio-scsi VMs. My virtio simpleBackupVM is able to back all of them up without issue as well as my virtio VMs. I realize our setups are different, and I am not familiar with all the pros/cons of virtio-scsi. I had the option with my setup as I am just running images on a common iSCSI MPIO disk.