openSUSE / open-build-service

Build and distribute Linux packages from sources in an automatic, consistent and reproducible way #obs
https://openbuildservice.org
GNU General Public License v2.0

obsstoragesetup script reads VG free space instead of VG total size, causing erroneous calculation and failure to create workers #14579

Open pallaswept opened 1 year ago

pallaswept commented 1 year ago

Issue Description

The obsstoragesetup script reads the VG size incorrectly and fails.

Expected Result

The script runs a command to get the size of the VG, but the command reads the wrong field: instead of the VG total size, it reads the free space.

This means that if space is already used by the server LV, as per the instructions, the server LV's size is already missing from the value the script reads, and the script then subtracts it again later in the calculation. This leads the script to believe there is insufficient space.

How to Reproduce

1) Install the appliance on a disk with existing LVM data, including the server LV, as per the documentation here and here.
2) Start the appliance.
3) Be sad at the obsstoragesetup service failure when the math goes wrong.

At https://github.com/openSUSE/open-build-service/blob/master/dist/obsstoragesetup#L227 the script sets

  VG_SIZE=`vgdisplay -c OBS | cut -d: -f16`

which returns the free space on the VG. It should be:

  VG_SIZE=`vgdisplay -c OBS | cut -d: -f14`
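A minimal sketch of the corrected lookup, assuming the script converts the extent count to MB using the PE size from field 13 (reported in KB, which matches PE_SIZE=4096 in the debug output further down). The function name vg_total_mb is hypothetical; it reads `vgdisplay -c` output on stdin so it can be tried against the sample line from this issue without LVM.

```shell
#!/bin/sh
# Hypothetical helper: total VG size in MB, read from `vgdisplay -c` output on stdin.
# Colon-separated fields used (see the vgdisplay -c output in this issue):
#   13 = PE size in KB, 14 = total PE count, 16 = free PE count (the buggy choice)
vg_total_mb() {
    line=$(cat)
    pe_size_kb=$(printf '%s\n' "$line" | cut -d: -f13)
    total_pe=$(printf '%s\n' "$line" | cut -d: -f14)   # total extents, not free ones
    echo $(( total_pe * pe_size_kb / 1024 ))
}

# Sample line copied from `vgdisplay -c OBS` in this issue.
SAMPLE='  OBS:r/w:772:-1:0:1:1:-1:0:1:1:104853504:4096:25599:10240:15359:2FXv3k-LP5f-XdC0-mQnX-3XVk-AWdg-QIoSpk'
printf '%s\n' "$SAMPLE" | vg_total_mb   # prints 102396 (100 GiB), not 61436
```

Field 12 (VG size in KB) would give the same 102396 MB here, since 104853504 / 1024 = 102396; using the extent count just keeps the script's existing PE-based arithmetic.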

Further Information

Here is the server LV, 40 GB out of a 100 GB VG:

obs:~ # vgs
  VG  #PV #LV #SN Attr   VSize   VFree 
  OBS   1   1   0 wz--n- 100.00g 60.00g
obs:~ # vgdisplay 
  --- Volume group ---
  VG Name               OBS
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  229
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               100.00 GiB
  PE Size               4.00 MiB
  Total PE              25599
  Alloc PE / Size       10240 / 40.00 GiB
  Free  PE / Size       15359 / 60.00 GiB
  VG UUID               2FXv3k-LP5f-XdC0-mQnX-3XVk-AWdg-QIoSpk

obs:~ # lvdisplay 
  --- Logical volume ---
  LV Path                /dev/OBS/server
  LV Name                server
  VG Name                OBS
  LV UUID                RTolbp-sB3m-wdZa-XwxA-kY9j-nATr-sVcwky
  LV Write Access        read/write
  LV Creation host, time obs, 2023-06-29 20:17:11 +0000
  LV Status              available
  # open                 1
  LV Size                40.00 GiB
  Current LE             10240
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           254:0

The script gets the math wrong, counting the space taken by the server volume twice:

obs:~ # /usr/sbin/obsstoragesetup start
redirecting to systemctl start .service
Remove all LVM cache and worker partitions in VG OBS.
  1 logical volume(s) in volume group "OBS" now active
Looking for existing OBS Server LVM Volume
mount: /srv/obs: /dev/mapper/OBS-server already mounted on /srv/obs.
Found BSConfig.pm
Directory '/srv/obs/run' exists
owner of '/srv/obs/run' is obsrun
Owner of '/srv/obs/run' is 'obsrun'. Nothing to fix!
ERROR: Not enough space for worker root LVs, just -3074 MB, but at least 4 GB needed.
NUM=2
VG_SIZE=61436 <<<------- Note this is wrong. It's 100GB. 
PE_SIZE=4096
PE_SIZE_IN_MB=4
OBS_SERVER_SIZE=40960
TOTAL_SWAP_SIZE=1024
FINAL_VG_SIZE = 61436 - 40960 - 25600 - 1024 <<<------ Free space after the server LV, minus the server LV again, minus the cache and swap
FINAL_VG_SIZE=-6148 <<<------ negative free space 
OBS_WORKER_ROOT_SIZE=-3074 <<<------ negative free space per worker because it counted the 40G server LV twice.
MIN_WORKER_ROOT_SIZE=4096
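The numbers above can be reproduced with plain shell arithmetic. This sketch assumes, from the FINAL_VG_SIZE line, that the script subtracts the server LV, a 25600 MB cache and the total swap from VG_SIZE and then divides by NUM; the function name worker_root_mb and the CACHE_SIZE variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical re-creation of the worker root size calculation in the
# debug output: (VG_SIZE - server - cache - swap) / NUM, all in MB.
PE_SIZE_IN_MB=4
OBS_SERVER_SIZE=40960
CACHE_SIZE=25600        # assumption: the 25600 term is the cache LV (25.00g in lvs)
TOTAL_SWAP_SIZE=1024
NUM=2

worker_root_mb() {
    vg_size=$(( $1 * PE_SIZE_IN_MB ))   # $1 = PE count read from vgdisplay -c
    final=$(( vg_size - OBS_SERVER_SIZE - CACHE_SIZE - TOTAL_SWAP_SIZE ))
    echo $(( final / NUM ))
}

worker_root_mb 15359   # -f16 (free PE):  prints -3074
worker_root_mb 25599   # -f14 (total PE): prints 17406
```

17406 MB is just under 17 GiB (17408 MB), which is consistent with the 17.00g worker_root LVs in the lvs output after the fix.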

Because the script uses the 16th field, it gets the free space, not the total:

obs:~ # vgdisplay -c OBS | cut -d: -f16
15359
obs:~ # vgdisplay -c OBS
  OBS:r/w:772:-1:0:1:1:-1:0:1:1:104853504:4096:25599:10240:15359:2FXv3k-LP5f-XdC0-mQnX-3XVk-AWdg-QIoSpk
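The colon output is easier to read with the fields labelled. This sketch prints fields 12 to 16 of the sample line; the labels follow the vgdisplay(8) man page's description of the --colon format as I understand it, and they match the plain vgdisplay output (25599 total PE, 10240 allocated, 15359 free). The helper name label_fields is made up for illustration.

```shell
#!/bin/sh
# Label fields 12-16 of `vgdisplay -c` output, using the sample line from this issue.
SAMPLE='  OBS:r/w:772:-1:0:1:1:-1:0:1:1:104853504:4096:25599:10240:15359:2FXv3k-LP5f-XdC0-mQnX-3XVk-AWdg-QIoSpk'

label_fields() {
    for spec in "12:VG size (KB)" "13:PE size (KB)" "14:total PE" "15:allocated PE" "16:free PE"; do
        n=${spec%%:*}      # field number (text before the first colon)
        label=${spec#*:}   # description (text after it)
        printf 'f%s %-15s %s\n' "$n" "$label" "$(printf '%s\n' "$SAMPLE" | cut -d: -f"$n")"
    done
}

label_fields
```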

The service is happy after changing -f16 to -f14:

obs:~ # lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cache         OBS -wi-ao----  25.00g                                                    
  server        OBS -wi-ao----  40.00g                                                    
  worker_root_1 OBS -wi-a-----  17.00g                                                    
  worker_root_2 OBS -wi-a-----  17.00g                                                    
  worker_swap_1 OBS -wi-a----- 512.00m                                                    
  worker_swap_2 OBS -wi-a----- 512.00m                                                    
obs:~ # vgdisplay 
  --- Volume group ---
  VG Name               OBS
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  264
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                6
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               100.00 GiB
  PE Size               4.00 MiB
  Total PE              25599
  Alloc PE / Size       25598 / 99.99 GiB
  Free  PE / Size       1 / 4.00 MiB
  VG UUID               2FXv3k-LP5f-XdC0-mQnX-3XVk-AWdg-QIoSpk

pallaswept commented 1 year ago

At https://github.com/openSUSE/open-build-service/blob/master/dist/obsstoragesetup#L227 the script sets

  VG_SIZE=`vgdisplay -c OBS | cut -d: -f16`

which returns the free space on the VG. It should be:

  VG_SIZE=`vgdisplay -c OBS | cut -d: -f14`

Bump :) Just change that 6 to a 4 and it's a working script.
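Applying the 6-to-4 change by hand can be sketched with sed, assuming the line in the installed script reads as quoted in this issue (`cut -d: -f16`). The helper name patch_vg_field is hypothetical, the demo runs on a scratch file rather than /usr/sbin/obsstoragesetup, and the substitution replaces every match, so check the grep output first.

```shell
#!/bin/sh
# Hypothetical helper: switch the VG_SIZE lookup from field 16 (free PE)
# to field 14 (total PE) in the given file. Assumes the line matches the
# text quoted in this issue; check with grep before touching a real system.
patch_vg_field() {
    grep -n 'cut -d: -f16' "$1"                   # show what will change
    sed -i 's/cut -d: -f16/cut -d: -f14/' "$1"   # GNU sed in-place edit
}

# Demo on a scratch copy instead of /usr/sbin/obsstoragesetup.
tmp=$(mktemp)
printf '%s\n' 'VG_SIZE=`vgdisplay -c OBS | cut -d: -f16`' > "$tmp"
patch_vg_field "$tmp"
cat "$tmp"    # VG_SIZE=`vgdisplay -c OBS | cut -d: -f14`
rm -f "$tmp"
```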

Not a barbed question, just honest curiosity: why has such a quick and simple fix, for a bug that completely breaks a critical feature the official documentation instructs us to use, sat for two months with nothing happening? Would it help if I made a PR for you? Is there some kind of process holding it up? Can I help somehow?

Feel free to not answer that and just apply the fix, since answering will take longer than fixing it :) But I am genuinely puzzled.

krauselukas commented 1 year ago

@pallaswept we hear you :) @M0ses ping :) what do you think?