topolvm / topolvm

Capacity-aware CSI plugin for Kubernetes
Apache License 2.0
780 stars 154 forks source link

lvmd creates thin-volume snapshots with activationskip enabled #566

Closed predakanga closed 1 year ago

predakanga commented 2 years ago

Describe the bug When creating a snapshot of a thin volume, the resulting LV is inactive causing the snapshot process to fail due to an invalid kernel major/minor version (see bug #559). At this point, if topolvm-node restarts, it will go into a crash loop until the sysadmin manually removes the snapshot LV.

The reason this occurs is that the LV is created with the skipactivation flag as shown below - note the 'k' flag which means skipactivation:

root@worker4:~# lvs
  LV                                   VG     Attr       LSize   Pool  Origin                               Data%  Meta%  Move Log Cpy%Sync Convert
  33e540de-6342-4073-a796-82566c9622c4 ssd-vg Vwi---tz-k 150.00g pool0 bfac4d38-9337-4cd6-bf38-047232d1d6b3
  bfac4d38-9337-4cd6-bf38-047232d1d6b3 ssd-vg Vwi-aotz-- 150.00g pool0                                      37.07
  pool0                                ssd-vg twi-aotz-- 363.61g                                            15.29  10.65
  system                               ssd-vg -wi-ao---- 100.00g

As far as I can tell, this has been the default behaviour since skipactivation was introduced in LVM v2.02.99, and is mentioned in the lvcreate manpage.

I don't know whether the snapshot volume actually needs to be activated, but if it does then lvcreate should be called with -k n to explicitly disable the skipactivation flag.

Environments

Yuggupta27 commented 1 year ago

Hi @predakanga , the thin-volume snapshots are not intended to be activated as by nature they represent Kubernetes VolumeSnaphots; which is a Read only resource. Thus, to avoid accidently writing to it, we have kept the skip activation flag as it is inplace of disabling it. On the other hand, in case of PVC-Clones and PVC-Restores, we disable the skip-activation flag using the -k n as you mentioned.

predakanga commented 1 year ago

Hi @Yuggupta27

Thanks for the clarification - the reason that I brought this up is that as per @daichimukai's last comment on #559, it seemed like the inactive LVs were seen as a problem.

If that's not the case and sysadmins shouldn't need to take any action, then no problem 👍

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. Please feel free to reopen this issue (or open a new one) if this still requires investigation. Thank you for your contribution.