kubernetes-retired / external-storage

[EOL] External storage plugins, provisioners, and helper libraries
Apache License 2.0
2.7k stars 1.6k forks source link

Local volume not properly cleaned up #1307

Closed sebgl closed 4 years ago

sebgl commented 4 years ago

I got into a situation where a data volume managed by the local volume provisioner got mounted with existing data, that should normally have been cleaned up beforehand.

Looking at the sequence of events in the logs, I'm wondering if there could be a stale cache/state machine bug somewhere.

06:21:26.493524       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
06:21:26.466362       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
06:18:32.631464       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
06:09:07.631263       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:59:42.631081       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:50:17.630681       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:40:52.630552       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:37:43.678299       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:37:43.672479       1 discovery.go:326] Created PV "local-pv-72d14209" for volume at "/mnt/disks/pvs/pv-7"
05:37:43.671771       1 cache.go:55] Added pv "local-pv-72d14209" to cache
05:37:43.665501       1 discovery.go:298] Found new volume of volumeMode "Filesystem" at host path "/mnt/disks/pvs/pv-7" with capacity 395260067840, creating Local PV "local-pv-72d14209"
05:37:33.876690       1 cache.go:73] Deleted pv "local-pv-72d14209" from cache
05:37:33.662058       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:37:33.655461       1 deleter.go:145] Deleting pv local-pv-72d14209 after successful cleanup
05:37:30.434369       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:35:21.099168       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:35:21.042377       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:31:27.635442       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:25:50.906455       1 cache.go:73] Deleted pv "local-pv-72d14209" from cache
05:25:33.583650       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:25:23.110378       1 deleter.go:145] Deleting pv local-pv-72d14209 after successful cleanup
05:25:13.110149       1 deleter.go:246] Deleting PV file volume "local-pv-72d14209" contents at hostpath "/mnt/disks/pvs/pv-7", mountpath "/mnt/disks/pvs/pv-7"
05:25:13.109651       1 deleter.go:175] Start cleanup for pv local-pv-72d14209
05:25:10.488516       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:23:11.988944       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:23:11.980920       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:22:02.630271       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:12:37.629861       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:03:12.866839       1 cache.go:64] Updated pv "local-pv-72d14209" to cache
05:03:12.863451       1 discovery.go:326] Created PV "local-pv-72d14209" for volume at "/mnt/disks/pvs/pv-7"
05:03:12.862208       1 cache.go:55] Added pv "local-pv-72d14209" to cache
05:03:12.854132       1 discovery.go:298] Found new volume of volumeMode "Filesystem" at host path "/mnt/disks/pvs/pv-7" with capacity 395260067840, creating Local PV "local-pv-72d14209"

There's a first PV deletion at 05:25:23.110378:

05:25:23.110378       1 deleter.go:145] Deleting pv local-pv-72d14209 after successful cleanup
05:25:13.110149       1 deleter.go:246] Deleting PV file volume "local-pv-72d14209" contents at hostpath "/mnt/disks/pvs/pv-7", mountpath "/mnt/disks/pvs/pv-7"

After which I would expect some logs indicating a new volume has been found, similar to what happens at 05:37:43.665501, but there's none.

Then, we get a second PV deletion at 05:37:33.655461:

05:37:33.655461       1 deleter.go:145] Deleting pv local-pv-72d14209 after successful cleanup

Note here that there was no Deleting PV file volume log message before that deletion.

The volume is then recreated at 05:37:43.672479:

05:37:43.672479       1 discovery.go:326] Created PV "local-pv-72d14209" for volume at "/mnt/disks/pvs/pv-7"
05:37:43.671771       1 cache.go:55] Added pv "local-pv-72d14209" to cache
05:37:43.665501       1 discovery.go:298] Found new volume of volumeMode "Filesystem" at host path "/mnt/disks/pvs/pv-7" with capacity 395260067840, creating Local PV "local-pv-72d14209"

What I observed on my side is an application mounting that volume 1 hour later. The volume contains the data that existed there at 05:36, before the "bad" cleanup that happened at 05:37:33.655461.

It looks to me, but I may be wrong, that something went wrong in the cleanup state. I don't see any error in the logs.

It happened only once out of many runs, I'm not able to easily reproduce.

msau42 commented 4 years ago

What version are you using? This repo is deprecated, and the local provisioner has moved to https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner

I believe some issues in this area have been fixed in later versions, such as https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/pull/174

sebgl commented 4 years ago

Oh, sorry about that I missed the deprecated repo! I'm using quay.io/external_storage/local-volume-provisioner:v2.2.0, will update to the latest version. It indeed looks a lot like the issue you pointed out.

Thanks @msau42!