longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[BUG] actual size too big, only Volume Head (not a snapshot) #3423

Open StefanSa opened 2 years ago

StefanSa commented 2 years ago

Describe the bug

I don't understand the actual size of 159Gi displayed in the WebUI; in fact, only 9.9Gi is used in the pod. In the PVC's replica folder you can only see the volume head image, but no snapshot. For info, there is only one replica. The pod and the replica are on different nodes.

Volume Details

Attached Node & Endpoint: k-node02 (/dev/longhorn/pvc-ca21139c-3d1f-4b95-a287-b5e7462b3770)
Size: 400 Gi
Actual Size: 160 Gi
Data Locality: disabled
Access Mode: ReadWriteOnce
Engine Image: rancher/mirrored-longhornio-longhorn-engine:v1.2.2
Created: Invalid date
Encrypted: False
Node Tags:
Disk Tags:
Last Backup:
Last Backup At:
Replicas Auto Balance: ignored
Instance Manager: instance-manager-e-28ae6668
Namespace: opensearch
PVC Name: opensearch-cluster-data-opensearch-cluster-data-2
PV Name: pvc-ca21139c-3d1f-4b95-a287-b5e7462b3770
PV Status: Bound
Revision Counter: False
Pod Name: opensearch-cluster-data-2
Pod Status: Running
Workload Name: opensearch-cluster-data
Workload Type: StatefulSet

Replica folder of the PVC on the host:

ls -lh
total 161G
-rw------- 1 root root 4,0K 16. Dez 12:20 revision.counter
-rw-r--r-- 1 root root 400G 16. Dez 12:20 volume-head-000.img
-rw-r--r-- 1 root root  126 14. Dez 16:03 volume-head-000.img.meta
-rw-r--r-- 1 root root  144 14. Dez 16:03 volume.meta

df -h inside the affected pod:

[opensearch@opensearch-cluster-data-2 ~]$ df -h
Filesystem                                              Size  Used Avail Use% Mounted on
overlay                                                 2.2T  358G  1.9T  17% /
tmpfs                                                    64M     0   64M   0% /dev
tmpfs                                                    71G     0   71G   0% /sys/fs/cgroup
/dev/mapper/36782bcb074cca00021eb8d5a0fa91b96-part4     2.2T  358G  1.9T  17% /etc/hosts
shm                                                      64M     0   64M   0% /dev/shm
tmpfs                                                    71G     0   71G   0% /run/secrets/credentials.d
/dev/longhorn/pvc-ca21139c-3d1f-4b95-a287-b5e7462b3770  393G  9.9G  383G   3% /usr/share/opensearch/data

WebUI: no snapshot is shown here either, not even with "Show System Hidden" enabled.

(screenshot: no_snapshot01, showing no snapshots)

Expected behavior

There should not be such a big difference between the space used inside the pod and the reported actual size when no snapshots exist.


PhanLe1010 commented 2 years ago

The size as viewed inside the filesystem (inside the workload pod) may be very different from the size at the block level (the actual size of the Longhorn volume, which is also the size of the replica folder on the host).

We have a document explaining this behavior: https://longhorn.io/docs/1.2.2/volumes-and-nodes/volume-size/. Please see point #3 ("Delete data #1 from the mount point") in the document.
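
To make this concrete, here is a minimal sketch (hypothetical test file; the replica directory placeholder stands for whichever directory under the Longhorn data path, /mnt/san in this setup, holds this volume's replica) showing that deleting data shrinks the filesystem usage but not the block-level actual size:

# Inside the workload pod: write ~1 GiB of data, then delete it again.
dd if=/dev/urandom of=/usr/share/opensearch/data/trimtest.bin bs=1M count=1024
df -h /usr/share/opensearch/data     # Used grows by ~1 GiB
rm /usr/share/opensearch/data/trimtest.bin
df -h /usr/share/opensearch/data     # Used shrinks back down

# On the replica node: the sparse volume-head file keeps those blocks allocated,
# because the delete inside the filesystem is not passed down to the block device.
du -h /mnt/san/replicas/<replica-directory>/volume-head-000.img    # stays ~1 GiB larger than before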

PhanLe1010 commented 2 years ago

Some side questions: Why are there so many control plane nodes (9) compared to the small number of worker nodes (3)? Also, the network resources do not look sufficient; we recommend 10Gbit.

StefanSa commented 2 years ago

> The size as viewed inside the filesystem (inside the workload pod) may be very different from the size at the block level (the actual size of the Longhorn volume, which is also the size of the replica folder on the host).
>
> We have a document explaining this behavior: https://longhorn.io/docs/1.2.2/volumes-and-nodes/volume-size/. Please see point #3 ("Delete data #1 from the mount point") in the document.

Thank you for the explanation. Is this behaviour correct? No snapshot was ever created or deleted. @PhanLe1010, any help here?

StefanSa commented 2 years ago

> Some side questions: Why are there so many control plane nodes (9) compared to the small number of worker nodes (3)? Also, the network resources do not look sufficient; we recommend 10Gbit.

Sorry, my mistake: 3 control plane and 3 worker nodes, and the NICs are 10Gbit.

StefanSa commented 2 years ago

Maybe the same problem as in #1555?

jenting commented 2 years ago

We calculate the size with the command below.

stat /var/lib/longhorn/ -fc '{"path":"%n","fsid":"%i","type":"%T","freeBlock":%f,"totalBlock":%b,"blockSize":%s}'

Would you mind executing the command on the host? Besides that, do you know what the file system type (ext4/xfs/btrfs/...) on the host is?

StefanSa commented 2 years ago

Hi @jenting, here is stat for /mnt/san/:

stat /mnt/san/ -fc '{"path":"%n","fsid":"%i","type":"%T","freeBlock":%f,"totalBlock":%b,"blockSize":%s}'
{"path":"/mnt/san/","fsid":"52ebbce0b07f8675","type":"ext2/ext3","freeBlock":355118669,"totalBlock":624392893,"blockSize":4096}

The filesystem on /mnt/san is ext4.
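
For reference, converting those block counts to bytes (my own arithmetic, not Longhorn output): capacity is totalBlock x blockSize and free space is freeBlock x blockSize, so:

echo $(( 624392893 * 4096 ))    # 2557513289728 bytes, roughly 2.56 TB total on /mnt/san
echo $(( 355118669 * 4096 ))    # 1454566068224 bytes, roughly 1.45 TB free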

joshimoo commented 2 years ago

Your volume has a nominal size of 400GB, and over time the filesystem will write to every block. Since there is no trim implementation, old blocks do not get released, i.e. the data remains in them. It is the same as with a physical disk: when you delete a file, its contents are not actually removed.

If you don't require a 400GB maximum size, consider using an appropriately sized volume for your workload. You can always expand the size of a PVC after the fact.
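
As a quick check (a sketch, assuming the attachment node and device path from the volume details above), the missing trim/discard support can be seen on the block device itself; with the v1.2.x engine the device is not expected to advertise discard:

# Run on the node where the volume is attached (k-node02 here).
lsblk --discard /dev/longhorn/pvc-ca21139c-3d1f-4b95-a287-b5e7462b3770
# DISC-GRAN and DISC-MAX of 0B mean discard requests are not accepted,
# so deleting files (or running fstrim) cannot hand blocks back to Longhorn.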

StefanSa commented 2 years ago

Hi @joshimoo @jenting, thanks for your explanations. Understood; as long as Longhorn doesn't provide trim support or anything like that, we will run our delete-heavy workloads on OpenEBS Local PV instead.

korenlev commented 2 years ago

What is the solution?

PhanLe1010 commented 2 years ago

We are investigating volume trimming. The effort is tracked at https://github.com/longhorn/longhorn/issues/836.
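
For anyone following along: once the engine accepts discard, reclaiming the space would typically be a matter of running a periodic filesystem trim against the mounted volume, something like the sketch below (hypothetical until the linked issue is resolved):

# From the node or pod where the volume is mounted.
fstrim -v /usr/share/opensearch/data    # asks the filesystem to discard its free blocks down to the device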