SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Describe the bug
When using `maxVol=0` for auto-max volume detection, along with `preAllocate`, the volume service can incorrectly determine the maxVolumes.
In the following example, a simple set-up has been created to reproduce this issue:
Before any volumes exist, it's correctly reporting the maxVol=63:
Then if I create 10 volumes using
I see that my maxVolume has decreased by 10:
Expected behaviour
Max volumes should stay at their original value.
Additional context
This happens because unclaimedSpaces is calculated as the freeSpace minus the unused space, which is only correct for volumes that are not preallocated: https://github.com/seaweedfs/seaweedfs/blob/b62f7c512267cfe379100fa283bbe4b0682e5dc9/weed/storage/store.go#L592
We need a solution that can handle a mixture of preallocated and non-preallocated volumes.
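To make the mixed case concrete, here is a minimal sketch of how the calculation could look if each volume reported its true on-disk size. It is not the code in store.go: the `volumeUsage` type, its fields, and the `unclaimedSpace` function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// volumeUsage is a hypothetical, simplified view of one volume.
type volumeUsage struct {
	DataSize   uint64 // bytes of weed data written so far
	OnDiskSize uint64 // bytes actually allocated on disk
}

// unclaimedSpace returns the free space that is not already spoken for by
// existing volumes. A volume only reserves the part of volumeSizeLimit that
// it has not yet taken from the disk, so a fully preallocated volume
// reserves nothing further.
func unclaimedSpace(freeSpace, volumeSizeLimit uint64, volumes []volumeUsage) int64 {
	var reserved uint64
	for _, v := range volumes {
		// Space the volume already holds on disk: the larger of the two sizes.
		held := v.DataSize
		if v.OnDiskSize > held {
			held = v.OnDiskSize
		}
		if held < volumeSizeLimit {
			reserved += volumeSizeLimit - held
		}
	}
	return int64(freeSpace) - int64(reserved)
}

func main() {
	const limit = 100
	volumes := []volumeUsage{
		{DataSize: 10, OnDiskSize: 100}, // preallocated: already counted in used disk space
		{DataSize: 10, OnDiskSize: 10},  // not preallocated: may still grow by 90
	}
	fmt.Println(unclaimedSpace(1000, limit, volumes)) // prints 910
}
```

With this shape, only the non-preallocated volume reduces the unclaimed space, so creating preallocated volumes would not be subtracted twice.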
I believe we need a way of reporting the true "on-disk" size of a volume, so we can check whether it is less than the max volume size and determine whether we need to subtract it from the freeSpace or not.
I see 2 ways we might go about reporting the disk usage size of the volume:
1) Don't set `FALLOC_FL_KEEP_SIZE` on the fallocate syscall. This causes `stat.Size()` to return the on-disk size, not necessarily the amount of space used by weed data, so we would need to determine the usedSpace by weed data another way. This has the nice side effect of making it more obvious where disk space is being used when looking at the data files outside of seaweedfs.
2) Somehow calculate the on-disk size of the volume and store that (e.g. the equivalent of `du -h 1.dat`). My worry here is how expensive that operation will be over a large number of volumes if we have to repeat it. Maybe it's possible to record this once at volume loading time, then only update it if `datSize` gets bigger than it?
I started looking at option #1 and quickly realised there are a number of components that rely on knowing the weed data size per volume. Therefore I decided to pause and start this conversation to determine the correct approach before continuing: https://github.com/seaweedfs/seaweedfs/compare/master...danfoster:seaweedfs:maxVol_preAllocate_fallocate
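As a rough illustration of option 2, the on-disk size can be read from the block count that `stat` reports, which is what `du` uses. The sketch below assumes a Unix-like system where `os.FileInfo.Sys()` returns a `*syscall.Stat_t`. The `allocatedBytes` function and the `diskUsage` type are hypothetical names, not existing SeaweedFS code.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// allocatedBytes returns the space a file occupies on disk, similar to `du`.
// st_blocks is counted in 512-byte units, independent of the filesystem
// block size.
func allocatedBytes(path string) (int64, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no stat_t available for %s", path)
	}
	return int64(st.Blocks) * 512, nil
}

// diskUsage is a hypothetical per-volume cache: the allocated size is
// recorded once at volume loading time and only raised afterwards.
type diskUsage struct {
	allocated int64
}

// observe raises the cached value when the data size outgrows it, so the
// stat call does not have to be repeated for every volume.
func (d *diskUsage) observe(datSize int64) {
	if datSize > d.allocated {
		d.allocated = datSize
	}
}

func main() {
	f, err := os.CreateTemp("", "vol-*.dat")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	n, err := allocatedBytes(f.Name())
	if err != nil {
		panic(err)
	}
	usage := diskUsage{allocated: n} // recorded once at "load" time
	usage.observe(4096)              // datSize grew past the cached value
	fmt.Println(usage.allocated)
}
```

This keeps the cost to one `stat` per volume at load time. The trade-off is that the cached value can drift from the real allocation if something other than appends changes the file.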