ydb-platform / nbs

Network Block Store
Apache License 2.0

[NBS, Filestore] Decrease min number of channels and support used channel count "shrinkage" for the existing disks #713

Status: Open. qkrorlqr opened this issue 6 months ago.

qkrorlqr commented 6 months ago

Right now we allocate and use at least 4 data channels for each partition tablet: https://github.com/ydb-platform/nbs/blob/3d5451f61b1fa9ad902f17a90617052aafc2eef7/cloud/blockstore/libs/storage/core/config.cpp#L313

This value was set a long time ago, when we didn't have the autoreassign mechanism and wanted to reduce the chance of exhausting all available space on the blobstorage groups backing a tablet's data channels. As a result, every disk now has a minimum of 7 channels.

The same applies to Filestore.
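To illustrate why the minimum matters mostly for small disks, here is a minimal sketch of a channel count model. It assumes that a disk needs one data channel per allocation unit, rounded up and clamped to a configured minimum, and that the remaining channels are non-data ones. The count of 3 non-data channels is only inferred from "7 = 4 data + the rest"; the 32 GiB allocation unit is a guess derived from "8 TiB occupies 255 groups" (8192 / 255 ≈ 32). All names here (`NON_DATA_CHANNELS`, `ALLOCATION_UNIT`, `data_channel_count`, `total_channel_count`) are hypothetical and do not come from the NBS config.

```python
import math

GiB = 1024 ** 3

# Hypothetical constants for illustration only; the real values live in the
# NBS storage config (see the config.cpp link above) and may differ.
NON_DATA_CHANNELS = 3        # inferred: "minimum of 7" minus 4 data channels
ALLOCATION_UNIT = 32 * GiB   # assumed amount of data covered by one channel


def data_channel_count(disk_size: int, min_data_channels: int,
                       allocation_unit: int = ALLOCATION_UNIT) -> int:
    """One data channel per allocation unit (rounded up), but never fewer
    than the configured minimum."""
    needed = math.ceil(disk_size / allocation_unit)
    return max(min_data_channels, needed)


def total_channel_count(disk_size: int, min_data_channels: int) -> int:
    """Data channels plus the (assumed) fixed set of non-data channels."""
    return NON_DATA_CHANNELS + data_channel_count(disk_size, min_data_channels)


if __name__ == "__main__":
    for size_gib in (1, 64, 512):
        size = size_gib * GiB
        print(f"{size_gib:>4} GiB: min=4 -> {total_channel_count(size, 4)} channels, "
              f"min=1 -> {total_channel_count(size, 1)} channels")
```

Under these assumptions a 1 GiB disk gets 7 channels with a minimum of 4 and 4 channels with a minimum of 1, while a 512 GiB disk gets 19 channels either way, so lowering the minimum only affects disks smaller than 4 allocation units.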

qkrorlqr commented 4 weeks ago

Significantly increasing the allocation unit size would also make sense. Right now an 8 TiB filesystem or disk occupies 255 blobstorage groups, which span hundreds of nodes. A single group failure will therefore cause failure of most of the large filesystems and disks in a 1000-node cluster. Multi-tablet filesystems of 100+ TiB are almost guaranteed to become unavailable if a single blobstorage group fails.

Even though a blobstorage group is a redundant and reliable entity, downtime is still possible on rare occasions (due to bugs, misconfiguration, or incorrect ops actions).
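The blast-radius argument can be made concrete with a back-of-the-envelope probability sketch. It assumes each tablet spreads its channels over distinct groups chosen uniformly at random from the cluster, and that the tablets of a multi-tablet filesystem choose their groups independently. Under those assumptions, a specific failed group hits a tablet using `k` of `G` groups with probability `k / G`, and a filesystem of `t` tablets survives only if every tablet avoids it. The cluster size of 400 groups and the 13-tablet split of a 100+ TiB filesystem (about 100 / 8) are hypothetical numbers chosen for illustration, not figures from the issue.

```python
def p_tablet_hit(groups_used: int, total_groups: int) -> float:
    """Probability that one specific failed group is among the `groups_used`
    distinct groups of a tablet, assuming a uniformly random choice."""
    return groups_used / total_groups


def p_filesystem_hit(tablets: int, groups_per_tablet: int,
                     total_groups: int) -> float:
    """Probability that at least one tablet of a multi-tablet filesystem is
    affected by a single group failure, assuming independent tablets."""
    p_tablet_safe = 1 - p_tablet_hit(groups_per_tablet, total_groups)
    return 1 - p_tablet_safe ** tablets


if __name__ == "__main__":
    TOTAL_GROUPS = 400  # hypothetical number of groups in the cluster
    TABLETS = 13        # hypothetical: ~100 TiB split into 8 TiB tablets
    # 255 groups per 8 TiB tablet as today; 64 models a ~4x larger allocation unit.
    for groups in (255, 64):
        print(f"{groups} groups/tablet: "
              f"single disk hit = {p_tablet_hit(groups, TOTAL_GROUPS):.3f}, "
              f"{TABLETS}-tablet fs hit = "
              f"{p_filesystem_hit(TABLETS, groups, TOTAL_GROUPS):.6f}")
```

With 255 groups per tablet, a single disk is hit with probability 255 / 400 ≈ 0.64 and the 13-tablet filesystem is hit almost surely. Fewer groups per tablet lowers both numbers, which is the motivation for a larger allocation unit.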