Depending on the use case, object stores can throttle access to "hot spots" (partitions), which are identified by the first characters of the object key (name/path).
One way around this is to introduce prefixes that distribute objects across multiple object store partitions (see Iceberg's implementation, for example).
By default, Iceberg constructs the object key as a concatenation of storage-location + hash + (context +) file, so the part that distributes the data is placed after a "long-ish string" (namespaces + table-name), possibly eliminating the effect of the hash.
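To illustrate the layout problem, here is a minimal sketch of the two key layouts. Everything in it is an assumption for illustration: the class and method names are hypothetical, and the CRC32-based hash is only a stand-in, not Iceberg's actual hash function or path format.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ObjectKeyLayout {

  // Hypothetical stand-in for the "hash" component; NOT Iceberg's real hash.
  static String hashOf(String fileName) {
    CRC32 crc = new CRC32();
    crc.update(fileName.getBytes(StandardCharsets.UTF_8));
    return String.format("%08x", crc.getValue());
  }

  // Default layout: storage-location + hash + file.
  // The hash comes after the (long) namespaces + table-name part.
  static String defaultLayout(String storageLocation, String fileName) {
    return storageLocation + "/" + hashOf(fileName) + "/" + fileName;
  }

  // Layout with a short data path: data-path + hash + context + file.
  // The hash comes right after the bucket.
  static String dataPathLayout(String dataPath, String context, String fileName) {
    return dataPath + "/" + hashOf(fileName) + "/" + context + "/" + fileName;
  }

  // Position of the hash within the key; smaller means "closer to the front".
  static int hashIndex(String key, String fileName) {
    return key.indexOf(hashOf(fileName));
  }

  public static void main(String[] args) {
    String file = "00000-0-data.parquet";
    String byDefault = defaultLayout("s3://bucket/ns1/ns2/my_table", file);
    String withDataPath = dataPathLayout("s3://bucket", "ns1/ns2/my_table", file);
    System.out.println(byDefault + " -> hash at index " + hashIndex(byDefault, file));
    System.out.println(withDataPath + " -> hash at index " + hashIndex(withDataPath, file));
  }
}
```

In the first layout all keys of a table share the same long prefix, so the distributing part only shows up late in the key. In the second layout the keys diverge directly after the bucket name.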
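To work around that, users set the write.data.path table property to something like s3://bucket/, which moves the hash to the front of the object key. While this solves the hot-spot issue, it introduces problems for file-based access checks, because data files are then no longer located under the table's storage location.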
We might want to update the file-based access checks in the S3 signer and related code to "ignore" the "randomizer part". Simply speaking: instead of doing a "simple" String.startsWith() check in o.p.catalog.service.rest.IcebergS3SignParams#verifyAndSign, we could leverage a regex. This idea is not fully thought through yet, though.
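A rough sketch of what such a regex-based check could look like, to make the idea concrete. This is not the existing verifyAndSign code: the class, method names, and parameters are hypothetical, and the shape of the "randomizer part" (a single path segment of 8 lowercase hex characters) is purely an assumption that would have to be aligned with what Iceberg actually writes.

```java
import java.util.regex.Pattern;

public class RandomizerAwareLocationCheck {

  // Assumption: the randomizer is one path segment of 8 lowercase hex characters.
  static final String ASSUMED_RANDOMIZER = "[0-9a-f]{8}";

  private final Pattern allowed;

  // bucketPrefix, e.g. "s3://bucket/"; tablePath, e.g. "ns1/my_table" (both hypothetical inputs).
  RandomizerAwareLocationCheck(String bucketPrefix, String tablePath, String randomizerRegex) {
    // Literal parts are quoted so that characters like '.' in names are not treated as regex syntax.
    this.allowed =
        Pattern.compile(
            Pattern.quote(bucketPrefix)
                + "(?:" + randomizerRegex + "/)?" // optional randomizer segment, ignored by the check
                + Pattern.quote(tablePath)
                + "/.+"); // requires a '/' after the table path, then at least one more character
  }

  // Counterpart of the plain startsWith() check: true if the URI belongs to the table.
  boolean isAllowed(String requestUri) {
    return allowed.matcher(requestUri).matches();
  }

  public static void main(String[] args) {
    RandomizerAwareLocationCheck check =
        new RandomizerAwareLocationCheck("s3://bucket/", "ns1/my_table", ASSUMED_RANDOMIZER);
    System.out.println(check.isAllowed("s3://bucket/ns1/my_table/metadata/v1.json"));
    System.out.println(check.isAllowed("s3://bucket/1a2b3c4d/ns1/my_table/f.parquet"));
    System.out.println(check.isAllowed("s3://bucket/1a2b3c4d/ns1/other_table/f.parquet"));
  }
}
```

The sketch deliberately leaves open questions such as path traversal ("..") in the remainder of the key and randomizers that span several path segments. Both would need to be covered before relying on a check like this.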