Closed — dimm0 closed this issue 5 years ago
Already have 2 objects
@dimm0 Rook doesn't do anything currently to configure the resharding, but according to the docs resharding is enabled by default. Perhaps you need to add the bucket to the resharding queue with the command from the toolbox?
radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>
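As a rough guide for picking `<new number of shards>`: Ceph's dynamic resharding targets about `rgw_max_objs_per_shard` objects per index shard (100000 by default). A hypothetical helper to estimate a shard count from the object count reported by `radosgw-admin bucket stats` (a sketch, not part of Rook or Ceph):

```python
import math

# Assumption: Ceph's default rgw_max_objs_per_shard is 100000 objects per shard.
MAX_OBJS_PER_SHARD = 100_000

def recommended_num_shards(num_objects: int,
                           max_per_shard: int = MAX_OBJS_PER_SHARD) -> int:
    """Estimate a shard count so each index shard stays under max_per_shard.

    Hypothetical helper; cross-check against your cluster's actual
    rgw_max_objs_per_shard setting and `radosgw-admin bucket stats` output.
    """
    return max(1, math.ceil(num_objects / max_per_shard))

print(recommended_num_shards(3_000_000))  # bucket with 3M objects → 30
```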
Now fighting LARGE_OMAP_OBJECTS:

21 large omap objects
21 large objects found in pool 'rooks3.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

`radosgw-admin reshard` is done on the large buckets, and `radosgw-admin bucket limit check` is now happy for all buckets.
https://www.spinics.net/lists/ceph-users/msg49054.html
This looks like the fix. Is it something Rook could do? Everyone with large buckets will hit this problem.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still suffering from this; the cluster is never 100% healthy.
I see a command to deal with those in 12.2.11: https://ceph.com/releases/v12-2-11-luminous-released/

> There have been fixes to RGW dynamic and manual resharding, which no longer leaves behind stale bucket instances to be removed manually. For finding and cleaning up older instances from a reshard, the `radosgw-admin` commands `reshard stale-instances list` and `reshard stale-instances rm` should do the necessary cleanup.
Is there any way to get that version? The Rook toolbox pod doesn't seem to have this command.
Having 160 such objects now
Fixed in Ceph 13.2.5:
radosgw-admin reshard stale-instances rm
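For others hitting this, a hedged sketch of the cleanup sequence from the toolbox (requires Ceph >= 12.2.11 on Luminous or >= 13.2.5 on Mimic; review the list before removing anything):

```shell
# List stale bucket instances left behind by earlier reshards (read-only).
radosgw-admin reshard stale-instances list

# Remove them once you've confirmed the list only contains stale entries.
radosgw-admin reshard stale-instances rm

# Re-check per-bucket index shard usage afterwards.
radosgw-admin bucket limit check
```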
We also had this same problem, and it turned out to be failed multipart uploads. By default there is no lifecycle policy on S3 objects, so if you upload 999GB of a 1TB file and the upload fails, the multipart (chunk) data sticks around forever until you manually clean it up or add a lifecycle policy.
Each multipart chunk is listed in the omap index files. However, the index does not count them in the sharding logic, so the omap index does not get more shards, on the assumption that those chunks will eventually be cleaned up or converted into a completed object.
Adding lifecycle policies and doing a deep scrub solved the issue for us.
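For example, using the AWS CLI against the RGW endpoint (bucket name and retention window below are placeholders; credentials and endpoint configuration are assumed), a lifecycle rule that aborts incomplete multipart uploads:

```shell
# lifecycle.json: abort multipart uploads left incomplete for more than 3 days.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 3}
    }
  ]
}
EOF

# Apply the rule to the bucket.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
```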
Is this a bug report or feature request?
Deviation from expected behavior: When creating a large S3 bucket, I see the LARGE_OMAP_OBJECTS error in cluster health.
Expected behavior: I expect the dynamic resharding feature to take care of shards.
https://tracker.ceph.com/issues/24457
http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/ mentions the "rgw_dynamic_resharding" option, set to true by default. Where can I find it?
How to reproduce it (minimal and precise): Create a bucket with a couple of million files / TBs of data.
Environment:
- Rook version (`rook version` inside of a Rook Pod): 0.8.2
- Kubernetes version (`kubectl version`): 1.11.3