minio / mc

Unix like utilities for object store
https://min.io/download
GNU Affero General Public License v3.0

Cannot cancel and restart site replication resynchronization. #4996

Closed itssimple closed 2 months ago

itssimple commented 3 months ago

Expected behavior

Resync gets cancelled on all nodes and sites, and I can restart it with bandwidth limits.

Actual behavior

The cancel command reports success, but starting a new resync is then blocked:

mcli admin replicate resync cancel remote local
Site resync with ID f325e1dc-15ff-4008-ace8-e5528e980941 canceled successfully.
mcli admin replicate --limit-upload 10M --limit-download 10M resync start remote local
mcli: <ERROR> Unable to start replication resync. Invalid site-replication request (site replication resync is already in progress).

Steps to reproduce the behavior

This cluster is in production, so I can't take it down to reproduce it.

But more or less, it all started with: mcli admin replicate resync start remote local

Because we didn't limit the bandwidth, customers noticed slowdowns and halts, so we had to cancel the resync so that they could continue their work. (The remote contained all the files we needed at local, which is why we set up site replication and started a resync in the first place.) We then issued a new resync with bandwidth limits, which failed with a complaint that a resync was already in progress.
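
For reference, the cancel-then-restart sequence described above can be sketched as a small POSIX shell function. This is only an illustration under stated assumptions: the function name `restart_resync_limited`, the `MC` and `RETRY_SLEEP` variables, the retry count, and the exit codes are all hypothetical, and it assumes the client exits non-zero when it prints `<ERROR>`. The commands and flag placement are the ones shown in this report. There is no evidence that waiting would have cleared the stuck state on this cluster.

```shell
#!/bin/sh
# Sketch: cancel a site replication resync, then retry a bandwidth-limited
# start while the server still reports a resync in progress.
# MC, RETRY_SLEEP and the attempt count are assumptions, not mc settings.
MC="${MC:-mcli}"

restart_resync_limited() {
    # $1 = first alias, $2 = second alias, $3 = limit (e.g. 10M),
    # $4 = maximum number of start attempts (default 5)
    src="$1"; dst="$2"; limit="$3"; attempts="${4:-5}"

    # Ask the cluster to cancel the running resync first.
    "$MC" admin replicate resync cancel "$src" "$dst" || return 1

    i=1
    while [ "$i" -le "$attempts" ]; do
        # Same invocation as in the report, with the limits parameterised.
        if out=$("$MC" admin replicate --limit-upload "$limit" \
                 --limit-download "$limit" resync start "$src" "$dst" 2>&1); then
            printf '%s\n' "$out"
            return 0
        fi
        case "$out" in
            # The error seen in this report: wait and try again.
            *"already in progress"*) sleep "${RETRY_SLEEP:-10}" ;;
            # Any other failure is passed through unchanged.
            *) printf '%s\n' "$out" >&2; return 1 ;;
        esac
        i=$((i + 1))
    done
    # Still blocked after all attempts, which is the state described here.
    return 2
}
```

Calling `restart_resync_limited remote local 10M` would correspond to the two commands shown under "Actual behavior"; a return code of 2 means the start was still rejected as "already in progress" after every attempt.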

mc --version

mcli version RELEASE.2024-05-24T09-08-49Z (commit-id=a8fdcbe7cb2f85ce98d60e904717aa00016a7d37)
Runtime: go1.22.3 linux/amd64
Copyright (c) 2015-2024 MinIO, Inc.
License GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>

itssimple commented 2 months ago

Resolved by restarting both clusters; the resync could not be cancelled any other way.