rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

Allow to configure INSTALL_SNAP_RPC_TIMEOUT #257

Closed lauragrechenko closed 2 years ago

lauragrechenko commented 2 years ago

Hi everyone,

We've noticed a problem when our snapshot is big enough. When the follower receives the last chunk of the snapshot, it tries to install the snapshot and recover from it, but the leader's sender process exits with reason timeout (INSTALL_SNAP_RPC_TIMEOUT). The leader then starts resending the snapshot, and our cluster can no longer work normally.

Could we make this parameter configurable?

We have a change that stores INSTALL_SNAP_RPC_TIMEOUT from Env in the '#config' record (the same way 'snapshot_chunk_size' is configured), but I'm not sure it's suitable for you because it requires changing the 'config' record.
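
As an illustration of the approach described above, here is a minimal sketch. It is not the actual ra code: the module name, the shape of the record, the assumption that Env is a map, the key name `install_snap_rpc_timeout`, and both default values are hypothetical placeholders.

```erlang
-module(snap_timeout_sketch).
-export([init_config/1, install_snap_rpc_timeout/1]).

%% Placeholder default in milliseconds; the value ra actually uses may differ.
-define(DEFAULT_INSTALL_SNAP_RPC_TIMEOUT, 120000).

%% Hypothetical stand-in for the '#config' record; the real record has
%% other fields and possibly different defaults.
-record(config, {snapshot_chunk_size = 1000000 :: pos_integer(),
                 install_snap_rpc_timeout =
                     ?DEFAULT_INSTALL_SNAP_RPC_TIMEOUT :: pos_integer()}).

%% Build the config from Env (assumed here to be a map), falling back
%% to the default when the key is absent.
init_config(Env) when is_map(Env) ->
    Timeout = maps:get(install_snap_rpc_timeout, Env,
                       ?DEFAULT_INSTALL_SNAP_RPC_TIMEOUT),
    #config{install_snap_rpc_timeout = Timeout}.

%% Accessor the snapshot sender would call instead of using the macro directly.
install_snap_rpc_timeout(#config{install_snap_rpc_timeout = Timeout}) ->
    Timeout.
```

With this shape, a deployment with large snapshots could pass a larger value in Env, while everyone else keeps the default.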

Thanks for your time.

kjnilsson commented 2 years ago

Yes, it would be useful to have this configurable. Would you like to contribute a PR for it? Something similar to tick_timeout, perhaps.

lauragrechenko commented 2 years ago

That's good. I'll add it the same way as 'tick_timeout' and open a PR.