near / mpc

37 stars 12 forks source link

feat: stop infinite loop of triple timeout by adding cache for failed triples #480

Closed ppca closed 6 months ago

ppca commented 6 months ago

Purpose While trying out redeploying nodes at different times, I find that in the case when a large number of (>10) triple generation protocols are ongoing, the nodes could go into the infinite loop of: timing out the protocol for a triple id -> receiving message to generate the same triple from another node that has not timed out their protocol -> joining protocol to generate that same triple -> timing out. This happens because each node would join the triple generation protocol at a different time and nodes cannot time out other nodes' protocols. This PR adds a cache for failed triples, so that each node will not retry any failed (including timed out) triples. Then when the other nodes do not receive responses from this node about this triple protocol, they will time it out and thus close the loop for this triple generation instead of infinite looping. They could move on to generating other triples instead of resources being stuck in infinite loops of failed triples.

What's changed in code 1) add failed_triples: triple_id -> timestamp to TripleManager 2) when a triple generation protocol fail, add the triple_id and the timestamp we find out it failed to the failed_triples 3) when we received a message to generate a triple which has ID that is in failed_triples, we skip that message, and update the timestamp of that ID in failed_triples 4) clear_failed_triples() will run at the end of run(), this clears all failed triples that either failed or messaged more than 2 hrs ago

github-actions[bot] commented 6 months ago

Terraform Feature Environment (dev-480)

Terraform Initialization ⚙️success

Terraform Apply success

Show Apply Plan ``` data.external.git_checkout: Reading... data.external.git_checkout: Read complete after 0s [id=-] data.google_compute_network.prod_network: Reading... data.google_compute_subnetwork.prod_subnetwork: Reading... data.google_compute_network.dev_network: Reading... data.google_compute_subnetwork.dev_subnetwork: Reading... google_service_account.service_account: Refreshing state... [id=projects/pagoda-discovery-platform-dev/serviceAccounts/mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] module.mpc-leader-lb.google_compute_region_network_endpoint_group.default_neg: Refreshing state... [id=projects/pagoda-discovery-platform-dev/regions/us-east1/networkEndpointGroups/mpc-dev-480-leader-neg] google_secret_manager_secret_iam_member.fast_auth_partners_secret_access: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-fast-auth-partners-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] google_project_iam_member.service-account-datastore-user: Refreshing state... [id=pagoda-discovery-platform-dev/roles/datastore.user/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] google_service_account_iam_binding.serivce-account-iam: Refreshing state... [id=projects/pagoda-discovery-platform-dev/serviceAccounts/mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com/roles/iam.serviceAccountUser] google_secret_manager_secret_iam_member.secret_share_secret_access[0]: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-sk-share-0-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] google_secret_manager_secret_iam_member.account_creator_secret_access: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-recovery-account-creator-sk-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] google_secret_manager_secret_iam_member.secret_share_secret_access[1]: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-sk-share-1-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] data.google_compute_network.prod_network: Read complete after 0s [id=projects/pagoda-shared-infrastructure/global/networks/prod] google_secret_manager_secret_iam_member.secret_share_secret_access[2]: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-sk-share-2-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] data.google_compute_network.dev_network: Read complete after 0s [id=projects/pagoda-shared-infrastructure/global/networks/dev] google_secret_manager_secret_iam_member.cipher_key_secret_access[0]: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-cipher-0-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] google_secret_manager_secret_iam_member.cipher_key_secret_access[1]: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-cipher-1-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] google_secret_manager_secret_iam_member.cipher_key_secret_access[2]: Refreshing state... [id=projects/pagoda-discovery-platform-dev/secrets/mpc-cipher-2-dev/roles/secretmanager.secretAccessor/serviceAccount:mpc-recovery-dev-480@pagoda-discovery-platform-dev.iam.gserviceaccount.com] module.mpc-leader-lb.google_compute_region_backend_service.default: Refreshing state... [id=projects/pagoda-discovery-platform-dev/regions/us-east1/backendServices/mpc-dev-480-leader-backend-service] data.google_compute_subnetwork.prod_subnetwork: Read complete after 1s [id=projects/pagoda-shared-infrastructure/regions/us-east1/subnetworks/cloudrun-main-prod-us-east1] data.google_compute_subnetwork.dev_subnetwork: Read complete after 1s [id=projects/pagoda-shared-infrastructure/regions/us-east1/subnetworks/cloudrun-main-dev-us-east1] module.mpc-leader-lb.google_compute_region_url_map.default: Refreshing state... [id=projects/pagoda-discovery-platform-dev/regions/us-east1/urlMaps/mpc-dev-480-leader-url-map] module.signer[2].google_cloud_run_v2_service.signer: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-2-dev-480] module.signer[0].google_cloud_run_v2_service.signer: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-0-dev-480] module.signer[1].google_cloud_run_v2_service.signer: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-1-dev-480] module.mpc-leader-lb.google_compute_region_target_http_proxy.default: Refreshing state... [id=projects/pagoda-discovery-platform-dev/regions/us-east1/targetHttpProxies/mpc-dev-480-leader-http-proxy] module.mpc-leader-lb.google_compute_forwarding_rule.default: Refreshing state... [id=projects/pagoda-discovery-platform-dev/regions/us-east1/forwardingRules/mpc-dev-480-leader-forwarding-rule] module.signer[2].google_cloud_run_v2_service_iam_member.allow_all: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-2-dev-480/roles/run.invoker/allUsers] module.signer[1].google_cloud_run_v2_service_iam_member.allow_all: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-1-dev-480/roles/run.invoker/allUsers] module.signer[0].google_cloud_run_v2_service_iam_member.allow_all: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-0-dev-480/roles/run.invoker/allUsers] module.leader.google_cloud_run_v2_service.leader: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-leader-dev-480] module.leader.google_cloud_run_v2_service_iam_member.allow_all: Refreshing state... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-leader-dev-480/roles/run.invoker/allUsers] Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols: ~ update in-place Terraform will perform the following actions: # module.leader.google_cloud_run_v2_service.leader will be updated in-place ~ resource "google_cloud_run_v2_service" "leader" { id = "projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-leader-dev-480" name = "mpc-recovery-leader-dev-480" # (17 unchanged attributes hidden) ~ template { # (6 unchanged attributes hidden) ~ containers { ~ image = "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:6b38d453ef2e954824ae5b56f860dbf270b53344" -> "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:931554f364fed05b18d2a4ee891e907f9162e305" # (2 unchanged attributes hidden) # (16 unchanged blocks hidden) } # (2 unchanged blocks hidden) } # (1 unchanged block hidden) } # module.signer[0].google_cloud_run_v2_service.signer will be updated in-place ~ resource "google_cloud_run_v2_service" "signer" { id = "projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-0-dev-480" name = "mpc-recovery-signer-0-dev-480" # (17 unchanged attributes hidden) ~ template { # (6 unchanged attributes hidden) ~ containers { ~ image = "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:6b38d453ef2e954824ae5b56f860dbf270b53344" -> "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:931554f364fed05b18d2a4ee891e907f9162e305" # (2 unchanged attributes hidden) # (11 unchanged blocks hidden) } # (2 unchanged blocks hidden) } # (1 unchanged block hidden) } # module.signer[1].google_cloud_run_v2_service.signer will be updated in-place ~ resource "google_cloud_run_v2_service" "signer" { id = "projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-1-dev-480" name = "mpc-recovery-signer-1-dev-480" # (17 unchanged attributes hidden) ~ template { # (6 unchanged attributes hidden) ~ containers { ~ image = "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:6b38d453ef2e954824ae5b56f860dbf270b53344" -> "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:931554f364fed05b18d2a4ee891e907f9162e305" # (2 unchanged attributes hidden) # (11 unchanged blocks hidden) } # (2 unchanged blocks hidden) } # (1 unchanged block hidden) } # module.signer[2].google_cloud_run_v2_service.signer will be updated in-place ~ resource "google_cloud_run_v2_service" "signer" { id = "projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-2-dev-480" name = "mpc-recovery-signer-2-dev-480" # (17 unchanged attributes hidden) ~ template { # (6 unchanged attributes hidden) ~ containers { ~ image = "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:6b38d453ef2e954824ae5b56f860dbf270b53344" -> "us-east1-docker.pkg.dev/pagoda-discovery-platform-dev/mpc-recovery/mpc-recovery-dev:931554f364fed05b18d2a4ee891e907f9162e305" # (2 unchanged attributes hidden) # (11 unchanged blocks hidden) } # (2 unchanged blocks hidden) } # (1 unchanged block hidden) } Plan: 0 to add, 4 to change, 0 to destroy. module.signer[1].google_cloud_run_v2_service.signer: Modifying... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-1-dev-480] module.signer[0].google_cloud_run_v2_service.signer: Modifying... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-0-dev-480] module.signer[2].google_cloud_run_v2_service.signer: Modifying... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-2-dev-480] module.signer[0].google_cloud_run_v2_service.signer: Still modifying... [id=projects/pagoda-discovery-platform-dev/...services/mpc-recovery-signer-0-dev-480, 10s elapsed] module.signer[1].google_cloud_run_v2_service.signer: Still modifying... [id=projects/pagoda-discovery-platform-dev/...services/mpc-recovery-signer-1-dev-480, 10s elapsed] module.signer[2].google_cloud_run_v2_service.signer: Still modifying... [id=projects/pagoda-discovery-platform-dev/...services/mpc-recovery-signer-2-dev-480, 10s elapsed] module.signer[0].google_cloud_run_v2_service.signer: Still modifying... [id=projects/pagoda-discovery-platform-dev/...services/mpc-recovery-signer-0-dev-480, 20s elapsed] module.signer[1].google_cloud_run_v2_service.signer: Still modifying... [id=projects/pagoda-discovery-platform-dev/...services/mpc-recovery-signer-1-dev-480, 20s elapsed] module.signer[2].google_cloud_run_v2_service.signer: Still modifying... [id=projects/pagoda-discovery-platform-dev/...services/mpc-recovery-signer-2-dev-480, 20s elapsed] module.signer[2].google_cloud_run_v2_service.signer: Modifications complete after 22s [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-2-dev-480] module.signer[0].google_cloud_run_v2_service.signer: Modifications complete after 22s [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-0-dev-480] module.signer[1].google_cloud_run_v2_service.signer: Modifications complete after 22s [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-signer-1-dev-480] module.leader.google_cloud_run_v2_service.leader: Modifying... [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-leader-dev-480] module.leader.google_cloud_run_v2_service.leader: Still modifying... [id=projects/pagoda-discovery-platform-dev/...1/services/mpc-recovery-leader-dev-480, 10s elapsed] module.leader.google_cloud_run_v2_service.leader: Still modifying... [id=projects/pagoda-discovery-platform-dev/...1/services/mpc-recovery-leader-dev-480, 20s elapsed] module.leader.google_cloud_run_v2_service.leader: Modifications complete after 21s [id=projects/pagoda-discovery-platform-dev/locations/us-east1/services/mpc-recovery-leader-dev-480] Apply complete! Resources: 0 added, 4 changed, 0 destroyed. Outputs: leader_node = "https://mpc-recovery-leader-dev-480-7tk2cmmtcq-ue.a.run.app" ```

Pusher: @ppca, Action: pull_request, Working Directory: `, Workflow:Terraform Feature Env`

URL: https://mpc-recovery-leader-dev-480-7tk2cmmtcq-ue.a.run.app