world-federation-of-advertisers / cross-media-measurement

A privacy-centric system for cross-publisher, cross-media ads measurement through secure multiparty computation.
https://halo.wfanet.org/
Apache License 2.0

Multiple mills claim the same Computation at the same time. #1722

Status: Open. renjiezh opened this issue 2 months ago.

renjiezh commented 2 months ago

Describe the bug: Two mill jobs claim the same Computation. One of them is newly spawned by the mill scheduler; the other is a mill job that is still running. As a result, the later mill job fails the Computation after finishing its stage, because of a stage mismatch.

Steps to reproduce: Run a stress test with multiple data services. The bug reproduces intermittently.

Component(s) affected: Duchy

Version: v0.5.7-rc2

Environment: QA env

Additional context: Happened on worker 1 with global ComputationID DaTIZfrdJI4.

renjiezh commented 2 months ago

It is caused by the Spanner implementation of claimTask. The read (querying unclaimed tasks) and the write (claiming the task) are not bound in one transaction. When multiple entities call claimTask concurrently, two of them can read the same unclaimed task and then both claim it, leaving the claim state inconsistent.
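To make the race concrete, here is a minimal Python sketch. It is not the Duchy's actual Spanner-backed code: TaskTable, find_unclaimed, write_claim, claim_task, racy_interleaving, atomic_interleaving, and concurrent_claims are hypothetical names, and a threading.Lock stands in for a Spanner read-write transaction (a real transaction may abort and retry a conflicting claimer rather than simply block it). The buggy shape runs the query and the claim as separate steps; the fixed shape does both inside one "transaction".

```python
"""Toy model of claimTask: non-atomic read-then-write versus an atomic claim.
Hypothetical illustration only; not the Duchy's Spanner implementation."""
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    computation_id: str
    owner: Optional[str] = None  # None means the task is unclaimed


class TaskTable:
    def __init__(self, ids):
        self.tasks = {cid: Task(cid) for cid in ids}
        # Stand-in for a read-write transaction: work done while holding
        # this lock appears to other claimers as a single step.
        self._txn = threading.Lock()

    # Buggy shape: the query and the claim are separate operations.
    def find_unclaimed(self) -> Optional[str]:
        for cid, task in sorted(self.tasks.items()):
            if task.owner is None:
                return cid
        return None

    def write_claim(self, cid: str, mill: str) -> None:
        # Blind write: does not re-check that the task is still unclaimed.
        self.tasks[cid].owner = mill

    # Fixed shape: query and claim happen inside one "transaction".
    def claim_task(self, mill: str) -> Optional[str]:
        with self._txn:
            cid = self.find_unclaimed()
            if cid is not None:
                self.tasks[cid].owner = mill
            return cid


def racy_interleaving():
    table = TaskTable(["DaTIZfrdJI4"])
    # Both mills run the query before either one writes its claim.
    seen_a = table.find_unclaimed()
    seen_b = table.find_unclaimed()
    table.write_claim(seen_a, "mill-a")
    table.write_claim(seen_b, "mill-b")
    # Each mill now believes it owns the task it saw.
    return {"mill-a": seen_a, "mill-b": seen_b}, table.tasks["DaTIZfrdJI4"].owner


def atomic_interleaving():
    table = TaskTable(["DaTIZfrdJI4"])
    got_a = table.claim_task("mill-a")
    got_b = table.claim_task("mill-b")  # sees the task already claimed
    return {"mill-a": got_a, "mill-b": got_b}, table.tasks["DaTIZfrdJI4"].owner


def concurrent_claims(num_mills: int):
    """Run num_mills threads that each call claim_task once on a single task."""
    table = TaskTable(["DaTIZfrdJI4"])
    results = []  # list.append is thread-safe in CPython
    workers = [
        threading.Thread(
            target=lambda m=f"mill-{i}": results.append(table.claim_task(m))
        )
        for i in range(num_mills)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

racy_interleaving() reports both mills as owning DaTIZfrdJI4 while the table records only mill-b, which is the double claim described in this issue. atomic_interleaving() hands the task to mill-a and returns None to mill-b, and concurrent_claims(8) yields exactly one successful claim however the threads are scheduled.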

renjiezh commented 2 months ago

PR to fix: https://github.com/world-federation-of-advertisers/cross-media-measurement/pull/1726

SanjayVas commented 2 months ago

Fixed by #1726

SanjayVas commented 2 months ago

Reopening this because #1726 may have introduced a lock contention issue.
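One way to picture the contention concern is a simplified round-based model, assuming every mill's claim transaction targets the same oldest unclaimed task: in each round one transaction commits and the others conflict on that row and retry. simulate_contention is a hypothetical helper; the model does not describe what #1726 actually does, and it is not a diagnosis of the contention observed.

```python
"""Toy round-based model of claim contention on a single head-of-queue row.
Hypothetical illustration only; not Spanner's actual locking behavior."""


def simulate_contention(num_mills: int, num_tasks: int):
    """Return (claims, attempts, aborts).

    Assumed model: each round, every idle mill starts a transaction on the
    oldest unclaimed task, exactly one commits, and the rest abort and retry
    in the next round. A mill that wins a claim stops competing.
    """
    unclaimed = list(range(num_tasks))  # oldest task first
    idle = num_mills
    claims = attempts = aborts = 0
    while idle > 0 and unclaimed:
        attempts += idle    # every idle mill reads the same head row
        unclaimed.pop(0)    # exactly one transaction commits its claim
        claims += 1
        aborts += idle - 1  # the others conflict on that row
        idle -= 1           # the winner leaves the contest
    return claims, attempts, aborts
```

For example, simulate_contention(4, 4) returns (4, 10, 6): four claims cost ten attempts, six of which abort. Under this model, with N mills and at least N tasks, attempts grow as N(N+1)/2 and aborts as N(N-1)/2, so simulate_contention(10, 10) returns (10, 55, 45).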