quantumlib / Cirq

A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
Apache License 2.0

StreamManager: retry with get result request on already exist errors #6345

Closed verult closed 1 year ago

verult commented 1 year ago

This PR fixes a race condition that occurred roughly every 10-15 min. The fix adds a retry with GetQuantumResultRequest when StreamManager receives a "program already exists" or "job already exists" error. The sequence leading to the race is as follows:

  1. The client sends a CreateProgramAndJobRequest
  2. The client's stream disconnects
  3. The client retries with a new stream and a GetResultRequest
  4. The job doesn't exist yet, and the client receives a "job not found" error
  5. The scheduler creates the program and job
  6. The client retries with a CreateJobRequest and fails with a "job already exists" error
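The six steps above can be replayed against a toy in-memory service. This is only a minimal sketch under stated assumptions: `FakeService`, `JobNotFound`, `JobAlreadyExists`, and `replay_race` are hypothetical names invented for illustration, not part of Cirq or the Quantum Engine API, and the real `StreamManager` is asynchronous and stream-based.

```python
class JobNotFound(Exception):
    """Hypothetical stand-in for the server's "job not found" error."""


class JobAlreadyExists(Exception):
    """Hypothetical stand-in for the server's "job already exists" error."""


class FakeService:
    """Toy server: accepted jobs become visible only after scheduler_tick()."""

    def __init__(self):
        self.jobs = {}      # job_id -> result, visible to the client
        self.pending = []   # accepted but not yet created by the scheduler

    def create_program_and_job(self, job_id):
        self.pending.append(job_id)

    def scheduler_tick(self):
        # The scheduler creates every pending job.
        for job_id in self.pending:
            self.jobs[job_id] = f"result-of-{job_id}"
        self.pending.clear()

    def get_result(self, job_id):
        if job_id not in self.jobs:
            raise JobNotFound(job_id)
        return self.jobs[job_id]

    def create_job(self, job_id):
        if job_id in self.jobs:
            raise JobAlreadyExists(job_id)
        self.jobs[job_id] = f"result-of-{job_id}"


def replay_race(retry_on_already_exists):
    service = FakeService()
    service.create_program_and_job("job-0")      # step 1
    # step 2: the stream disconnects before any response arrives
    try:
        return service.get_result("job-0")       # step 3
    except JobNotFound:                          # step 4
        pass
    service.scheduler_tick()                     # step 5
    try:
        service.create_job("job-0")              # step 6
    except JobAlreadyExists:
        if not retry_on_already_exists:
            raise                                # behavior before the fix
        # Behavior after the fix: ask for the result again.
    return service.get_result("job-0")
```

With `retry_on_already_exists=False` the replay ends in `JobAlreadyExists`, which corresponds to the failure in step 6; with `True` it returns the job's result.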

This retry would cause issues when a user specifies a program ID or job ID in Engine.run_sweep() or EngineProcessor.run_sweep() rather than letting the client generate the ID, because the "already exists" error could then signal a real ID conflict that the retry would hide. However, the recommended path of using ProcessorSampler.run_sweep() does not specify IDs, and we're considering deprecating the ability to specify IDs. Otherwise it's hard to distinguish a real conflict from the race condition.

This is now the error handling logic after a stream breakage:

```mermaid
stateDiagram-v2
    [*] --> GetResult
    CreateJob --> GetResult: J
    GetResult --> CreateJob: !J
    CreateJob --> CreateProgramAndJob: !P
    CreateProgramAndJob --> GetResult: P
    CreateProgramAndJob --> GetResult: J
```

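where

  * !J = "job does not exist" error
  * !P = "program does not exist" error
  * J = "job already exists" error
  * P = "program already exists" error

and the dot (`[*]`) indicates the starting state.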
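The diagram can also be read as a lookup table from (last request, error received) to the next request to send. The sketch below is illustrative only: it assumes that `J` and `P` denote "already exists" errors and that `!J` and `!P` denote "does not exist" errors, and the names `Request`, `ErrorCode`, `TRANSITIONS`, and `next_request` are hypothetical, not the identifiers used in `StreamManager`.

```python
from enum import Enum, auto
from typing import Optional


class Request(Enum):
    GET_RESULT = auto()
    CREATE_JOB = auto()
    CREATE_PROGRAM_AND_JOB = auto()


class ErrorCode(Enum):
    JOB_DOES_NOT_EXIST = auto()        # !J (assumed meaning)
    PROGRAM_DOES_NOT_EXIST = auto()    # !P (assumed meaning)
    JOB_ALREADY_EXISTS = auto()        # J  (assumed meaning)
    PROGRAM_ALREADY_EXISTS = auto()    # P  (assumed meaning)


# After a stream breakage the client starts by asking for the result.
INITIAL_REQUEST = Request.GET_RESULT

# One entry per labeled edge in the state diagram.
TRANSITIONS = {
    (Request.GET_RESULT, ErrorCode.JOB_DOES_NOT_EXIST): Request.CREATE_JOB,
    (Request.CREATE_JOB, ErrorCode.JOB_ALREADY_EXISTS): Request.GET_RESULT,
    (Request.CREATE_JOB, ErrorCode.PROGRAM_DOES_NOT_EXIST): Request.CREATE_PROGRAM_AND_JOB,
    (Request.CREATE_PROGRAM_AND_JOB, ErrorCode.PROGRAM_ALREADY_EXISTS): Request.GET_RESULT,
    (Request.CREATE_PROGRAM_AND_JOB, ErrorCode.JOB_ALREADY_EXISTS): Request.GET_RESULT,
}


def next_request(current: Request, error: ErrorCode) -> Optional[Request]:
    """Return the request to retry with, or None if the error is not retryable."""
    return TRANSITIONS.get((current, error))
```

Any (request, error) pair missing from the table, for example a "job already exists" error in response to a get-result request, returns `None`; in this sketch that means the error should be surfaced to the caller rather than retried.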

cc @senecameeks

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (0e288a7) 97.84% compared to head (a29fcc1) 97.84%.

:exclamation: Current head a29fcc1 differs from pull request most recent head b60d41f. Consider uploading reports for the commit b60d41f to get more accurate results

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #6345   +/-   ##
=======================================
  Coverage   97.84%   97.84%
=======================================
  Files        1110     1110
  Lines       96597    96648   +51
=======================================
+ Hits        94516    94567   +51
  Misses       2081     2081
```
