unitaryfund / metriq-gym

Standard benchmark script implementations for https://metriq.info
Apache License 2.0

[discussion] metriq-gym architecture #24

Closed · willzeng closed this 2 days ago

willzeng commented 2 days ago

I wanted to start a bit of discussion about what the underlying architecture of metriq-gym should be. This will help us as we scale out by (1) adding new benchmarks and (2) adding new backends. We'll want to abstract over both of those things while keeping in mind that a lot of benchmark execution on hardware is going to be async.

Some ideas to kick-start discussion. This is a bit stream-of-consciousness, so it can likely be much improved by further discussion.

Goal: metriq-gym is designed to reliably run several benchmark workloads on several quantum computing backends. Each benchmark workload returns output data as well as a score:float.

There are currently a few objects in the package; I'll lay out my understanding of their intended roles below.

Abstracting async services

Doing things async means that BenchProvider should be an object that gives us a standard interface for submitting jobs to and retrieving data from a given cloud platform, e.g.

# pseudocode
bp = IBMQBenchProvider()   # some IBMQ-specific implementation of BenchProvider
job = bp.push(job_spec)    # push a job to the cloud, get back a job handle
bp.poll(job)               # True/False: is the job done?
result = bp.get(job)       # pull the BenchJobResult from a completed job

or something like that. This way BenchProviders abstract the async service part of things.
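To make this concrete, here is a minimal sketch of that interface as a Python abstract base class. The Job and BenchJobResult placeholders (and the job_spec argument) are just stand-ins for whatever types we settle on, not existing metriq-gym names.

from abc import ABC, abstractmethod
from typing import Any

Job = Any              # provider-specific job handle
BenchJobResult = Any   # placeholder for the benchmark result payload

class BenchProvider(ABC):
    @abstractmethod
    def push(self, job_spec) -> Job:
        """Submit a job to the cloud service and return a handle for it."""

    @abstractmethod
    def poll(self, job: Job) -> bool:
        """Return True if the job has finished, False otherwise."""

    @abstractmethod
    def get(self, job: Job) -> BenchJobResult:
        """Retrieve the result data for a completed job."""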

We also need a few function types that abstract over the actual execution of quantum programs. The ones to keep in mind are executor, sampler, and counter functions, which are referenced throughout the rest of this proposal.

Here I have chosen OpenQASM as the standard program format. I'm of course open to other approaches, but since most stacks can convert into and out of OpenQASM, it is perhaps the best common format.
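As a rough illustration, these function types could be encoded as Python type aliases. The executor signature (OpenQASM → float) matches the one used in the backends list below; the sampler and counter signatures are only guesses for the sake of illustration.

from typing import Callable

OpenQASM = str  # OpenQASM programs passed around as plain strings

Executor = Callable[[OpenQASM], float]                # e.g. returns an expectation value
Sampler = Callable[[OpenQASM, int], list[str]]        # (program, shots) → bitstrings (assumed)
Counter = Callable[[OpenQASM, int], dict[str, int]]   # (program, shots) → counts (assumed)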

Actual Benchmarks

It would seem to make sense that each BenchJobType object corresponds to a benchmark workload and can produce a score when given an executor, sampler, or counter; which of those is appropriate presumably depends on the benchmark. I'm imagining something like this:

class CLOPS(BenchJobType):
    def __init__(self, params):
        # init with the relevant benchmark params
        self.params = params

    def run(self, executor: Executor, bp: BenchProvider):
        # do the benchmark, submitting work via bp.push()
        # return a job object from the bp that can be used to retrieve data async
        return job

    def score(self, job, bp: BenchProvider):
        # pull the data async via bp.get() and calculate the score for this benchmark
        return (score, result)  # result: BenchJobResult

Right now the clops_benchmark object is linked directly to Qiskit. The idea is that CLOPS (and other BenchJobType objects) would be backend independent, requiring only either an executor or sampler to be passed in.

The goal behind implementing this is to make extension easy.
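For instance, a benchmark run might look roughly like the following; the params dict, ibm_executor, and ibmq_provider names are hypothetical placeholders rather than actual metriq-gym objects.

# hypothetical usage of the CLOPS sketch above
clops = CLOPS(params={"width": 5, "layers": 100, "shots": 100})
job = clops.run(ibm_executor, ibmq_provider)     # submitted via ibmq_provider.push()
# ... later, once ibmq_provider.poll(job) reports completion ...
score, result = clops.score(job, ibmq_provider)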

Extending to other backends

If you are adding a new quantum computing stack, then you just need to supply executor, sampler, or counter functions. If you are adding a new cloud service, then you need to write a new BenchProvider object. Switching BenchProvider doesn't necessarily mean you need to switch executor.
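For example, wiring up a new stack could be as small as the following sketch. AcmeSDK is a made-up stand-in for a vendor SDK, not a real package.

class AcmeSDK:
    # hypothetical vendor SDK; a real one would compile and execute the program
    def run_qasm(self, qasm: str, shots: int) -> float:
        ...

def acme_executor(qasm: str) -> float:
    # an executor with the OpenQASM → float signature used elsewhere in this thread
    return AcmeSDK().run_qasm(qasm, shots=1000)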

Extending to other benchmarks

To add a new benchmark, you need only define a new BenchJobType that follows the abstract schema: a .run() and a .score() method of the right types.
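Concretely, that abstract schema could look something like this sketch, reusing the BenchProvider, Job, and BenchJobResult placeholders and the Executor alias from the sketches above.

from abc import ABC, abstractmethod

class BenchJobType(ABC):
    @abstractmethod
    def run(self, executor: Executor, bp: BenchProvider) -> Job:
        """Submit the benchmark workload via bp.push() and return a job handle."""

    @abstractmethod
    def score(self, job: Job, bp: BenchProvider) -> tuple[float, BenchJobResult]:
        """Pull data via bp.get() and compute (score, result) for this benchmark."""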

Running over backends and benchjobs

Ideally we can then have something like:

backends = [
    # each entry pairs an executor (OpenQASM → float) with a BenchProvider instance
    (ibm_armonk_executor, ibmq),
    (ibm_tenerife_executor, ibmq),
    (rigetti_acorn_executor, aws),
    (rigetti_acorn_executor, azure),
    (iqm_garnett_executor, aws),
]

# push jobs
jobs = []
for executor, bp in backends:
    for benchmark in benchmarks:  # benchmarks: list[BenchJobType]
        job = benchmark.run(executor, bp)
        jobs.append((benchmark, bp, job))  # remember which benchmark produced the job

# pull jobs
scores = []
for benchmark, bp, job in jobs:
    score, result = benchmark.score(job, bp)  # score each job with the benchmark that submitted it
    scores.append((score, result))

Then the scores list can be parsed to output the results in different ways.

Thoughts? What can be improved or added here? Are there other directions you had in mind for these abstraction layers? @cosenal @vprusso @WrathfulSpatula