qiboteam / qibolab

Quantum hardware module and drivers for Qibo.
https://qibo.science
Apache License 2.0
41 stars 10 forks source link

BatchingMode #817

Closed Jacfomg closed 4 months ago

Jacfomg commented 5 months ago

Implement another batching check for the unrolling and a way of selecting which for qibocal for the time being. Following #815

I will test on the devices once they are on. Also, I would like to know with other limitations we may add, like batch_max_whatever and to make a general one we make it pass through all of them.

Checklist:

alecandido commented 5 months ago

Personally, my goal would be to have a single batching algorithm, with many tunable parameters (in the best scenario, the parameters would be the amount of memory space per type of memory), and then to fit them on the instruments (like @stavros11 was doing with the number of sequences and readouts).

Since they only depend on the instruments' designs, there should be no need to tune it for each specific execution, so (eventually) we could even remove from ExecutionParameters.

Which is the best course of action for the intermediate stage, I'm not sure about. @stavros11?

alecandido commented 5 months ago

Concerning which are the limitations, if I got it correctly, they should be:

  1. waveform memory
    • this is used to store the waveforms, and a Rectangular would fit in a single fixed precision number, while everything else requires one number per sample
  2. readout memory
    • to store results before downloading
  3. instruction memory
    • this I got from @hay-k, since I believe we're very far from hitting that
    • essentially, if you play tons of identical pulses, you do not need to load the waveforms over and over, but you need to give the full program to the sequencers anyhow - so you could increase the amount of instructions keeping constant the waveform memory (it should happen only for deep circuits, hard in other kinds of calibration routines)

Your algorithm and the max_sequences are only a simplified proxy for 1., and counting the readouts is a simplified proxy for 2. (the correct one should take at least into account the AcquisitionType).

The whole trivial algorithm essentially would consist in appending sequences one by one, until you exceed one of the three types of memory, at which point you terminate the batch. (most likely you can optimize over the trivial loop, but it should even be a minimal overhead on top of execution)

alecandido commented 5 months ago

@hay-k, that you know, is there any limitation on unrolling that is not memory-related?

I can't imagine, because limiting the transactions is always convenient (the less you do, the faster you are), and the only reason for which the boards communicate is to exchange data, that could fit on memory (for independent execution - of course, if you require some other processing step, like for variational, you certainly need to download, at least locally on the board CPU)

stavros11 commented 5 months ago

Thanks for these efforts @Jacfomg @alecandido.

Since they only depend on the instruments' designs, there should be no need to tune it for each specific execution, so (eventually) we could even remove from ExecutionParameters.

That is also my general impression. However, it may be possible to achieve better performance by setting a different batching option per routine, because at that point we have information of how the sequences look (ie. if they have same length). This is an optimization over the standard unrolling with a default batching. To me it seems the same as sweeper: they can also be replaced by a default unrolling, but when using the sweeper you take advantage of the specific structure of the sequences to make it faster.

Which is the best course of action for the intermediate stage, I'm not sure about. @stavros11?

What is implemented here is acceptable for this, my main concerns are:

  1. I am not sure I understand the goal: it is certainly an optimization over the unrolling we already have, but as I wrote in #815 I do not think it is the best optimization we can do, at least for all instruments. However this is just speculation, we need actual benchmarks to corroborate it.
  2. We are starting to provide different kinds of optimizations for real time operations (unrolling, unrolling with user selected batching, sweepers) and every instrument supports a different subset of those. It will be pretty hard for users (even ourselves) to know what to use in each case. From that point of view, I would prioritize 0.2 so that we can have a more robust and as much as "instrument agnostic" interface as possible, before moving to optimizations.

We should keep in mind that the optimal way to use each instrument is its native language. If this is easier than using through qibolab, then the project looses its purpose.

  • essentially, if you play tons of identical pulses, you do not need to load the waveforms over and over, but you need to give the full program to the sequencers anyhow - so you could increase the amount of instructions keeping constant the waveform memory (it should happen only for deep circuits, hard in other kinds of calibration routines)

This point may actually be relevant at least for QM, but I need to investigate further. This was basically the motivation for https://github.com/qibogang/qm-benchmarks/pull/1. Even if you upload the waveform once, using with for_ loops / if_ statements seems better than writing an unrolled program with repeated instructions (in summary sweeper >> unrolling).

Jacfomg commented 5 months ago

I would take a look into the last commit to introduce what I changed but the it seems working. I couldn't run a flipping experiment with unrolling before as in order to run the MAX_SEQUENCES needed to be set to 1 or 2. That made an ideal time of 31s take 67s. By checking the control_pulse_sequence_duration using these changes I could take it to 40s.

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 65.54%. Comparing base (1c8650b) to head (c9b5948). Report is 24 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #817 +/- ## ========================================== + Coverage 65.10% 65.54% +0.43% ========================================== Files 50 50 Lines 6027 6071 +44 ========================================== + Hits 3924 3979 +55 + Misses 2103 2092 -11 ``` | [Flag](https://app.codecov.io/gh/qiboteam/qibolab/pull/817/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/qiboteam/qibolab/pull/817/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam) | `65.54% <100.00%> (+0.43%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jacfomg commented 5 months ago

Since they only depend on the instruments' designs, there should be no need to tune it for each specific execution, so (eventually) we could even remove from ExecutionParameters.

That is also my general impression. However, it may be possible to achieve better performance by setting a different batching option per routine, because at that point we have information of how the sequences look (ie. if they have same length). This is an optimization over the standard unrolling with a default batching. To me it seems the same as sweeper: they can also be replaced by a default unrolling, but when using the sweeper you take advantage of the specific structure of the sequences to make it faster.

We could leave that optimization for later, for now this gets what I wanted that is a working unrolling for experiments with growing sequences.

  1. I am not sure I understand the goal: it is certainly an optimization over the unrolling we already have, but as I wrote in Unrolling with flipping need another batching algorithm #815 I do not think it is the best optimization we can do, at least for all instruments. However this is just speculation, we need actual benchmarks to corroborate it.

It is not the best but is far better than just playing the sequences one by one. As you pointed out the best would be to use directly their languages.

  1. We are starting to provide different kinds of optimizations for real time operations (unrolling, unrolling with user selected batching, sweepers) and every instrument supports a different subset of those. It will be pretty hard for users (even ourselves) to know what to use in each case. From that point of view, I would prioritize 0.2 so that we can have a more robust and as much as "instrument agnostic" interface as possible, before moving to optimizations.

Sure, we can leave the Batching_Mode for later.

alecandido commented 5 months ago

(in summary sweeper >> unrolling).

I believe this will always be the case, apart for special cases. But long circuits are one of those special situations, in which sweepers are of no use, and unrolling is definitely valuable.

alecandido commented 5 months ago

It is not the best but is far better than just playing the sequences one by one. As you pointed out the best would be to use directly their languages.

In some situations this will be unavoidable, but there are cases in which we can map Qibolab abstractions one-to-one to the most optimal QUA implementation (or the like).

If this was always the case, the instruments would be 100% equivalent (apart for the API and memory bounds). I expect that this won't happen, but let's not assume it a priori.

Jacfomg commented 5 months ago

In some situations this will be unavoidable, but there are cases in which we can map Qibolab abstractions one-to-one to the most optimal QUA implementation (or the like).

That is true, but while we work on having them this could work.

alecandido commented 5 months ago

That is true, but while we work on having them this could work.

I hope this will work even after :) (or something similar, we will iterate anyhow)

Jacfomg commented 5 months ago

@alecandido I think something was not behaving as expected since https://github.com/qiboteam/qibolab/pull/817/commits/aff3f24ef5e8841f180d63605413d3eec65fed7b. I am going to revert for the time being.

alecandido commented 5 months ago

@alecandido I think something was not behaving as expected since aff3f24. I am going to revert for the time being.

@Jacfomg would you tell me the error you're getting?

Jacfomg commented 5 months ago

@alecandido I think something was not behaving as expected since aff3f24. I am going to revert for the time being.

@Jacfomg would you tell me the error you're getting?

I was playing with some simple sequences with a single readout setting the max_readouts to 2 expecting them to be splitted in groups of 2, but it wasn't behaving like that.

Jacfomg commented 5 months ago

I was testing for RB so I was running this to reproduce

from qibo.backends import GlobalBackend
from qibocal.protocols.characterization.randomized_benchmarking.standard_rb import random_circuits
from qibolab import create_platform

platform = create_platform("dummy")

GlobalBackend.set_backend("qibolab", platform)
backend = GlobalBackend()

from qibocal.protocols.characterization.randomized_benchmarking.standard_rb import random_circuits

circuits = random_circuits(2, [0], 5, 1234) # This generates 5 sequences

native_circuits, _ = zip(*(backend.transpile(circuit) for circuit in circuits))
sequences, measurement_maps = zip(
    *(
        backend.compiler.compile(circuit, backend.platform)
        for circuit in native_circuits
    )
)

from qibolab.unrolling import batch, Bounds

bounds  = Bounds(1000, 2, 1000)
batches = batch(sequences, bounds)

len(list(batch(sequences, bounds))) # Should be 3
alecandido commented 5 months ago

I was testing for RB so I was running this to reproduce

Which qibocal branch?

Traceback (most recent call last):
  File "/Users/alessandro/Projects/Quantum/qibolab/test.py", line 2, in <module>
    from qibocal.protocols.characterization.randomized_benchmarking.standard_rb import (
ImportError: cannot import name 'random_circuits' from 'qibocal.protocols.characterization.randomized_benchmarking.standard_rb' (/Users/alessandro/Projects/Quantum/qibolab/.venv/lib/python3.11/site-packages/qibocal/protocols/characterization/randomized_benchmarking/standard_rb.py)
alecandido commented 5 months ago

Which qibocal branch?

I figured out, it was qibocal@save_circs (just tried until your script was working)

alecandido commented 5 months ago

According to your script, the problem was just the variable name (with your full fix the final len() was 3, before it was 1, and with just the loop variable renaming it was 3 again).

If you still observe further issues let me know!

Jacfomg commented 5 months ago

I was testing for RB so I was running this to reproduce

Which qibocal branch?

Traceback (most recent call last):
  File "/Users/alessandro/Projects/Quantum/qibolab/test.py", line 2, in <module>
    from qibocal.protocols.characterization.randomized_benchmarking.standard_rb import (
ImportError: cannot import name 'random_circuits' from 'qibocal.protocols.characterization.randomized_benchmarking.standard_rb' (/Users/alessandro/Projects/Quantum/qibolab/.venv/lib/python3.11/site-packages/qibocal/protocols/characterization/randomized_benchmarking/standard_rb.py)

Ah sorry, it should work with https://github.com/qiboteam/qibocal/pull/597, but feel free to define any PulseSequence with at least 1 measurament. This some testing code generating PulseSequences that I reused.

Jacfomg commented 5 months ago

According to your script, the problem was just the variable name (with your full fix the final len() was 3, before it was 1, and with just the loop variable renaming it was 3 again).

If you still observe further issues let me know!

I see, then it was my bad, I forgot to reconnect to the kernel after the change.

alecandido commented 5 months ago

If it's solved, so much the better!

If you could test we have reasonable performances on hardware (as you needed in the first place), then we could just check the coverage and aim to merge.

alecandido commented 5 months ago

@Jacfomg it seems like tests are passing. Would you try to write some more unit tests for the unrolling module?

(testing features in isolation, they only depend on having a PulseSequence as an input, and only the Bounds.update part)

Jacfomg commented 5 months ago

Do we need more complex tests or should this be enough ?

alecandido commented 4 months ago

@stavros11 @hay-k any further opinion on this? change requests?

Otherwise I'd just move forward and merge, we can always iterate and improve.

alecandido commented 4 months ago

Anyway, I’ll just commit the fix (since it’s trivial) and I guess we can merge this after since it is tested on QM and Zurich.

Since the tests are passing, I'm merging.

I would actually like to merge #832 asap. Could you review that @Jacfomg?