Closed Jacfomg closed 4 months ago
Personally, my goal would be to have a single batching algorithm, with many tunable parameters (in the best scenario, the parameters would be the amount of memory space per type of memory), and then to fit them on the instruments (like @stavros11 was doing with the number of sequences and readouts).
Since they only depend on the instruments' designs, there should be no need to tune it for each specific execution, so (eventually) we could even remove it from `ExecutionParameters`.
Which is the best course of action for the intermediate stage, I'm not sure about. @stavros11?
Concerning which are the limitations, if I got it correctly, they should be the different types of memory:

1. waveform memory: a `Rectangular` would fit in a single fixed-precision number, while everything else requires one number per sample
2. acquisition memory: one slot per readout, whose footprint depends on the acquisition mode

Your algorithm and the `max_sequences` are only a simplified proxy for 1., and counting the readouts is a simplified proxy for 2. (the correct one should take at least into account the `AcquisitionType`).
The whole trivial algorithm essentially would consist in appending sequences one by one, until you exceed one of the three types of memory, at which point you terminate the batch. (most likely you can optimize over the trivial loop, but it should even be a minimal overhead on top of execution)
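For illustration, the trivial loop could look like the following minimal sketch. The cost triples and the `batch`/bounds names are only stand-ins invented here, not the actual qibolab API; the real implementation would derive the costs from each `PulseSequence`:

```python
from typing import Iterable, Iterator

# Hypothetical per-sequence cost: (waveform memory, instructions, readouts).
Cost = tuple[int, int, int]


def batch(costs: Iterable[Cost], bounds: Cost) -> Iterator[list[Cost]]:
    """Greedily append sequences until adding one more would exceed a bound."""
    current: list[Cost] = []
    totals = (0, 0, 0)
    for cost in costs:
        exceeded = any(t + c > b for t, c, b in zip(totals, cost, bounds))
        if current and exceeded:
            yield current  # terminate the batch before exceeding any bound
            current, totals = [], (0, 0, 0)
        current.append(cost)
        totals = tuple(t + c for t, c in zip(totals, cost))
    if current:
        yield current
```

E.g. five sequences of one readout each, with a bound of two readouts per batch, end up in three batches.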
@hay-k, as far as you know, is there any limitation on unrolling that is not memory-related?
I can't imagine one, because limiting the transactions is always convenient (the less you do, the faster you are), and the only reason for which the boards communicate is to exchange data, which could fit in memory for independent execution. Of course, if you require some other processing step, as for variational algorithms, you certainly need to download the data, at least locally on the board CPU.
Thanks for these efforts @Jacfomg @alecandido.
> Since they only depend on the instruments' designs, there should be no need to tune it for each specific execution, so (eventually) we could even remove it from `ExecutionParameters`.
That is also my general impression. However, it may be possible to achieve better performance by setting a different batching option per routine, because at that point we have information on how the sequences look (i.e. if they have the same length). This is an optimization over the standard unrolling with a default batching. To me it seems the same as the sweeper: it can also be replaced by a default unrolling, but when using the sweeper you take advantage of the specific structure of the sequences to make it faster.
> Which is the best course of action for the intermediate stage, I'm not sure about. @stavros11?
What is implemented here is acceptable for this; my main concerns are:
We should keep in mind that the optimal way to use each instrument is its native language. If that is easier than going through qibolab, then the project loses its purpose.
- Essentially, if you play tons of identical pulses, you do not need to load the waveforms over and over, but you need to give the full program to the sequencers anyhow, so you could increase the number of instructions while keeping the waveform memory constant (it should happen only for deep circuits, hardly in other kinds of calibration routines)
This point may actually be relevant at least for QM, but I need to investigate further. This was basically the motivation for https://github.com/qibogang/qm-benchmarks/pull/1. Even if you upload the waveform once, using `for_` loops / `if_` statements seems better than writing an unrolled program with repeated instructions (in summary, sweeper >> unrolling).
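To make the contrast concrete, here is a toy illustration (plain Python, not actual QUA) of why a loop keeps the instruction memory roughly constant while unrolling grows linearly with the repetitions:

```python
def unrolled_program(n_reps: int) -> list[str]:
    # the same play instruction is re-emitted for every repetition
    return ["play('pulse', 'qubit')"] * n_reps


def looped_program(n_reps: int) -> list[str]:
    # a for_-style loop stores the body once, plus the loop bookkeeping
    return [f"for_(i, 0, i < {n_reps}, i + 1)", "play('pulse', 'qubit')", "end_for"]


# instruction count: linear vs constant in the number of repetitions
assert len(unrolled_program(1000)) == 1000
assert len(looped_program(1000)) == 3
```

The waveform is uploaded once in both cases; only the program size differs.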
I would take a look at the last commit to see what I changed, but it seems to be working. I couldn't run a flipping experiment with unrolling before, because to run it `MAX_SEQUENCES` needed to be set to 1 or 2. That made an ideal time of 31s take 67s. By checking the `control_pulse_sequence_duration` using these changes, I could take it down to 40s.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 65.54%. Comparing base (1c8650b) to head (c9b5948). Report is 24 commits behind head on main.
> Since they only depend on the instruments' designs, there should be no need to tune it for each specific execution, so (eventually) we could even remove it from `ExecutionParameters`.
>
> That is also my general impression. However, it may be possible to achieve better performance by setting a different batching option per routine, because at that point we have information on how the sequences look (i.e. if they have the same length). This is an optimization over the standard unrolling with a default batching. To me it seems the same as the sweeper: it can also be replaced by a default unrolling, but when using the sweeper you take advantage of the specific structure of the sequences to make it faster.
We could leave that optimization for later; for now this achieves what I wanted, which is a working unrolling for experiments with growing sequences.
- I am not sure I understand the goal: it is certainly an optimization over the unrolling we already have, but as I wrote in "Unrolling with flipping needs another batching algorithm" #815, I do not think it is the best optimization we can do, at least for all instruments. However, this is just speculation; we need actual benchmarks to corroborate it.
It is not the best, but it is far better than just playing the sequences one by one. As you pointed out, the best would be to use their languages directly.
- We are starting to provide different kinds of optimizations for real-time operations (unrolling, unrolling with user-selected batching, sweepers), and every instrument supports a different subset of those. It will be pretty hard for users (even ourselves) to know what to use in each case. From that point of view, I would prioritize 0.2, so that we can have a more robust and as "instrument-agnostic" an interface as possible, before moving to optimizations.
Sure, we can leave the `Batching_Mode` for later.
> (in summary, sweeper >> unrolling)
I believe this will always be the case, apart from special situations. But long circuits are one of those special situations, in which sweepers are of no use and unrolling is definitely valuable.
> It is not the best, but it is far better than just playing the sequences one by one. As you pointed out, the best would be to use their languages directly.
In some situations this will be unavoidable, but there are cases in which we can map Qibolab abstractions one-to-one to the most optimal QUA implementation (or the like).
If this were always the case, the instruments would be 100% equivalent (apart from the API and memory bounds). I expect that this won't happen, but let's not assume it a priori.
> In some situations this will be unavoidable, but there are cases in which we can map Qibolab abstractions one-to-one to the most optimal QUA implementation (or the like).
That is true, but while we work on having them this could work.
> That is true, but while we work on having them this could work.
I hope this will work even afterwards :) (or something similar; we will iterate anyhow)
@alecandido I think something was not behaving as expected since https://github.com/qiboteam/qibolab/pull/817/commits/aff3f24ef5e8841f180d63605413d3eec65fed7b. I am going to revert for the time being.
@Jacfomg would you tell me the error you're getting?
> @alecandido I think something was not behaving as expected since aff3f24. I am going to revert for the time being.
>
> @Jacfomg would you tell me the error you're getting?
I was playing with some simple sequences with a single readout, setting the `max_readouts` to 2 and expecting them to be split in groups of 2, but it wasn't behaving like that.
I was testing for RB so I was running this to reproduce
```python
from qibo.backends import GlobalBackend
from qibocal.protocols.characterization.randomized_benchmarking.standard_rb import (
    random_circuits,
)
from qibolab import create_platform
from qibolab.unrolling import Bounds, batch

platform = create_platform("dummy")
GlobalBackend.set_backend("qibolab", platform)
backend = GlobalBackend()

circuits = random_circuits(2, [0], 5, 1234)  # this generates 5 sequences
native_circuits, _ = zip(*(backend.transpile(circuit) for circuit in circuits))
sequences, measurement_maps = zip(
    *(
        backend.compiler.compile(circuit, backend.platform)
        for circuit in native_circuits
    )
)

bounds = Bounds(1000, 2, 1000)
batches = batch(sequences, bounds)
len(list(batches))  # should be 3
```
> I was testing for RB so I was running this to reproduce

Which `qibocal` branch?
```
Traceback (most recent call last):
  File "/Users/alessandro/Projects/Quantum/qibolab/test.py", line 2, in <module>
    from qibocal.protocols.characterization.randomized_benchmarking.standard_rb import (
ImportError: cannot import name 'random_circuits' from 'qibocal.protocols.characterization.randomized_benchmarking.standard_rb' (/Users/alessandro/Projects/Quantum/qibolab/.venv/lib/python3.11/site-packages/qibocal/protocols/characterization/randomized_benchmarking/standard_rb.py)
```
> Which `qibocal` branch?
I figured it out: it was `qibocal@save_circs` (I just tried branches until your script was working).
According to your script, the problem was just the variable name (with your full fix the final `len()` was 3, before it was 1, and with just the loop variable renamed it was 3 again).
If you still observe further issues let me know!
> I was testing for RB so I was running this to reproduce

> Which `qibocal` branch?
>
> ```
> Traceback (most recent call last):
>   File "/Users/alessandro/Projects/Quantum/qibolab/test.py", line 2, in <module>
>     from qibocal.protocols.characterization.randomized_benchmarking.standard_rb import (
> ImportError: cannot import name 'random_circuits' from 'qibocal.protocols.characterization.randomized_benchmarking.standard_rb' (/Users/alessandro/Projects/Quantum/qibolab/.venv/lib/python3.11/site-packages/qibocal/protocols/characterization/randomized_benchmarking/standard_rb.py)
> ```
Ah sorry, it should work with https://github.com/qiboteam/qibocal/pull/597, but feel free to define any `PulseSequence` with at least 1 measurement. This is some testing code generating `PulseSequence`s that I reused.
> According to your script, the problem was just the variable name (with your full fix the final `len()` was 3, before it was 1, and with just the loop variable renamed it was 3 again).
>
> If you still observe further issues let me know!
I see, then it was my bad, I forgot to reconnect to the kernel after the change.
If it's solved, so much the better!
If you could verify that we have reasonable performance on hardware (as you needed in the first place), then we could just check the coverage and aim to merge.
@Jacfomg it seems like tests are passing. Would you try to write some more unit tests for the `unrolling` module?
(testing the features in isolation: they only depend on having a `PulseSequence` as input, and on the `Bounds.update` part)
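As a starting point, such a unit test could follow the shape below. Note that this `Bounds` stand-in is hypothetical, written here only to make the sketch self-contained; the real `qibolab.unrolling.Bounds` interface may differ:

```python
from dataclasses import dataclass


# Hypothetical stand-in for qibolab.unrolling.Bounds, tracking the three
# memory-related quantities accumulated over the sequences of a batch.
@dataclass
class Bounds:
    waveforms: int
    readouts: int
    instructions: int

    def update(self, waveforms: int, readouts: int, instructions: int) -> None:
        # accumulate the cost of one more sequence
        self.waveforms += waveforms
        self.readouts += readouts
        self.instructions += instructions

    def exceeds(self, limits: "Bounds") -> bool:
        return (
            self.waveforms > limits.waveforms
            or self.readouts > limits.readouts
            or self.instructions > limits.instructions
        )


def test_bounds_update_accumulates():
    acc = Bounds(0, 0, 0)
    acc.update(10, 1, 5)
    acc.update(10, 1, 5)
    assert (acc.waveforms, acc.readouts, acc.instructions) == (20, 2, 10)
    assert acc.exceeds(Bounds(1000, 1, 1000))      # readout bound violated
    assert not acc.exceeds(Bounds(1000, 2, 1000))  # all within limits
```

The same structure extends naturally to tests feeding real `PulseSequence`s into `batch` and asserting on the number and sizes of the resulting batches.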
Do we need more complex tests or should this be enough?
@stavros11 @hay-k any further opinion on this? change requests?
Otherwise I'd just move forward and merge, we can always iterate and improve.
Anyway, I'll just commit the fix (since it's trivial) and I guess we can merge this afterwards, since it is tested on QM and Zurich.
Since the tests are passing, I'm merging.
I would actually like to merge #832 asap. Could you review that @Jacfomg?
Implement another batching check for the unrolling, and a way of selecting which one for qibocal for the time being. Following #815.
I will test on the devices once they are on. Also, I would like to know what other limitations we may add, like `batch_max_whatever`, and to make a general one we could make it pass through all of them.