Closed: inmzhang closed this issue 3 years ago.
I used a somewhat clumsy method to deal with this. I split every n = 100 experiments into a group, giving 500,000/100 groups, then create a new matching object for each group and decode that group. But as n grows, this method improves at first and then gets worse. It seems that creating a matching object with a large n is time-consuming.
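The grouped approach described above can be sketched with a stub decoder whose construction is made artificially expensive. `StubMatching` is hypothetical: it only emulates the cost profile of building a real `pymatching.Matching` (slow to construct, fast to call) and does not perform real MWPM decoding.

```python
import time

class StubMatching:
    """Hypothetical stand-in for pymatching.Matching: artificially
    expensive to construct, cheap to call."""
    def __init__(self):
        time.sleep(0.001)  # emulate costly matching-graph construction

    def decode(self, syndrome):
        return sum(syndrome) % 2  # toy decoder, not real MWPM

shots = [[i % 2, (i // 2) % 2] for i in range(200)]

# Grouped approach: rebuild the decoder for every group of 10 shots.
t0 = time.time()
for start in range(0, len(shots), 10):
    m = StubMatching()                 # 20 constructions in total
    for s in shots[start:start + 10]:
        m.decode(s)
rebuild_time = time.time() - t0

# Alternative: construct once, decode every shot with the same object.
t0 = time.time()
m = StubMatching()                     # 1 construction
for s in shots:
    m.decode(s)
reuse_time = time.time() - t0

print(rebuild_time > reuse_time)  # construction cost dominates the grouped loop
```

On this toy model the grouped loop pays the construction cost once per group, which is why rebuilding the matching object per group scales badly as the number of groups grows.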
Hi @inmzhang123, without seeing the rest of your code or timing/profiling information I'm not sure exactly what the bottleneck is, so if it's easy to post a complete working example that demonstrates the problem (or timing data), that would help. How large are your matching graphs, and how large is each correction array? Could you be running out of RAM if each instance is large? If so, you could try removing the `results.append(correction)` line and see if that speeds things up; you may not need to store all the results anyway if all you need is a tally of successes and failures.
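The tallying suggestion can be sketched in plain Python. Here `decode` is a hypothetical stand-in for `Matching.decode` (it just returns the syndrome parity), and the shot/observable data is toy data, not from a real experiment:

```python
# Sketch: keep a running tally of failures instead of appending every
# correction to a results list.  `decode` is a hypothetical stand-in
# for Matching.decode and just returns the parity of the syndrome.

def decode(syndrome):
    return sum(syndrome) % 2

shots = [[0, 0], [1, 1], [1, 0], [0, 1]]   # toy detection events
actual = [0, 1, 1, 1]                      # toy observed logical flips

failures = 0
for syndrome, observed in zip(shots, actual):
    prediction = decode(syndrome)
    if prediction != observed:             # no results.append(...) needed
        failures += 1

error_rate = failures / len(shots)
print(error_rate)  # 0.25 on this toy data
```

Keeping only the integer tally means memory use stays constant no matter how many shots are decoded.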
Thanks for replying, @oscarhiggott! I ran a d=3, rounds=2 surface code for 500,000 shots and decoded the X errors. The matching graph has 8 detectors and 1 boundary, so each call to `decode()` is passed a (2*4, 1) ndarray, for 500,000 calls in total. I used only one qubit label to check whether a logical error happened, so the returned correction is a single bool. It takes 25 s to decode all 500,000 shots, and I'd like to shorten that further. Maybe you have some suggestions. Sorry for the formatting; I have to type this on my phone right now.
I just ran a couple of tests for surface code circuits with the same dimensions as yours and also get a decoding time of around 50 microseconds per sample with the default settings (using 1% circuit-level depolarising noise). That seems reasonable/expected to me, so I don't think there is any unexpected behaviour. Since the circuits are small, it's a little faster to use exact matching by setting `num_neighbours=None` in `Matching.decode` (since this caches the shortest paths). That reduces the runtime to around 35 microseconds per sample for me. There are further optimisations to MWPM that could be made, but if you need to reduce the runtime further you might want to use another algorithm, such as Union-Find. How much faster do you need it to be for your use case?
Setting `num_neighbours=None` really helps! It shortens the time to decode 500,000 shots from 25 s to 10 s, and that's enough for me. I'll close this issue; thanks again for your work.
Revisiting this issue using `decode_batch` in PyMatching v2.1.0 (with the new blossom implementation), running on a distance=3, rounds=3 dataset from the Google surface code experiment. Running:
```python
import os
import time

import stim
import pymatching

fullpath = os.path.join(experiment_path, "surface_code_bX_d3_r03_center_3_5")
dem = stim.DetectorErrorModel.from_file(os.path.join(fullpath, "pij_from_even_for_odd.dem"))
shots = stim.read_shot_data_file(
    path=os.path.join(fullpath, "detection_events.b8"),
    format="b8",
    num_measurements=0,
    num_detectors=dem.num_detectors,
    num_observables=0,
)
actual_observables = stim.read_shot_data_file(
    path=os.path.join(fullpath, "obs_flips_actual.01"),
    format="01",
    num_measurements=0,
    num_detectors=0,
    num_observables=1,
)

matching = pymatching.Matching.from_detector_error_model(dem)

t1 = time.time()
for i in range(10):
    predicted_observables = matching.decode_batch(shots)
t2 = time.time()
print(t2 - t1)
```
where `experiment_path` is the path to the `google_qec3v5_experiment_data` directory, now takes 0.3 seconds on my M2 machine. I ran 10 loops since the dataset is 50k shots, not 500k.
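The batch predictions can then be scored against the recorded flips. A minimal stdlib sketch, with toy nested lists standing in for the real `predicted_observables` and `actual_observables` (which are NumPy arrays of shape `(num_shots, num_observables)` in the snippet above):

```python
# Toy stand-ins for decode_batch output and the obs_flips_actual.01 data:
# one row per shot, one column per logical observable.
predicted_observables = [[0], [1], [1], [0]]
actual_observables = [[0], [1], [0], [0]]

num_mistakes = sum(
    1 for pred, act in zip(predicted_observables, actual_observables)
    if pred != act
)
logical_error_rate = num_mistakes / len(actual_observables)
print(logical_error_rate)  # 0.25 on this toy data
```

With real NumPy arrays the same comparison is a vectorised one-liner over the rows, but the row-by-row form above makes the counting explicit.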
For example, I get more than 500,000 stabilizer results after running a single QEC experiment or simulation repeatedly, and I want to decode this large amount of data efficiently. In my implementation, I feed one set of experimental data at a time into a fixed matching object, which takes a relatively long time to decode all 500,000 results. I wonder whether there is a more efficient way to do this.