mbecker12 / surface-rl-decoder

Implementation of different NN architectures & RL techniques for decoding of the quantum surface code
MIT License

Evaluate Parallelization of PPO program structure #107

Open mbecker12 opened 3 years ago

mbecker12 commented 3 years ago

As it is now, the PPO algorithm first calls a function that triggers all the workers to step through episodes and thereby produce samples. Those samples are then saved in the replay buffer, and multiple learner iterations each draw a random batch from the replay buffer. Together, these learner iterations make up one learner step.
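To make the current structure concrete, here is a minimal sketch of the sequential flow described above. It is not the repository's code: `collect_samples`, `learner_step`, and all sizes are hypothetical stand-ins, and the "samples" are dummy tuples rather than real environment transitions.

```python
import random


def collect_samples(n_workers, steps_per_worker):
    """Hypothetical stand-in for all workers stepping through episodes."""
    return [(w, s) for w in range(n_workers) for s in range(steps_per_worker)]


def learner_step(replay_buffer, n_iterations, batch_size, rng):
    """One learner step: several iterations, each on a random batch."""
    batches = []
    for _ in range(n_iterations):
        # A real implementation would compute the PPO loss and update here.
        batches.append(rng.sample(replay_buffer, batch_size))
    return batches


rng = random.Random(0)
replay_buffer = []

# Phase 1: the workers run; the learner is idle.
replay_buffer.extend(collect_samples(n_workers=4, steps_per_worker=8))

# Phase 2: the learner runs; the workers are idle.
batches = learner_step(replay_buffer, n_iterations=3, batch_size=16, rng=rng)
```

In this layout the two phases never overlap, so the workers sit idle for the whole duration of the learner step.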

Maybe there is a way to let the workers keep collecting samples while the learner iterations are running.
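One possible shape for this, sketched under assumptions: collect the samples for the next learner step in the background while the learner iterates over the samples from the previous collection. All names here (`collect_samples`, `run_learner_iterations`, `train_overlapped`, `policy_version`) are hypothetical, and a thread is used only to keep the example short; the real workers would more likely be separate processes. Note that the samples used in a learner step then come from a policy that is one update old, which may matter since PPO is an on-policy method.

```python
import random
import threading


def collect_samples(n_workers, steps_per_worker, policy_version):
    """Hypothetical stand-in for the workers; tags samples with the policy used."""
    return [(policy_version, w, s)
            for w in range(n_workers) for s in range(steps_per_worker)]


def run_learner_iterations(buffer, n_iterations, batch_size, rng):
    """Hypothetical stand-in for the learner iterations of one learner step."""
    return [rng.sample(buffer, batch_size) for _ in range(n_iterations)]


def train_overlapped(n_learner_steps, n_workers=4, steps_per_worker=8,
                     n_iterations=3, batch_size=16, seed=0):
    rng = random.Random(seed)
    # The very first collection cannot overlap with anything.
    current = collect_samples(n_workers, steps_per_worker, policy_version=0)
    history = []
    for step in range(n_learner_steps):
        next_buffer = []

        def background_collect(version=step, out=next_buffer):
            # Workers use the policy as it was before this learner step.
            out.extend(collect_samples(n_workers, steps_per_worker, version))

        collector = threading.Thread(target=background_collect)
        collector.start()  # workers start collecting ...
        batches = run_learner_iterations(current, n_iterations, batch_size, rng)
        collector.join()   # ... and must finish before the buffers are swapped
        history.append((len(batches), {sample[0] for sample in current}))
        current = next_buffer
    return history


history = train_overlapped(n_learner_steps=3)
```

Each entry in `history` records how many learner iterations ran and which policy versions produced the samples they trained on, which makes the one-step lag visible.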