priority reward partition during training

Hi @yiia,

Sorry for the very late reply. You are correct that the file is missing. The partitioned replay buffer is used only in the descending phase, but the train.py file that uses it is missing. I will add it shortly.

In any case, sampling from the different partitions is implemented as follows:

# 4- Sampling a random mini-batch from the Replay Buffer
experience_batch = replay_buffer_neutral.return_experience_batch(batch_size=batch_size - batch_size_positive - batch_size_negative)
experience_batch_positive = replay_buffer_positive.return_experience_batch(batch_size=batch_size_positive)
experience_batch_negative = replay_buffer_negative.return_experience_batch(batch_size=batch_size_negative)
experience_batch = experience_batch + experience_batch_positive + experience_batch_negative

where batch_size=32, batch_size_positive=8, and batch_size_negative=8, so each mini-batch contains 16 neutral, 8 positive, and 8 negative experiences. I am going to close the issue since it is no longer relevant; feel free to reopen it if you need any further help.
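For anyone who wants to try the idea before train.py is uploaded, here is a minimal, self-contained sketch of the same sampling scheme. Only the method name return_experience_batch and the three batch sizes come from the snippet above; the ReplayBuffer class, its add_experience method, the helper sample_partitioned_batch, and the final shuffle are assumptions made for illustration and may differ from the actual code in the repository.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical stand-in for the repository's buffer class."""

    def __init__(self, capacity=10000):
        # Oldest experiences are dropped once the capacity is reached.
        self.memory = deque(maxlen=capacity)

    def add_experience(self, experience):
        self.memory.append(experience)

    def return_experience_batch(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size
        # so that a partially filled buffer does not raise an error.
        k = min(batch_size, len(self.memory))
        return random.sample(list(self.memory), k)


def sample_partitioned_batch(neutral, positive, negative,
                             batch_size=32,
                             batch_size_positive=8,
                             batch_size_negative=8):
    """Draw a fixed number of experiences from each partition."""
    batch_size_neutral = batch_size - batch_size_positive - batch_size_negative
    batch = (neutral.return_experience_batch(batch_size_neutral)
             + positive.return_experience_batch(batch_size_positive)
             + negative.return_experience_batch(batch_size_negative))
    # Assumption: shuffle so the partitions are not in a fixed order.
    random.shuffle(batch)
    return batch
```

With the default sizes and buffers that each hold enough experiences, the returned list has 32 entries: 16 from the neutral buffer and 8 from each of the other two. If a buffer holds fewer experiences than requested, this sketch simply returns a smaller batch.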

I hope this helps, and sorry again for the late reply.

pulver22 / QLAB
