pulver22 / QLAB

Quadrotor LAnding Benchmarking (QLAB) is a simulated environment for developing and testing landing algorithms for unmanned aerial vehicles.
29 stars 11 forks source link

priority reward partition during training #7

Closed yiia closed 4 years ago

yiia commented 4 years ago

Hi,"The batch used for training the policy is assembled picking experiences from each one of the K datasets with a certain fraction ρ ∈ {ρ1 , ..., ρK }." from your paper.Can you tell me which python file the code that implements the priority reward partition during training?I didn't find it. I hope I can get your help, thank you very much.

pulver22 commented 4 years ago

Hi @yiia,

sorry for the super late reply. You are indeed correct that the file is missing. The partitioned buffer replay is used only in the descending phase but the train.py file is missing. I am going to add it shortly.

In any case, the sampling from the different partitions is implemented in the following way:

# 4- Sampling a random mini-batch from the Replay Buffer
experience_batch = replay_buffer_neutral.return_experience_batch(batch_size=batch_size - batch_size_positive - batch_size_negative)
experience_batch_positive = replay_buffer_positive.return_experience_batch(batch_size=batch_size_positive)
experience_batch_negative = replay_buffer_negative.return_experience_batch(batch_size = batch_size_negative)
experience_batch = experience_batch + experience_batch_positive + experience_batch_negative

where batch_size=32, batch_size_positive=8, batch_size_negative=8. I am going to close the issue since it is not relevant anymore, be free to open it if you need any further help.

I hope this help, sorry again for the late reply.