higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL

Error in projection_distribution (Distributional DQN) ? #8

Open pclucas14 opened 6 years ago

pclucas14 commented 6 years ago

Hi,

I have a question about the projection_distribution method. When projecting back onto the support/bins, at these lines:

proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist * (u.float() - b)).view(-1)) 
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist * (b - l.float()) ).view(-1))

the distribution next_dist has already been scaled by the support, from the line next_dist = target_model(next_state).data.cpu() * support. It seems this should not be the case: it results in the final projected distribution not summing to one. It seems one should instead do something like:

# keep the raw (unscaled) probabilities; use the support-scaled copy only for action selection
next_dist_raw = target_model(next_state).data.cpu()
next_dist = next_dist_raw * support
next_action = next_dist.sum(2).max(1)[1]
next_action = next_action.unsqueeze(1).unsqueeze(1).expand(next_dist.size(0), 1, next_dist.size(2))
next_dist = next_dist.gather(1, next_action).squeeze(1)
next_dist_raw = next_dist_raw.gather(1, next_action).squeeze(1)
# project the raw probabilities, not the support-scaled values
proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist_raw * (u.float() - b)).view(-1))
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist_raw * (b - l.float())).view(-1))

This results in a projected distribution with the same total mass as the original one, i.e. it still sums to one.
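
As a quick sanity check, here is a rough standalone sketch (not code from the repo; the batch size, atom count, and rewards are made up, and the rewards are kept small so Tz is never clamped and b never lands exactly on a bin) comparing the two variants: projecting the raw probabilities keeps each row summing to one, while projecting the support-scaled values gives back the Q-value instead.

# Rough sketch, not taken from the repo: compare projecting the raw probabilities
# against projecting the support-scaled values. All sizes and values are made up.
import torch

batch_size, num_atoms = 4, 51
Vmin, Vmax, gamma = -10.0, 10.0, 0.99
delta_z = (Vmax - Vmin) / (num_atoms - 1)
support = torch.linspace(Vmin, Vmax, num_atoms)

# fake next-state distributions (each row sums to one) and small rewards,
# chosen so Tz is never clamped and b never lands exactly on a bin
next_dist_raw = torch.softmax(torch.randn(batch_size, num_atoms), dim=1)
rewards = torch.rand(batch_size, 1) * 0.05

Tz = (rewards + gamma * support.unsqueeze(0)).clamp(Vmin, Vmax)
b = (Tz - Vmin) / delta_z
l, u = b.floor().long(), b.ceil().long()
offset = (torch.arange(batch_size) * num_atoms).unsqueeze(1).expand(batch_size, num_atoms)

def project(dist):
    proj = torch.zeros(batch_size, num_atoms)
    proj.view(-1).index_add_(0, (l + offset).view(-1), (dist * (u.float() - b)).view(-1))
    proj.view(-1).index_add_(0, (u + offset).view(-1), (dist * (b - l.float())).view(-1))
    return proj

print(project(next_dist_raw).sum(1))            # ~1.0 per row: mass is preserved
print(project(next_dist_raw * support).sum(1))  # equals sum(p_i * z_i) per row, not 1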

Thank you, Lucas

miilue commented 1 year ago

I agree with you. I'm also a bit confused by the author's implementation of Distributional DQN.