higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL

Error in projection_distribution (Distributional DQN) ? #8

Open pclucas14 opened 6 years ago

pclucas14 commented 6 years ago

Hi,

I have a question about the projection_distribution method. When projecting back onto the support/bins, at these lines:

proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist * (u.float() - b)).view(-1)) 
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist * (b - l.float()) ).view(-1))

the distribution next_dist has already been scaled by the support, from the line next_dist = target_model(next_state).data.cpu() * support. It seems this should not be the case: it results in the final projected distribution not summing to one. It seems one should instead do something like:

# keep the raw (unscaled) probabilities; use the support-scaled copy only for action selection
next_dist_raw = target_model(next_state).data.cpu()
next_dist = next_dist_raw * support
next_action = next_dist.sum(2).max(1)[1]
next_action = next_action.unsqueeze(1).unsqueeze(1).expand(next_dist.size(0), 1, next_dist.size(2))
next_dist = next_dist.gather(1, next_action).squeeze(1)
next_dist_raw = next_dist_raw.gather(1, next_action).squeeze(1)
# project the raw probabilities, not the support-scaled values
proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist_raw * (u.float() - b)).view(-1))
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist_raw * (b - l.float())).view(-1))

This results in a projected distribution with the same total mass as the original one, i.e. it still sums to one.
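
As a quick sanity check, here is a rough standalone sketch (not code from the repo; the batch size, atom count, and rewards are made up, and the rewards are kept small so Tz is never clamped and b never lands exactly on a bin) comparing the two variants: projecting the raw probabilities keeps each row summing to one, while projecting the support-scaled values gives back the Q-value instead.

# Rough sketch, not taken from the repo: compare projecting the raw probabilities
# against projecting the support-scaled values. All sizes and values are made up.
import torch

batch_size, num_atoms = 4, 51
Vmin, Vmax, gamma = -10.0, 10.0, 0.99
delta_z = (Vmax - Vmin) / (num_atoms - 1)
support = torch.linspace(Vmin, Vmax, num_atoms)

# fake next-state distributions (each row sums to one) and small rewards,
# chosen so Tz is never clamped and b never lands exactly on a bin
next_dist_raw = torch.softmax(torch.randn(batch_size, num_atoms), dim=1)
rewards = torch.rand(batch_size, 1) * 0.05

Tz = (rewards + gamma * support.unsqueeze(0)).clamp(Vmin, Vmax)
b = (Tz - Vmin) / delta_z
l, u = b.floor().long(), b.ceil().long()
offset = (torch.arange(batch_size) * num_atoms).unsqueeze(1).expand(batch_size, num_atoms)

def project(dist):
    proj = torch.zeros(batch_size, num_atoms)
    proj.view(-1).index_add_(0, (l + offset).view(-1), (dist * (u.float() - b)).view(-1))
    proj.view(-1).index_add_(0, (u + offset).view(-1), (dist * (b - l.float())).view(-1))
    return proj

print(project(next_dist_raw).sum(1))            # ~1.0 per row: mass is preserved
print(project(next_dist_raw * support).sum(1))  # equals sum(p_i * z_i) per row, not 1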

Thank you, Lucas

miilue commented 1 year ago

I agree with you. I'm also a bit confused by the author's implementation of Distributional DQN.