higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL

Distributional Reinforcement Learning with Quantile Regression #3

Open yydxlv opened 6 years ago

yydxlv commented 6 years ago

Hi, what does "u" mean in the following code snippet? It seems that "u" is not defined anywhere in the code. Thanks!

huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
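Written out as math, with `k` playing the role of the Huber threshold κ, `tau` of τ and `u` of the TD error u: the first two lines build the Huber loss, and splitting on |u| ≤ κ shows its two branches; the third line multiplies it by the asymmetric quantile weight. This is a reading of the snippet, not a quote from the repository.

```latex
\begin{aligned}
\mathcal{L}_\kappa(u) &= \tfrac12 \min(|u|,\kappa)^2 + \kappa\,\bigl(|u| - \min(|u|,\kappa)\bigr)
  = \begin{cases}
      \tfrac12 u^2, & |u| \le \kappa,\\[2pt]
      \kappa\bigl(|u| - \tfrac12\kappa\bigr), & |u| > \kappa,
    \end{cases}\\[4pt]
\rho^\kappa_\tau(u) &= \bigl|\tau - \mathbb{1}\{u < 0\}\bigr|\,\mathcal{L}_\kappa(u).
\end{aligned}
```

For |u| > κ the first form gives ½κ² + κ(|u| - κ) = κ(|u| - ½κ), so the loss is quadratic near zero and linear with slope κ outside.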

hohoCode commented 6 years ago

I think it should probably be something like:

u = dist - expected_quant
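The sign of u matters because the weight |τ - 1{u < 0}| is asymmetric. In the usual QR-DQN formulation u is target minus prediction, so a predicted quantile below its target gets weight τ and one above it gets 1 - τ. Assuming `dist` holds the predicted quantiles and `expected_quant` the targets, `dist - expected_quant` swaps those two weights. A minimal sketch, where `quantile_huber` is a hypothetical helper that assumes u = target - prediction:

```python
import torch


def quantile_huber(u: torch.Tensor, tau: torch.Tensor, k: float = 1.0) -> torch.Tensor:
    """Elementwise quantile Huber loss; u is assumed to be target - prediction."""
    abs_u = u.abs()
    clipped = abs_u.clamp(min=0.0, max=k)
    # Quadratic for |u| <= k, linear with slope k beyond, as in the question's snippet.
    huber = 0.5 * clipped.pow(2) + k * (abs_u - clipped)
    # Weight is tau when u >= 0 (prediction below target), 1 - tau when u < 0.
    weight = (tau - (u.detach() < 0).float()).abs()
    return weight * huber


tau = torch.tensor(0.9)
under = quantile_huber(torch.tensor(0.5), tau)   # prediction 0.5 below the target
over = quantile_huber(torch.tensor(-0.5), tau)   # prediction 0.5 above the target
print(under.item(), over.item())
```

With τ = 0.9, underestimating by 0.5 costs roughly nine times more than overestimating by the same amount, which is what pushes that output toward the 90th percentile.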

angmc commented 6 years ago

After adding u = dist - expected_quant, I get:

TypeError                                 Traceback (most recent call last)
in ()
     15
     16     if len(replay_buffer) > batch_size:
---> 17         loss = compute_td_loss(batch_size)
     18         losses.append(loss.data[0])
     19
in compute_td_loss(batch_size)
     17     huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
     18     huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
---> 19     quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
     20     loss = quantile_loss.sum() / num_quant
     21
/home/--/anaconda2/envs/tensorflow4/lib/python2.7/site-packages/torch/tensor.pyc in __sub__(self, other)
    310
    311     def __sub__(self, other):
--> 312         return self.sub(other)
    313
    314     def __rsub__(self, other):

TypeError: sub received an invalid combination of arguments - got (Variable), but expected one of:
 * (float value)
      didn't match because some of the arguments have invalid types: (Variable)
 * (torch.FloatTensor other)
      didn't match because some of the arguments have invalid types: (Variable)
 * (float value, torch.FloatTensor other)
qfettes commented 6 years ago

Should be something like:

u = expected_dist.t().unsqueeze(-1) - dist
loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs()
loss = loss.mean(1).sum()
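This version compares every target quantile with every predicted quantile through broadcasting. Below is a self-contained sketch of that pairwise idea. It assumes `pred` and `target` both have shape (batch, N) and that the target is treated as a constant. The function name `pairwise_quantile_loss` and the reduction (mean over target quantiles j, sum over predicted quantiles i, mean over the batch) are choices made here, not taken from the repository.

```python
import torch


def pairwise_quantile_loss(pred: torch.Tensor, target: torch.Tensor,
                           tau: torch.Tensor, k: float = 1.0) -> torch.Tensor:
    """pred, target: (B, N) quantile values; tau: (N,) quantile levels of pred."""
    # u[b, i, j] = target[b, j] - pred[b, i], shape (B, N, N).
    u = target.detach().unsqueeze(1) - pred.unsqueeze(2)
    abs_u = u.abs()
    clipped = abs_u.clamp(max=k)
    huber = 0.5 * clipped.pow(2) + k * (abs_u - clipped)
    # tau belongs to the predicted quantile i, so it broadcasts along dim 1.
    weight = (tau.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    loss = weight * huber
    # Mean over target quantiles j, sum over predicted quantiles i, mean over batch.
    return loss.mean(dim=2).sum(dim=1).mean()


N = 4
tau = (torch.arange(N, dtype=torch.float32) + 0.5) / N  # midpoints (2i + 1) / (2N)
pred = torch.zeros(2, N, requires_grad=True)
target = torch.ones(2, N)
loss = pairwise_quantile_loss(pred, target, tau)
loss.backward()
```

Detaching `u` before the `< 0` test mirrors `u.detach()` in the snippet above: the indicator is a constant weight and carries no gradient.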
angmc commented 6 years ago

When I last looked at this, it ran after converting tau to a Variable:

u = expected_quant - dist
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (autograd.Variable(tau.cuda()) - (u < 0).float()).abs() * huber_loss
loss = quantile_loss.sum() / num_quant
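The traceback above comes from PyTorch before 0.4. There `tau` was a plain `torch.FloatTensor`, while `(u < 0).float()` was a `Variable`, and `FloatTensor.sub` refused a `Variable` argument. Since PyTorch 0.4 the two types are merged, so the `autograd.Variable` wrapper is unnecessary. `tau` only has to be on the same device as `u`. Below is a minimal sketch of the elementwise version in current PyTorch, with a few assumptions. The names follow the thread, but the function `quantile_loss` is hypothetical. Both inputs are taken to have shape (batch, num_quant). The target is detached as a constant.

```python
import torch


def quantile_loss(dist: torch.Tensor, expected_quant: torch.Tensor,
                  tau: torch.Tensor, k: float = 1.0) -> torch.Tensor:
    """dist: predicted quantiles, expected_quant: target quantiles, both (B, N)."""
    num_quant = dist.size(-1)
    # Assumption: the target is a constant, so no gradient flows into it.
    u = expected_quant.detach() - dist
    # No autograd.Variable needed; move tau to u's device instead of calling .cuda().
    tau = tau.to(u.device)
    abs_u = u.abs()
    clipped = abs_u.clamp(min=0.0, max=k)
    huber_loss = 0.5 * clipped.pow(2) + k * (abs_u - clipped)
    weighted = (tau - (u < 0).float()).abs() * huber_loss
    return weighted.sum() / num_quant


device = "cuda" if torch.cuda.is_available() else "cpu"
dist = torch.zeros(3, 2, device=device, requires_grad=True)
expected_quant = torch.full((3, 2), -2.0, device=device)
tau = torch.tensor([0.25, 0.75])  # created on the CPU on purpose
loss = quantile_loss(dist, expected_quant, tau)
loss.backward()
```

Unlike the pairwise variant, this pairs predicted quantile i only with target quantile i, exactly as in the snippet it replaces.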

LRiver-wut commented 1 year ago

Friend, this is a question.

LRiver-wut commented 1 year ago

It confused me.