Open tegg89 opened 6 years ago
Thanks for your question. I won't be available for a few days, but I will revisit it when I have time. Which PyTorch version are you using? I haven't updated to the latest version, which might be the problem.
@transedward Thanks for your reply. I tested with PyTorch 0.2.0.post1 (0.2.0.1) and Python 3.5.3 under Anaconda on Ubuntu 16.04.
@tegg89: Check out #8. Let us know whether it works.
Hi, thanks for sharing your wonderful code. However, I ran into some errors when running it.
In lines 197~205 of `dqn_learn.py`, the size of `target_Q_values` and that of `current_Q_values` do not match. I changed the code to `next_max_q = next_max_q.unsqueeze(-1)` to correct the sizes. I also changed line 203 to use `rew_batch[0]`.
(IMO) After stacking records in the replay buffer, queuing the action does not work properly. I changed line 158 to `action = select_epilson_greedy_action(Q, recent_observations, t)`, but a different action value gets queued.
I am still working on these but having trouble. Could you help me fix them?
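For reference, here is a minimal sketch of the kind of size mismatch described above. It is written against a recent PyTorch release rather than 0.2.0, and it is not the code from `dqn_learn.py`: the tensor values, `batch_size`, `num_actions`, `gamma`, and the use of `rew_batch.unsqueeze(-1)` are assumptions made for illustration only.

```python
import torch

# Hypothetical sizes and discount factor, chosen only for this sketch.
batch_size, num_actions = 4, 3
gamma = 0.99

# Stand-ins for the network outputs; in the real script these would come
# from the Q network and the target network.
q_values = torch.arange(batch_size * num_actions, dtype=torch.float32)
q_values = q_values.reshape(batch_size, num_actions)
next_q_values = q_values + 1.0
act_batch = torch.tensor([0, 2, 1, 0])
rew_batch = torch.ones(batch_size)

# gather returns a tensor shaped like its index, so this is (batch_size, 1).
current_Q_values = q_values.gather(1, act_batch.unsqueeze(1))

# max over dim 1 drops that dimension, so this is (batch_size,).
next_max_q = next_q_values.max(1)[0]

# Without a fix, (batch_size,) minus (batch_size, 1) broadcasts silently
# to (batch_size, batch_size) instead of raising an error.
mismatched = (rew_batch + gamma * next_max_q) - current_Q_values

# Restoring the trailing dimension on both terms keeps everything (batch_size, 1).
next_max_q = next_max_q.unsqueeze(-1)
target_Q_values = rew_batch.unsqueeze(-1) + gamma * next_max_q
bellman_error = target_Q_values - current_Q_values

print(current_Q_values.shape, mismatched.shape, bellman_error.shape)
```

Printing the shapes before and after the `unsqueeze(-1)` calls is a quick way to check whether the two Q-value tensors line up before computing the error.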