I have several questions:
1- When I compare this with the algorithm presented in "Human-level control through deep reinforcement learning", I cannot find the third initialization (the initial target action-value function Q̂). I also cannot find the last step, "every C steps set Q̂ = Q". Could you please explain where these are, or how the code achieves the same effect? These steps seem essential!
2- I have my own environment. If I want to feed a state vector state = [a, b, c] into the DQN, instead of a single scalar representing the state, what should I do?
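To make question 1 concrete, here is a minimal sketch (plain numpy, with made-up names and sizes; the real parameters would be network weights) of the periodic target-network update I am asking about:

```python
import numpy as np

C = 4  # sync period (illustrative; the paper uses a much larger value)

q_weights = np.zeros(3)            # online network parameters Q
target_weights = q_weights.copy()  # target network parameters Q_hat

for step in range(1, 13):
    q_weights += 0.1  # stand-in for one gradient update to Q
    if step % C == 0:
        # the step from the paper: "every C steps, Q_hat = Q"
        target_weights = q_weights.copy()
```

Is there an equivalent of this copy somewhere in the tutorial's training loop, or is it handled differently?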
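For question 2, my current understanding is that only the input dimension of the first layer (and the shape of the states fed in) needs to change. A toy sketch of what I mean, with hypothetical names and sizes not taken from the tutorial:

```python
import numpy as np

# Illustrative Q-network taking a 3-dimensional state [a, b, c]
# instead of a single scalar. Sizes are made up for the example.
rng = np.random.default_rng(0)
state_dim, hidden_dim, n_actions = 3, 16, 2

W1 = rng.normal(size=(state_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(hidden_dim, n_actions))
b2 = np.zeros(n_actions)

def q_values(state):
    """Forward pass: 3-dim state in, one Q-value per action out."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

state = np.array([0.5, -1.0, 2.0])  # state = [a, b, c]
print(q_values(state).shape)        # one Q-value per action: (2,)
```

Is changing the input dimension like this all that is required, or does anything else in the code need to be adapted?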