Closed DavidHJong closed 1 year ago
Based on your other issue, the differences may be due to using a more recent OpenAI Gym version, which changes how the current state is generated (this_state = first entry in the transition to memorize).
for episode in range(1, max_episodes + 1):
    this_state = trading_environment.reset()
    for episode_step in range(max_episode_steps):
Hope this helps.
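One possible way to guard against this, assuming the difference comes from the fact that newer Gym releases (0.26 and later) return an `(observation, info)` tuple from `reset()` while older ones return only the observation. The helper name `unpack_reset` is hypothetical and not part of the original code; this is only a sketch under that assumption.

```python
def unpack_reset(reset_result):
    """Return only the observation from the value returned by env.reset().

    Assumption: newer Gym versions return (observation, info) with info
    being a dict, while older versions return the observation alone.
    """
    if (
        isinstance(reset_result, tuple)
        and len(reset_result) == 2
        and isinstance(reset_result[1], dict)
    ):
        # New-style API: drop the info dict, keep the observation.
        return reset_result[0]
    # Old-style API: the result already is the observation.
    return reset_result
```

It could then be used as `this_state = unpack_reset(trading_environment.reset())`, so that `this_state` holds the observation under either API.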
Describe the bug
np.ndarray data type object in the ddqn.experience.action list. At the very beginning, the interval is once per episode. Then it becomes once every 20 steps, and finally every step.
To Reproduce
ddqn.experience whose action is not an integer.
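A minimal sketch of such a check, assuming each entry of `ddqn.experience` is a tuple laid out as `(state, action, reward, next_state, done)`. That layout, the `action_index` parameter, and the function name `find_non_integer_actions` are assumptions made for illustration, not taken from the original code.

```python
import numpy as np


def find_non_integer_actions(experience, action_index=1):
    """List (position, type name) for transitions whose action is not an integer.

    Assumption: each transition is a sequence and the action sits at
    action_index. Python ints and NumPy integer scalars count as integers;
    anything else (for example an np.ndarray) is reported.
    """
    offenders = []
    for position, transition in enumerate(experience):
        action = transition[action_index]
        if not isinstance(action, (int, np.integer)):
            offenders.append((position, type(action).__name__))
    return offenders


# Small made-up replay buffer: the second action is an array, not an integer.
sample_experience = [
    (np.zeros(3), 1, 0.0, np.zeros(3), False),
    (np.zeros(3), np.array([2]), 0.5, np.zeros(3), False),
    (np.zeros(3), np.int64(0), -0.1, np.zeros(3), True),
]
print(find_non_integer_actions(sample_experience))
```

Running the same function on `ddqn.experience` after a few episodes would show how often the non-integer actions appear.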
Question