Closed Capsar closed 2 years ago
Update:
I have also tried it with DQN:
self.agent = Agent.create(
    agent='dqn',
    environment=self.environment,
    memory=memory_size,
    batch_size=64,
    network='auto',
    learning_rate=0.001,
    horizon=1,
    discount=0.95
)
Same environment setup, but the training loop is different:
total_reward = 0
for i in range(1, epochs):
    states = self.environment.reset()
    terminal = False
    while not terminal:
        actions = self.agent.act(states=states)
        states, terminal, reward = self.environment.execute(actions=actions)
        self.agent.observe(terminal=terminal, reward=reward)
        total_reward += reward
    if i % number == 0:
        print('episode:', i, "total reward:", total_reward / number)
        total_reward = 0
Still no results for MountainCar or Acrobot, but CartPole trains within 400 episodes to a reward of 500. So I am still looking for a fix or someone to help me out here.
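For reference, the act/observe loop above can be exercised in isolation with stand-in classes. These stubs are purely illustrative (they are NOT the Tensorforce `Agent`/`Environment` API); they only let the control flow of the episode loop be checked without TensorFlow installed:

```python
class StubEnvironment:
    """Illustrative stand-in: terminates after max_steps, reward 1.0 per step."""
    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.step = 0

    def reset(self):
        self.step = 0
        return [0.0]

    def execute(self, actions):
        self.step += 1
        terminal = self.step >= self.max_steps
        return [float(self.step)], terminal, 1.0


class StubAgent:
    """Illustrative stand-in: always acts with action 0, counts observe calls."""
    def __init__(self):
        self.observed = 0

    def act(self, states):
        return 0

    def observe(self, terminal, reward):
        self.observed += 1


def run_episodes(agent, environment, epochs):
    """Same structure as the training loop above, returning per-episode rewards."""
    rewards = []
    for _ in range(epochs):
        states = environment.reset()
        terminal = False
        episode_reward = 0.0
        while not terminal:
            actions = agent.act(states=states)
            states, terminal, reward = environment.execute(actions=actions)
            agent.observe(terminal=terminal, reward=reward)
            episode_reward += reward
        rewards.append(episode_reward)
    return rewards
```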
Hi,
I have increased the memory size which now shows progress in the reward amount, indicating it is learning. I will close the issue.
Kind regards, Caspar
Feel free to post your agent config if you solve these environments. Despite appearing very simple, they're not actually straightforward to solve (as compared to CartPole, which is "relatively easy").
Hi,
OK, for Acrobot-v1, MountainCar-v0 and CartPole-v1 the following setup works:
self.environment = Environment.create(environment='gym', level=self.env.spec.id)
network_spec = [
    dict(type='dense', size=64),
    dict(type='dense', size=64),
    dict(type='dense', size=64)
]
# print('states', self.environment.states())
# print('actions', self.environment.actions())
self.agent = Agent.create(
    agent='dqn',
    states=self.environment.states(),
    actions=self.environment.actions(),
    max_episode_timesteps=self.env._max_episode_steps,
    memory=memory_size,
    batch_size=batch_size,
    network=network_spec
)
Sometimes two 64-unit layers, sometimes three. memory_size was 10000 or 100000 for CartPole and MountainCar, and 200000 for Acrobot. batch_size was 32.
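For later readers, the settings reported above can be collected into plain dicts. These are reference values only (wiring them into `Agent.create` is left out); the post gives "10000, 100000" for CartPole and MountainCar without a clear pairing, so one value is chosen per entry and the alternative is noted in a comment:

```python
# Hyperparameters reported in this thread, keyed by Gym environment id.
DQN_CONFIGS = {
    'CartPole-v1':    dict(memory=100_000, batch_size=32, hidden_layers=2, layer_size=64),  # or memory=10_000
    'MountainCar-v0': dict(memory=100_000, batch_size=32, hidden_layers=2, layer_size=64),  # or memory=10_000
    'Acrobot-v1':     dict(memory=200_000, batch_size=32, hidden_layers=2, layer_size=64),
}

def network_spec(config):
    """Build the Tensorforce-style layer list shown above ('sometimes two,
    sometimes three' 64-unit dense layers) from a config entry."""
    return [dict(type='dense', size=config['layer_size'])
            for _ in range(config['hidden_layers'])]
```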
Hi,
I am currently working with the Tensorforce 0.6.5 version from GitHub (compatible with TensorFlow 2.7), and I am running into a problem where the standard Tensorforce agent setup is unable to learn anything other than CartPole.
Current setup:
With training as follows (is this better or worse than not dividing by `number`?):
So the problem is that when level_id is "CartPole-v1" it trains, and after a couple of training runs it achieves a good policy with good returns (reward). But when I try "Acrobot-v1" or "MountainCar-v0", the reward stays at -500 and -200 respectively, indicating no learning, as these are the worst possible scores for both environments.
Could someone help me get the other environments to train as well, or maybe spot the bug? No errors whatsoever occur during training or initialization.
Kind regards, Caspar
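On the question above about dividing by `number`: dividing reports the mean reward per episode over the last `number` episodes rather than the raw sum, which only changes the scale of what is printed, not the training itself. A minimal pure-Python illustration:

```python
def mean_episode_reward(episode_rewards, number):
    """Mean reward over the most recent `number` episodes; this is what the
    `total_reward / number` print in the training loop reports. Without the
    division, the same print would show the sum over those episodes."""
    return sum(episode_rewards[-number:]) / number
```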