openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

ACKTR on MuJoCo #545

Open R1ckF opened 6 years ago

R1ckF commented 6 years ago

Hi all,

When I try to run a MuJoCo environment with the ACKTR algorithm it doesn't work. Here is the full log:

Training acktr on mujoco:Hopper-v2 with arguments 
{'network': 'mlp'}
Traceback (most recent call last):
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1094, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 931, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("acktr_model/split_1:0", shape=(80, 3), dtype=float32)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/rick/Gits/baselines/baselines/run.py", line 243, in <module>
    main()
  File "/home/rick/Gits/baselines/baselines/run.py", line 218, in main
    model, _ = train(args, extra_args)
  File "/home/rick/Gits/baselines/baselines/run.py", line 75, in train
    **alg_kwargs
  File "/home/rick/Gits/baselines/baselines/acktr/acktr_disc.py", line 116, in learn
    model = make_model()
  File "/home/rick/Gits/baselines/baselines/acktr/acktr_disc.py", line 111, in <lambda>
    lrschedule=lrschedule)
  File "/home/rick/Gits/baselines/baselines/acktr/acktr_disc.py", line 37, in __init__
    neglogpac = train_model.pd.neglogp(A)
  File "/home/rick/Gits/baselines/baselines/common/distributions.py", line 205, in neglogp
    + tf.reduce_sum(self.logstd, axis=-1)
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 850, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 8188, in sub
    "Sub", x=x, y=y, name=name)
  File "/home/rick/miniconda3/envs/openai/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Sub' Op has type float32 that does not match type int32 of argument 'x'.

I think it's because ACKTR is split into a continuous and a discrete algorithm, and it seems that only the discrete algorithm is loaded in baselines/acktr/acktr.py:

from baselines.acktr.acktr_disc import *

It does not load acktr_cont...
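
If it helps, here is a minimal sketch of what I think is going wrong (assuming TensorFlow 1.x, as in the traceback; the variable names below are illustrative stand-ins, not the actual baselines tensors). The discrete ACKTR model creates an int32 placeholder of action indices, but a MuJoCo policy is a diagonal Gaussian whose neglogp subtracts the actions from a float32 mean:

import tensorflow as tf

# Hypothetical stand-ins for the real tensors:
# acktr_disc builds an int32 placeholder for discrete action indices,
# while the MuJoCo policy's diagonal Gaussian mean is float32.
actions = tf.placeholder(tf.int32, [None], name="A")
mean = tf.placeholder(tf.float32, [None, 3], name="pd_mean")  # Hopper-v2 has 3 action dimensions

# DiagGaussianPd.neglogp computes (actions - mean) / std internally, so this
# subtraction raises the same "Input 'y' of 'Sub' Op has type float32 that does
# not match type int32 of argument 'x'" TypeError seen in the traceback above.
diff = actions - mean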

When I change the script to load acktr_cont instead of acktr_disc, other errors occur, so I am thinking there should be an easier solution. What is the correct way of running MuJoCo with the ACKTR algorithm?

Thanks in advance!

pzhokhov commented 6 years ago

Hi @R1ckF! The continuous version of ACKTR has not been refactored yet (that is currently in the works), so for now you'll need to run it via python -m baselines.acktr.run_mujoco --env=Ant-v2 (for instance, with the Ant environment).

R1ckF commented 6 years ago

Hi @pzhokhov and thank you for your answer!

I just discovered this repository about two weeks ago and I like it a lot. Great work. I also think the refactoring is a good idea; it makes the code more accessible for everybody and easier to run.

I don't fully understand everything yet (I'm new to RL), but I did notice some discrepancies between the discrete and the continuous algorithm, so I have a few follow-up questions:

- The discrete and continuous ACKTR print very different outputs to the console. Is this purely a cosmetic difference, or is the code also fundamentally different?
- The paper "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation" mentions that distributed simulation reduces sample efficiency. From reading the code it looks like the discrete ACKTR can be run in multiple environments simultaneously with the "--num_env" flag, but the continuous ACKTR cannot. Is this correct? If so, why?

Thanks again!

pzhokhov commented 6 years ago

Hi @R1ckF! Sorry about the lag! The short story is this: the version of ACKTR for continuous action spaces uses entirely different code from the version for discrete action spaces (except for KfacOptimizer, which both use). The continuous version also has a few tweaks, such as an optimizer with separate hyperparameters for the value function approximator, weight decay, etc. Given that I am not the original author of the code, I cannot speak confidently about the relative role of each; however, I have opened a PR to unify the discrete and continuous versions here: https://github.com/openai/baselines/pull/560 - hopefully there will be some input from the authors. If you want to give the unified version a spin, feel free to check out the branch from the PR and try it.
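
Something like this should work to try it locally (assuming your remote for openai/baselines is called origin; the local branch name is just an example):

git fetch origin pull/560/head:acktr-unified
git checkout acktr-unified

and then rerun your original command (e.g. python -m baselines.run --alg=acktr --env=Hopper-v2).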

R1ckF commented 5 years ago

Thx @pzhokhov for your response! I'll definitely check it out.