junhyukoh / self-imitation-learning

ICML 2018 Self-Imitation Learning
MIT License

Policy 'lstm' doesn't work #5

Open HaozhengLi opened 5 years ago

HaozhengLi commented 5 years ago

Hello.

First, I changed the default policy in run_atari_sil.py:

parser.add_argument('--policy', help='Policy architecture', choices=['cnn', 'lstm', 'lnlstm'], default='lstm')

Then I ran A2C+SIL on an Atari game:

python baselines/a2c/run_atari_sil.py --env BreakoutNoFrameskip-v4

I got this error:

Logging to /tmp/a2c
2018-12-25 14:46:34.107377: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
WARNING:tensorflow:From e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\common\distributions.py:148: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Traceback (most recent call last):
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 1628, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension size must be evenly divisible by 15 but is 8192 for 'model_2/Reshape_1' (op: 'Reshape') with input shapes: [16,512], [3] and with input tensors computed as partial shapes: input[1] = [3,5,?].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "baselines/a2c/run_atari_sil.py", line 38, in <module>
    main()
  File "baselines/a2c/run_atari_sil.py", line 35, in main
    num_env=16)
  File "baselines/a2c/run_atari_sil.py", line 20, in train
    sil_update=sil_update, sil_beta=sil_beta)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\a2c_sil.py", line 161, in learn
    max_grad_norm=max_grad_norm, lr=lr, alpha=alpha, epsilon=epsilon, total_timesteps=total_timesteps, lrschedule=lrschedule, sil_update=sil_update, sil_beta=sil_beta)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\a2c_sil.py", line 35, in __init__
    sil_model = policy(sess, ob_space, ac_space, nenvs, nsteps, reuse=True)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\policies.py", line 66, in __init__
    xs = batch_to_seq(h, nenv, nsteps)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\utils.py", line 74, in batch_to_seq
    h = tf.reshape(h, [nbatch, nsteps, -1])
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 7759, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 1792, in __init__
    control_input_ops)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 1631, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension size must be evenly divisible by 15 but is 8192 for 'model_2/Reshape_1' (op: 'Reshape') with input shapes: [16,512], [3] and with input tensors computed as partial shapes: input[1] = [3,5,?].
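
Reading the shapes in the error message, this looks like an argument mismatch when a2c_sil.py (line 35 in the traceback) builds the SIL model. A small sketch of the arithmetic, assuming the A2C default nsteps=5 and the old-baselines LstmPolicy signature (sess, ob_space, ac_space, nbatch, nsteps, ...), which derives nenv = nbatch // nsteps:

nenvs, nsteps = 16, 5          # run_atari_sil.py defaults (assumed)
nbatch = nenvs                 # a2c_sil.py passes nenvs where LstmPolicy expects nbatch = nenvs * nsteps
nenv = nbatch // nsteps        # 16 // 5 = 3 (integer division silently truncates)
print((16 * 512) % (nenv * nsteps))  # 8192 % 15 == 2, so tf.reshape([16, 512] -> [3, 5, -1]) fails

If that reading is right, both the [3,5,?] target shape and the "divisible by 15" message fall out of nenvs being passed in place of nenvs * nsteps.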

What can I do to fix this? Thank you very much!

HaozhengLi commented 5 years ago

I changed num_env in run_atari_sil.py from 16 to 15:

train(args.env, num_timesteps=args.num_timesteps, seed=args.seed, policy=args.policy, lrschedule=args.lrschedule, sil_update=args.sil_update, sil_beta=args.sil_beta, num_env=15)
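
For what it's worth, num_env=15 only makes the reshape line up by accident. Continuing the sketch above (same assumptions):

nbatch, nsteps = 15, 5
nenv = nbatch // nsteps              # 15 // 5 = 3
print((15 * 512) % (nenv * nsteps))  # 7680 % 15 == 0, so the reshape succeeds

But this reinterprets one step from 15 environments as 5 steps from 3 environments, so even though the shapes now match, the LSTM would see the wrong sequence ordering.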

But I got another error:

Logging to /tmp/a2c
2018-12-25 14:55:20.343377: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
WARNING:tensorflow:From e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\common\distributions.py:148: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

WARNING:tensorflow:From e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\utils.py:13: calling reduce_max (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\utils.py:15: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Traceback (most recent call last):
  File "baselines/a2c/run_atari_sil.py", line 38, in <module>
    main()
  File "baselines/a2c/run_atari_sil.py", line 35, in main
    num_env=15)
  File "baselines/a2c/run_atari_sil.py", line 20, in train
    sil_update=sil_update, sil_beta=sil_beta)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\a2c_sil.py", line 161, in learn
    max_grad_norm=max_grad_norm, lr=lr, alpha=alpha, epsilon=epsilon, total_timesteps=total_timesteps, lrschedule=lrschedule, sil_update=sil_update, sil_beta=sil_beta)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\a2c_sil.py", line 69, in __init__
    sil_model.entropy, sil_model.value, sil_model.neg_log_prob,
AttributeError: 'LstmPolicy' object has no attribute 'entropy'

It seems that the 'lstm' policy cannot work because LstmPolicy lacks the entropy attribute (and likely other attributes) that a2c_sil.py expects.
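
If I'm reading a2c_sil.py line 69 correctly, the SIL update expects the policy object to expose entropy, value, and neg_log_prob tensors, which this fork's CnnPolicy apparently defines and LstmPolicy does not (there may be more missing attributes beyond these three). A hypothetical patch sketch for the end of LstmPolicy.__init__ in baselines/a2c/policies.py; the tensor names pi and vf and the use of the pdtype machinery are assumptions, mirroring the stock baselines policies:

# All names below are assumptions based on the stock baselines CnnPolicy.
pdtype = make_pdtype(ac_space)              # from baselines.common.distributions
pd = pdtype.pdfromflat(pi)                  # categorical distribution over the policy logits
A = pdtype.sample_placeholder([nbatch])     # actions taken, fed at SIL update time
self.entropy = pd.entropy()                 # per-sample policy entropy
self.neg_log_prob = pd.neglogp(A)           # -log pi(a|s); presumably the distributions.py:148 call in the warning above
self.value = vf[:, 0]                       # value head squeezed to shape [nbatch]
self.A = A

Even with those attributes, though, SIL replays transitions sampled from a buffer, so the LSTM's recurrent state would also need to be handled for those samples; a patch like the above probably isn't sufficient on its own.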

What can I do to fix this? Thank you very much!