hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.12k stars 724 forks

IndexError: list index out of range #259

Closed hn2 closed 5 years ago

hn2 commented 5 years ago

When trying CnnLnLstmPolicy I get IndexError: list index out of range

at

  File "c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py", line 93, in __init__
    self.setup_model()
  File "c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py", line 126, in setup_model
    n_batch_step, reuse=False, **self.policy_kwargs)
  File "c:\users\hanna\stable-baselines\stable_baselines\common\policies.py", line 518, in __init__
    layer_norm=True, feature_extraction="cnn", **_kwargs)
  File "c:\users\hanna\stable-baselines\stable_baselines\common\policies.py", line 292, in __init__
    extracted_features = cnn_extractor(self.processed_obs, **kwargs)
  File "c:\users\hanna\stable-baselines\stable_baselines\common\policies.py", line 24, in nature_cnn
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
  File "c:\users\hanna\stable-baselines\stable_baselines\a2c\utils.py", line 124, in conv
    n_input = input_tensor.get_shape()[channel_ax].value
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 616, in __getitem__
    return self._dims[key]
IndexError: list index out of range

In fact, I was unable to test any policy other than MlpPolicy.

hill-a commented 5 years ago

No minimal code and the template is not filled in.

  File "c:\users\hanna\stable-baselines\stable_baselines\common\policies.py", line 24, in nature_cnn
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))

My guess is that you are using a Cnn policy when your input is not an image (or is a malformed image). As I explained in #242, use a flattened observation with Mlp policies if you are not using images. This does not seem like a stable-baselines issue.
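To make that guess concrete, here is a minimal pure-Python sketch of the shape lookup that fails in the traceback (`input_tensor.get_shape()[channel_ax]`). It assumes `channel_ax` is 3, i.e. a (batch, height, width, channels) layout, and uses a (100, 4) matrix purely as an example of a non-image observation. The helpers `channel_dim` and `flatten_observation` are hypothetical and are not part of stable-baselines.

```python
def channel_dim(batch_shape, channel_ax=3):
    """Mimic `input_tensor.get_shape()[channel_ax]` on a plain list of dims.

    Assumption: channel_ax=3, i.e. a (batch, height, width, channels) layout.
    """
    return list(batch_shape)[channel_ax]


def flatten_observation(obs):
    """Flatten a nested list of numbers into one flat list (Mlp-style input)."""
    if isinstance(obs, (list, tuple)):
        flat = []
        for item in obs:
            flat.extend(flatten_observation(item))
        return flat
    return [obs]


# An image batch has four dimensions, so the channel lookup succeeds.
print(channel_dim([None, 84, 84, 4]))

# A (100, 4) observation gives a three-dimensional batch: index 3 does not exist.
try:
    channel_dim([None, 100, 4])
except IndexError as err:
    print("IndexError:", err)

# Flattened, the same observation becomes a single feature vector.
matrix = [[0.0] * 4 for _ in range(100)]
print(len(flatten_observation(matrix)))
```

If the observation has no height/width/channel structure, flattening it and using an Mlp policy avoids the lookup entirely.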

This is your last warning, we do not do tech support (cf #257, #242). Only open an issue to ask a question about stable baselines, or if you think stable baselines has an issue (in which case fill in the template with minimal code). If you ask any other tech support question, you will be blocked.

hn2 commented 5 years ago

I tried MlpLnLstmPolicy. It works well with PPO2, but fails with PPO1:

    model = PPO1(MlpLnLstmPolicy, env, verbose=0, tensorboard_log=settings['tensorboard_log'])
  File "c:\users\hanna\stable-baselines\stable_baselines\ppo1\pposgd_simple.py", line 83, in __init__
    self.setup_model()
  File "c:\users\hanna\stable-baselines\stable_baselines\ppo1\pposgd_simple.py", line 101, in setup_model
    None, reuse=False, **self.policy_kwargs)
  File "c:\users\hanna\stable-baselines\stable_baselines\common\policies.py", line 577, in __init__
    layer_norm=True, feature_extraction="mlp", **_kwargs)
  File "c:\users\hanna\stable-baselines\stable_baselines\common\policies.py", line 301, in __init__
    layer_norm=layer_norm)
  File "c:\users\hanna\stable-baselines\stable_baselines\a2c\utils.py", line 201, in lstm
    weight_x = tf.get_variable("wx", [n_input, n_hidden * 4], initializer=ortho_init(init_scale))
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1487, in get_variable
    aggregation=aggregation)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1237, in get_variable
    aggregation=aggregation)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 540, in get_variable
    aggregation=aggregation)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 492, in _true_getter
    aggregation=aggregation)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 904, in _get_single_variable
    tf_inspect.getargspec(initializer).args)
ValueError: You can only pass an initializer function that expects no arguments to its callable when the shape is not fully defined. The given initializer function expects the following args ['shape']

hill-a commented 5 years ago

RTFM

https://stable-baselines.readthedocs.io/en/master/modules/ppo1.html#can-i-use

PPO1 does not support recurrent policies.

hn2 commented 5 years ago

OK. Now I get the error below. Is it a problem with my env or my parameters?

<class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>
Traceback (most recent call last):
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values [[{{node loss/VerifyFinite/CheckNumerics}} = CheckNumericsT=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\hanna\Anaconda3\lib\site-packages\quantiacsToolbox\quantiacsToolbox.py", line 871, in runts
    position, settings = TSobject.myTradingSystem(*argList)
  File "ppo2_env5_train.py", line 30, in myTradingSystem
    model.learn(total_timesteps=settings['total_timesteps'])
  File "c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py", line 326, in learn
    writer=writer, states=mb_states))
  File "c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py", line 257, in _train_step
    td_map)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values [[node loss/VerifyFinite/CheckNumerics (defined at c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py:175) = CheckNumericsT=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'loss/VerifyFinite/CheckNumerics', defined at:
  File "ppo2_env5_train.py", line 89, in <module>
    results = runts(__file__)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\quantiacsToolbox\quantiacsToolbox.py", line 871, in runts
    position, settings = TSobject.myTradingSystem(*argList)
  File "ppo2_env5_train.py", line 29, in myTradingSystem
    model = PPO2(MlpLnLstmPolicy, env, verbose=0, nminibatches=1, tensorboard_log=settings['tensorboard_log'])
  File "c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py", line 93, in __init__
    self.setup_model()
  File "c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py", line 175, in setup_model
    grads, _grad_norm = tf.clip_by_global_norm(grads, self.max_grad_norm)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\clip_ops.py", line 265, in clip_by_global_norm
    "Found Inf or NaN global norm.")
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\numerics.py", line 47, in verify_tensor_all_finite
    verify_input = array_ops.check_numerics(t, message=msg)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 972, in check_numerics
    "CheckNumerics", tensor=tensor, message=message, name=name)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had NaN values [[node loss/VerifyFinite/CheckNumerics (defined at c:\users\hanna\stable-baselines\stable_baselines\ppo2\ppo2.py:175) = CheckNumericsT=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

tehZevo commented 5 years ago

Hello, Firstly, I'd like to thank you for this great repo. I am having a similar issue as hn2's most recent exception when using MlpPolicy with PPO2; but this is not a request for tech support. Rather, it is a criticism of the handling of the issue. I agree that GitHub issues are no place for tech support. However without a chat/support location listed on the readme (eg gitter, slack), issues like these are unfortunately to be expected.

Anyway, I'm off to hunt down my error, since I expect this issue to be locked in response.

Again, thanks for stable-baselines!

tehZevo commented 5 years ago

Found my issue @hn2; a single NaN reward was given in my custom environment. Good luck!
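For anyone chasing the same "Found Inf or NaN global norm" error, one way to locate a bad reward early is to check every reward before it reaches the learner. The sketch below is only an illustration: it assumes an environment with Gym-style `reset()` and `step(action)` methods returning `(obs, reward, done, info)`, and both `FiniteRewardChecker` and `ToyEnv` are hypothetical names, not stable-baselines classes.

```python
import math


class FiniteRewardChecker:
    """Wrap an environment and fail fast on a NaN or infinite reward.

    Assumption: the wrapped object has Gym-style `reset()` and
    `step(action) -> (obs, reward, done, info)` methods.
    """

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if not math.isfinite(reward):
            raise ValueError(f"non-finite reward {reward!r} for action {action!r}")
        return obs, reward, done, info


class ToyEnv:
    """Hypothetical environment that emits a NaN reward on its third step."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = float("nan") if self.t == 3 else 1.0
        return float(self.t), reward, self.t >= 5, {}


env = FiniteRewardChecker(ToyEnv())
env.reset()
for _ in range(5):
    try:
        env.step(0)
    except ValueError as err:
        # The check stops the rollout at the step that produced the bad reward.
        print("caught:", err)
        break
```

Raising at the offending step points at the environment logic directly, instead of at the gradient-norm check inside the training graph.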

hill-a commented 5 years ago

hey @tehZevo,

The issue here is that hn2 has already opened quite a few issues in the past without filling out the issue template, and therefore without giving us the information needed to help him or to correct any bugs.

We don't do tech support: we only provide the tools to work with reinforcement learning, not instruction in how to use them. There is unfortunately a fine line between doing tech support and fixing bugs in a library like Stable-baselines. However, I believe that hn2 has crossed it on many occasions (as can be seen in #257, #242, and #279).

> However without a chat/support location listed on the readme (eg gitter, slack), issues like these are unfortunately to be expected.

I disagree with you on this. Stable-baselines is a project maintained mostly by PhD students, branched off OpenAI Baselines in order to make working with reinforcement learning easier and faster. It is designed to look like a polished and easy-to-use library, but it has nowhere near the development team of TensorFlow or Sklearn. We already lack the time to work on Stable-baselines, let alone to help others with technical support, although we try our best.

In any case, I'm glad you were able to fix the issue, since NaNs are a systematic issue in machine learning and can be a nuisance :)

FerusAndBeyond commented 5 years ago

@hill-a I got the same error, "list index out of range", when trying to use CnnPolicy. I'm not using images but a matrix with input shape (100, 4). What specifically are the constraints for being able to use CnnPolicy? Inputs to CNNs don't have to be images; in Go, for example, CNNs are used.

hn2 commented 5 years ago

I now get a different error with the CNN policy: ValueError: Negative dimension size caused by subtracting 8 from 3 for 'model/c1/Conv2D' (op: 'Conv2D') with input shapes: [?,16,3,4], [8,8,4,32].

araffin commented 5 years ago

@FerusAndBeyond @hn2 I really recommend that you both learn a bit more about the techniques you are using (the Stanford course is a good start for CNNs ;)). For instance, Conv2D layers (present in CnnPolicy) are made for tensors of shape (h, w, c), where c is the number of channels and must be at least 1 (that's why it may not work in your case @FerusAndBeyond; you should also use a custom CNN policy, otherwise the input will be normalized as if it were an image).

Convolution also shrinks the input image (when not using padding), so, if the input is not big enough, it won't work (that's your problem @hn2).
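The shrinking can be checked with a little arithmetic: along one axis, an unpadded convolution with filter size k and stride s maps an input of size n to floor((n - k) / s) + 1, and it is undefined when n < k. The sketch below applies that to the shapes in this thread. Only the first layer (filter size 8, stride 4) is confirmed by the traceback; the second and third `(filter_size, stride)` pairs are an assumption based on the usual Nature-DQN layout, and the function names are hypothetical.

```python
def valid_conv_size(size, filter_size, stride):
    """Output length of an unpadded ('VALID') convolution along one axis."""
    if size < filter_size:
        raise ValueError(
            f"input size {size} is smaller than filter size {filter_size}"
        )
    return (size - filter_size) // stride + 1


# (filter_size, stride) per layer. The first pair appears in the traceback;
# the other two are assumed from the usual Nature-DQN layout.
NATURE_CNN_LAYERS = [(8, 4), (4, 2), (3, 1)]


def nature_cnn_sizes(size):
    """Sizes along one axis after each layer; ValueError if the axis is too small."""
    sizes = []
    for filter_size, stride in NATURE_CNN_LAYERS:
        size = valid_conv_size(size, filter_size, stride)
        sizes.append(size)
    return sizes


# A typical 84-pixel axis survives all three layers.
print(nature_cnn_sizes(84))

# The two spatial axes from the reported input shape [?,16,3,4].
for axis_size in (16, 3):
    try:
        print(axis_size, nature_cnn_sizes(axis_size))
    except ValueError as err:
        print(axis_size, "fails:", err)
```

Under these assumptions the axis of size 3 already fails at the first layer, which is consistent with the "subtracting 8 from 3" message, and the axis of size 16 would fail one layer later.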

araffin commented 5 years ago

> I thought, since observation_space is an input to policy, that it swapped between Conv2D/Conv1D. Doesn't say anything about it in the documentation but I guess I have to read the source code instead.

The default policies are made for the most common use only. As stated in the doc: "CnnPolicies are for images only. MlpPolicies are made for other type of features (e.g. robot joints)" For anything custom, yes, I recommend you to look at the code.

> Also I think if the input_shape isn't what it supposed to be it should throw some error and not give this very uninformative one.

If you think you can improve that, then we welcome any PR in that direction ;)

jaberkow commented 5 years ago

> @FerusAndBeyond @hn2 I really recommend that you both learn a bit more about the techniques you are using (the Stanford course is a good start for CNNs ;)). For instance, Conv2D layers (present in CnnPolicy) are made for tensors of shape (h, w, c), where c is the number of channels and must be at least 1 (that's why it may not work in your case @FerusAndBeyond; you should also use a custom CNN policy, otherwise the input will be normalized as if it were an image).

> Convolution also shrinks the input image (when not using padding), so, if the input is not big enough, it won't work (that's your problem @hn2).

Would Conv2D work on a tensor of shape (1, w, 3) where w > 1, or will that throw a padding error? I'm working on applying RL to a one-dimensional system (with 3 channels) that has (approximate) translational invariance, so I feel that an effectively one-dimensional convolution is appropriate.

Sokomba01 commented 3 years ago

I am trying to implement the DQN baseline with a Unity-wrapped environment, but the following error occurred. Can anyone help me?

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('D:/MSCS/Thesis/Dr. Munam/MyImplementation/train_unity.py', wdir='D:/MSCS/Thesis/Dr. Munam/MyImplementation')
  File "C:\Users\raza_\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "C:\Users\raza_\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/MSCS/Thesis/Dr. Munam/MyImplementation/train_unity.py", line 36, in <module>
    main()
  File "D:/MSCS/Thesis/Dr. Munam/MyImplementation/train_unity.py", line 30, in main
    dueling=True
  File "D:\MSCS\Thesis\Dr. Munam\MyImplementation\baselines\deepq\deepq.py", line 208, in learn
    param_noise=param_noise
  File "D:\MSCS\Thesis\Dr. Munam\MyImplementation\baselines\deepq\build_graph.py", line 379, in build_train
    act_f = build_act(make_obs_ph, q_func, num_actions, scope=scope, reuse=reuse)
  File "D:\MSCS\Thesis\Dr. Munam\MyImplementation\baselines\deepq\build_graph.py", line 186, in build_act
    q_values = q_func(observations_ph.get(), num_actions, scope="q_func")
  File "D:\MSCS\Thesis\Dr. Munam\MyImplementation\baselines\deepq\models.py", line 12, in q_func_builder
    latent = network(input_placeholder)
  File "D:\MSCS\Thesis\Dr. Munam\MyImplementation\baselines\common\models.py", line 109, in network_fn
    return nature_cnn(X, **conv_kwargs)
  File "D:\MSCS\Thesis\Dr. Munam\MyImplementation\baselines\common\models.py", line 22, in nature_cnn
    **conv_kwargs))
  File "D:\MSCS\Thesis\Dr. Munam\MyImplementation\baselines\a2c\utils.py", line 49, in conv
    nin = x.get_shape()[channel_ax].value
  File "C:\Users\raza_\Anaconda3\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 889, in __getitem__
    return self._dims[key]

IndexError: list index out of range