notadamking / RLTrader

A cryptocurrency trading environment using deep reinforcement learning and OpenAI's gym
https://discord.gg/ZZ7BGWh
GNU General Public License v3.0
1.71k stars, 537 forks

Found Inf or NaN global norm. : Tensor had Inf values #22

Open archenroot opened 5 years ago

archenroot commented 5 years ago

While optimize.py was running, I observed one exception, but the process continued...

[W 2019-06-08 17:58:27,948] Setting status of trial#14 as TrialState.FAIL because of the following error: InvalidArgumentError()
Traceback (most recent call last):
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had Inf values
     [[{{node loss/VerifyFinite/CheckNumerics}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/optuna/study.py", line 399, in _run_trial
    result = func(trial)
  File "optimize.py", line 88, in optimize_agent
    model.learn(evaluation_interval)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 326, in learn
    writer=writer, states=mb_states))
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 257, in _train_step
    td_map)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had Inf values
     [[node loss/VerifyFinite/CheckNumerics (defined at /home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py:175) ]]

Caused by op 'loss/VerifyFinite/CheckNumerics', defined at:
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/optuna/study.py", line 357, in func_child_thread
    self._run_trial(func, catch)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/optuna/study.py", line 399, in _run_trial
    result = func(trial)
  File "optimize.py", line 81, in optimize_agent
    tensorboard_log="./tensorboard", **model_params)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 93, in __init__
    self.setup_model()
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 175, in setup_model
    grads, _grad_norm = tf.clip_by_global_norm(grads, self.max_grad_norm)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/clip_ops.py", line 271, in clip_by_global_norm
    "Found Inf or NaN global norm.")
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/numerics.py", line 44, in verify_tensor_all_finite
    return verify_tensor_all_finite_v2(t, msg, name)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/numerics.py", line 62, in verify_tensor_all_finite_v2
    verify_input = array_ops.check_numerics(x, message=message)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 919, in check_numerics
    "CheckNumerics", tensor=tensor, message=message, name=name)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had Inf values
     [[node loss/VerifyFinite/CheckNumerics (defined at /home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py:175) ]]

notadamking commented 5 years ago

I have also run into this error, but have had no success debugging it. It is caused by Inf or NaN making its way into the model's network, though I am unsure how, since the observation space, action space, and reward space all actively replace NaN and +/-Inf with 0. Any ideas?
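
One possible explanation, offered as a hypothesis rather than a diagnosis of this issue: the traceback shows the check firing inside tf.clip_by_global_norm, which computes the square root of the sum of squared gradients. Replacing NaN and Inf in the inputs does not bound that quantity, because a very large but finite value can still overflow once it is squared. The sketch below uses plain Python floats (float64) and a made-up gradient that is simply proportional to the input; TensorFlow typically works in float32, where squaring anything above roughly 1.8e19 already overflows. The names sanitize and global_norm are illustrative, not taken from RLTrader or stable-baselines.

```python
import math


def sanitize(values):
    """Replace NaN and +/-Inf with 0.0, mirroring the cleanup described above."""
    return [v if math.isfinite(v) else 0.0 for v in values]


def global_norm(grads):
    """Square root of the sum of squares, the quantity the clipping op verifies."""
    return math.sqrt(sum(g * g for g in grads))


# Raw observation containing NaN, +/-Inf, and one huge but finite value.
obs = sanitize([1.0, float("nan"), float("inf"), -float("inf"), 1e200])

# Every sanitized value is finite...
all_finite = all(math.isfinite(v) for v in obs)

# ...but a hypothetical gradient proportional to the input overflows when squared.
grads = [2.0 * v for v in obs]
norm = global_norm(grads)

print(obs, all_finite, norm)
```

If this is what is happening, scaling or clipping observations and rewards to a modest range before they reach the network would matter more than replacing non-finite values after the fact.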

archenroot commented 5 years ago

Hm, I need to do more tracing...

araffin commented 5 years ago

Hi,

We just released a guide in the documentation for tackling this type of issue. Feel free to open an issue on the stable-baselines repo if you find something wrong coming from the library.

notadamking commented 5 years ago

@archenroot has anyone debugged this any further using VecCheckNan from stable-baselines?
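
For readers unfamiliar with the idea, here is a minimal sketch of the kind of check such a wrapper performs: validate what goes into and comes out of every environment step, and fail loudly at the first non-finite value so the source can be located. This is not the stable-baselines implementation. CheckedEnv, NonFiniteValueError, and ToyEnv are hypothetical names, and the toy environment exists only to trigger the check.

```python
import math


class NonFiniteValueError(ValueError):
    """Raised when NaN or Inf shows up in an action, observation, or reward."""


def check_finite(name, values):
    """Raise on the first non-finite entry, reporting where it was found."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            raise NonFiniteValueError(f"{name}[{i}] is {v}")


class CheckedEnv:
    """Hypothetical wrapper that validates each step of the wrapped env."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        check_finite("action", action)
        obs, reward, done, info = self.env.step(action)
        check_finite("observation", obs)
        check_finite("reward", [reward])
        return obs, reward, done, info


class ToyEnv:
    """Toy environment: the reward is 1/action, so an action of 0 yields Inf."""

    def step(self, action):
        a = action[0]
        reward = 1.0 / a if a != 0 else float("inf")
        return [a], reward, False, {}


env = CheckedEnv(ToyEnv())
ok = env.step([2.0])  # passes every check

try:
    env.step([0.0])  # the reward becomes Inf and is caught at the source
    caught = None
except NonFiniteValueError as exc:
    caught = str(exc)

print(ok, caught)
```

Raising at the environment boundary points to the step that produced the bad value, whereas the TensorFlow error above only surfaces later, at gradient clipping.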