mit-acl / rl_collision_avoidance

Training code for GA3C-CADRL algorithm (collision avoidance with deep RL)
ModuleNotFoundError: No module named 'numpy' #5

Closed BingHan0458 closed 3 years ago

BingHan0458 commented 3 years ago

Hello, I got an error when running this command: "./train.sh TrainPhase1" as follows:

Entered virtualenv.
Running GA3C-CADRL gym-collision-avoidance training script (TrainPhase1)
Traceback (most recent call last):
  File "Run.py", line 31, in <module>
    import Config
  File "/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Config.py", line 27, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

But on my Linux machine, the numpy package is already installed for both versions of Python (2.7 and 3.6):

~/catkin_ws/src/CADRL/rl_collision_avoidance$ pip3 install numpy
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (1.19.5)
~/catkin_ws/src/CADRL/rl_collision_avoidance$ pip install numpy
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (1.19.5)

There is also no error when I run "import numpy" in the interactive interpreter of python2 and python3:

python3
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> exit()

python
Python 2.7.17 (default, Sep 30 2020, 13:38:04)
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> exit()

I guess this is due to the newer numpy version (1.19.5)? If so, which version should I install? I don't know how to solve it. Thank you very much!

mfe7 commented 3 years ago

so the ./train.sh script first enters the virtualenv that was created when installing the dependencies, then starts the python training script. did ./install.sh complete successfully?

i would guess that the virtualenv is different than either of those two python versions -- one way to check is to enter the virtualenv by source venv/bin/activate (depending on the path of your virtualenv), then start an interactive python session (confirm that which python points to a version within your virtualenv, not a system-wide python), then try import numpy.
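The same check can be scripted. This is a minimal sketch, assuming it is run with whichever interpreter train.sh ends up using (for example, right after activating the virtualenv). The function name `describe_interpreter` is made up for this example and is not part of the repo.

```python
import importlib.util
import sys


def describe_interpreter(module_name="numpy"):
    """Report which Python is running and whether it can import module_name."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module
    # and newer virtualenv make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or base_prefix != sys.prefix
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `executable` points outside the virtualenv directory, or `module_found` is False, then the numpy seen by the system-wide pip is not the one the training script imports.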

BingHan0458 commented 3 years ago

Thank you very much! This bug has been solved. But there is another error when running the same command, "./train.sh TrainPhase1":

Entered virtualenv.
Running GA3C-CADRL gym-collision-avoidance training script (TrainPhase1)
Traceback (most recent call last):
  File "Run.py", line 74, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 43, in __init__
    self.training_q = Queue(maxsize=Config.MAX_QUEUE_SIZE)
AttributeError: module 'Config' has no attribute 'MAX_QUEUE_SIZE'

But when I search Config.py, it does contain "self.MAX_QUEUE_SIZE = 100 # Max size of the queue". I don't know why this happens.

mfe7 commented 3 years ago

hmm. the Config object should get loaded from the Config.py in this repo, but it's possible it is loading the default Config.py from the gym_collision_avoidance directory if the path isn't set correctly. Could you add print(Config.__dict__) to the line above the one that causes this error, which will show all the attributes of the config object, which would help us debug whether it's loading the right class?
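A full `__dict__` dump can be long and hard to read. As a rough sketch, a smaller summary answers the same question: what kind of object `Config` is, which file it came from, and which expected attributes are missing. `summarize_config` is a hypothetical helper, not part of this repo, and the only attribute name it checks by default is the one from the error above.

```python
import inspect


def summarize_config(config, required=("MAX_QUEUE_SIZE",)):
    """Summarize what `config` is and which required attributes it lacks."""
    if inspect.ismodule(config):
        kind = "module"
    elif inspect.isclass(config):
        kind = "class"
    else:
        kind = "instance of " + type(config).__name__
    return {
        "kind": kind,
        # Modules expose the file they were loaded from as __file__.
        "file": getattr(config, "__file__", None),
        "missing": [name for name in required if not hasattr(config, name)],
    }
```

Calling `print(summarize_config(Config))` just above the failing line in Server.py would show whether `Config` is a module, a class, or an instance.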

Also, when pasting any code/terminal logs, it's helpful to use this formatting:

Traceback (most recent call last):
  File "Run.py", line 74, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 43, in __init__
    self.training_q = Queue(maxsize=Config.MAX_QUEUE_SIZE)
AttributeError: module 'Config' has no attribute 'MAX_QUEUE_SIZE'

BingHan0458 commented 3 years ago

when I add print(Config.__dict__) to Server.py and run ./train.sh TrainPhase1, the result is as follows:

{'__name__': 'Config', '__doc__': None, '__package__': '', '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f0deb8f2c50>, '__spec__': ModuleSpec(name='Config', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f0deb8f2c50>, origin='/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Config.py'), '__file__': '/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Config.py', '__cached__': '/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/__pycache__/Config.cpython-36.pyc', '__builtins__': {'__name__': 'builtins', '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.", '__package__': '', '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>), '__build_class__': <built-in function __build_class__>, '__import__': <built-in function __import__>, 'abs': <built-in function abs>, 'all': <built-in function all>, 'any': <built-in function any>, 'ascii': <built-in function ascii>, 'bin': <built-in function bin>, 'callable': <built-in function callable>, 'chr': <built-in function chr>, 'compile': <built-in function compile>, 'delattr': <built-in function delattr>, 'dir': <built-in function dir>, 'divmod': <built-in function divmod>, 'eval': <built-in function eval>, 'exec': <built-in function exec>, 'format': <built-in function format>, 'getattr': <built-in function getattr>, 'globals': <built-in function globals>, 'hasattr': <built-in function hasattr>, 'hash': <built-in function hash>, 'hex': <built-in function hex>, 'id': <built-in function id>, 'input': <built-in function input>, 'isinstance': <built-in function isinstance>, 'issubclass': <built-in function issubclass>, 'iter': <built-in function iter>, 'len': <built-in function len>, 'locals': <built-in function locals>, 'max': <built-in function max>, 'min': <built-in function min>, 'next': <built-in function next>, 'oct': <built-in function oct>, 'ord': <built-in function ord>, 'pow': <built-in function pow>, 'print': <built-in function print>, 'repr': <built-in function repr>, 'round': <built-in function round>, 'setattr': <built-in function setattr>, 'sorted': <built-in function sorted>, 'sum': <built-in function sum>, 'vars': <built-in function vars>, 'None': None, 'Ellipsis': Ellipsis, 'NotImplemented': NotImplemented, 'False': False, 'True': True, 'bool': <class 'bool'>, 'memoryview': <class 'memoryview'>, 'bytearray': <class 'bytearray'>, 'bytes': <class 'bytes'>, 'classmethod': <class 'classmethod'>, 'complex': <class 'complex'>, 'dict': <class 'dict'>, 'enumerate': <class 'enumerate'>, 'filter': <class 'filter'>, 'float': <class 'float'>, 'frozenset': <class 'frozenset'>, 'property': <class 'property'>, 'int': <class 'int'>, 'list': <class 'list'>, 'map': <class 'map'>, 'object': <class 'object'>, 'range': <class 'range'>, 'reversed': <class 'reversed'>, 'set': <class 'set'>, 'slice': <class 
'slice'>, 'staticmethod': <class 'staticmethod'>, 'str': <class 'str'>, 'super': <class 'super'>, 'tuple': <class 'tuple'>, 'type': <class 'type'>, 'zip': <class 'zip'>, '__debug__': True, 'BaseException': <class 'BaseException'>, 'Exception': <class 'Exception'>, 'TypeError': <class 'TypeError'>, 'StopAsyncIteration': <class 'StopAsyncIteration'>, 'StopIteration': <class 'StopIteration'>, 'GeneratorExit': <class 'GeneratorExit'>, 'SystemExit': <class 'SystemExit'>, 'KeyboardInterrupt': <class 'KeyboardInterrupt'>, 'ImportError': <class 'ImportError'>, 'ModuleNotFoundError': <class 'ModuleNotFoundError'>, 'OSError': <class 'OSError'>, 'EnvironmentError': <class 'OSError'>, 'IOError': <class 'OSError'>, 'EOFError': <class 'EOFError'>, 'RuntimeError': <class 'RuntimeError'>, 'RecursionError': <class 'RecursionError'>, 'NotImplementedError': <class 'NotImplementedError'>, 'NameError': <class 'NameError'>, 'UnboundLocalError': <class 'UnboundLocalError'>, 'AttributeError': <class 'AttributeError'>, 'SyntaxError': <class 'SyntaxError'>, 'IndentationError': <class 'IndentationError'>, 'TabError': <class 'TabError'>, 'LookupError': <class 'LookupError'>, 'IndexError': <class 'IndexError'>, 'KeyError': <class 'KeyError'>, 'ValueError': <class 'ValueError'>, 'UnicodeError': <class 'UnicodeError'>, 'UnicodeEncodeError': <class 'UnicodeEncodeError'>, 'UnicodeDecodeError': <class 'UnicodeDecodeError'>, 'UnicodeTranslateError': <class 'UnicodeTranslateError'>, 'AssertionError': <class 'AssertionError'>, 'ArithmeticError': <class 'ArithmeticError'>, 'FloatingPointError': <class 'FloatingPointError'>, 'OverflowError': <class 'OverflowError'>, 'ZeroDivisionError': <class 'ZeroDivisionError'>, 'SystemError': <class 'SystemError'>, 'ReferenceError': <class 'ReferenceError'>, 'BufferError': <class 'BufferError'>, 'MemoryError': <class 'MemoryError'>, 'Warning': <class 'Warning'>, 'UserWarning': <class 'UserWarning'>, 'DeprecationWarning': <class 'DeprecationWarning'>, 
'PendingDeprecationWarning': <class 'PendingDeprecationWarning'>, 'SyntaxWarning': <class 'SyntaxWarning'>, 'RuntimeWarning': <class 'RuntimeWarning'>, 'FutureWarning': <class 'FutureWarning'>, 'ImportWarning': <class 'ImportWarning'>, 'UnicodeWarning': <class 'UnicodeWarning'>, 'BytesWarning': <class 'BytesWarning'>, 'ResourceWarning': <class 'ResourceWarning'>, 'ConnectionError': <class 'ConnectionError'>, 'BlockingIOError': <class 'BlockingIOError'>, 'BrokenPipeError': <class 'BrokenPipeError'>, 'ChildProcessError': <class 'ChildProcessError'>, 'ConnectionAbortedError': <class 'ConnectionAbortedError'>, 'ConnectionRefusedError': <class 'ConnectionRefusedError'>, 'ConnectionResetError': <class 'ConnectionResetError'>, 'FileExistsError': <class 'FileExistsError'>, 'FileNotFoundError': <class 'FileNotFoundError'>, 'IsADirectoryError': <class 'IsADirectoryError'>, 'NotADirectoryError': <class 'NotADirectoryError'>, 'InterruptedError': <class 'InterruptedError'>, 'PermissionError': <class 'PermissionError'>, 'ProcessLookupError': <class 'ProcessLookupError'>, 'TimeoutError': <class 'TimeoutError'>, 'open': <built-in function open>, 'quit': Use quit() or Ctrl-D (i.e. EOF) to exit, 'exit': Use exit() or Ctrl-D (i.e. EOF) to exit, 'copyright': Copyright (c) 2001-2019 Python Software Foundation. All Rights Reserved. Copyright (c) 2000 BeOpen.com. All Rights Reserved. Copyright (c) 1995-2001 Corporation for National Research Initiatives. All Rights Reserved. Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam. All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands for supporting Python development. 
See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object., '__pybind11_internals_v3_gcc_libstdcpp_cxxabi1002__': <capsule object NULL at 0x7f0c54f4c5d0>, '__pybind11_internals_v3__': <capsule object NULL at 0x7f0c47298b70>}, 'np': <module 'numpy' from '/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/numpy/__init__.py'>, 'sys': <module 'sys' (built-in)>, 'os': <module 'os' from '/usr/lib/python3.6/os.py'>, 'EnvConfig': <class 'gym_collision_avoidance.envs.config.Config'>, 'Train': <class 'Config.Train'>, 'TrainPhase1': <class 'Config.TrainPhase1'>, 'TrainPhase2': <class 'Config.TrainPhase2'>, 'TrainRegression': <class 'Config.TrainRegression'> }

I think the Config object is loaded from the Config.py in the same repo as train.sh (rl_collision_avoidance/ga3c/GA3C/Config.py), and there is no MAX_QUEUE_SIZE attribute in the result.

BingHan0458 commented 3 years ago

I'm sorry, I still don't know how to solve the error above and I need your help, thank you very much!

mfe7 commented 3 years ago

ok, thanks, that indicates that the config isn't being loaded from the right place. for me, in Server.py when I add print(Config.__dict__) I get:

{'MAX_NUM_AGENTS_IN_ENVIRONMENT': 4, 'MAX_NUM_AGENTS_TO_SIM': 4, 'STATES_IN_OBS': ['is_learning', 'num_other_agents', 'dist_to_goal', 'heading_ego_frame', 'pref_speed', 'radius', 'other_agents_states'], 'STATES_NOT_USED_IN_POLICY': ['is_learning'], 'MULTI_AGENT_ARCH_RNN': 0, 'MULTI_AGENT_ARCH_WEIGHT_SHARING': 1, 'MULTI_AGENT_ARCH_LASERSCAN': 2, 'MULTI_AGENT_ARCH': 0, 'MAX_NUM_OTHER_AGENTS_OBSERVED': 3, 'COLLISION_AVOIDANCE': True, 'continuous': 0, 'discrete': 1, 'ACTION_SPACE_TYPE': 0, 'ANIMATE_EPISODES': False, 'SHOW_EPISODE_PLOTS': False, 'SAVE_EPISODE_PLOTS': False, 'PLOT_CIRCLES_ALONG_TRAJ': True, 'ANIMATION_PERIOD_STEPS': 5, 'PLT_LIMITS': None, 'PLT_FIG_SIZE': (10, 8), 'USE_STATIC_MAP': False, 'TRAIN_MODE': True, 'PLAY_MODE': False, 'EVALUATE_MODE': False, 'REWARD_AT_GOAL': 1.0, 'REWARD_COLLISION_WITH_AGENT': -0.25, 'REWARD_COLLISION_WITH_WALL': -0.25, 'REWARD_GETTING_CLOSE': -0.1, 'REWARD_ENTERED_NORM_ZONE': -0.05, 'REWARD_TIME_STEP': 0.0, 'REWARD_WIGGLY_BEHAVIOR': 0.0, 'WIGGLY_BEHAVIOR_THRESHOLD': inf, 'COLLISION_DIST': 0.0, 'GETTING_CLOSE_RANGE': 0.2, 'SOCIAL_NORMS': 'none', 'DT': 0.2, 'NEAR_GOAL_THRESHOLD': 0.2, 'MAX_TIME_RATIO': 2.0, 'TEST_CASE_FN': 'get_testcase_random', 'TEST_CASE_ARGS': {'policy_to_ensure': 'learning_ga3c', 'policies': ['noncoop', 'learning_ga3c', 'static'], 'policy_distr': [0.05, 0.9, 0.05], 'speed_bnds': [0.5, 2.0], 'radius_bnds': [0.2, 0.8], 'side_length': [{'num_agents': [0, 5], 'side_length': [4, 5]}, {'num_agents': [5, inf], 'side_length': [6, 8]}]}, 'MAX_NUM_OTHER_AGENTS_IN_ENVIRONMENT': 3, 'PLOT_EVERY_N_EPISODES': 100, 'SENSING_HORIZON': inf, 'LASERSCAN_LENGTH': 512, 'LASERSCAN_NUM_PAST': 3, 'NUM_STEPS_IN_OBS_HISTORY': 1, 'NUM_PAST_ACTIONS_IN_STATE': 0, 'RVO_TIME_HORIZON': 5.0, 'RVO_COLLAB_COEFF': 0.5, 'RVO_ANTI_COLLAB_T': 1.0, 'TRAIN_SINGLE_AGENT': False, 'STATE_INFO_DICT': {'dist_to_goal': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [-inf, inf], 'attr': 'get_agent_data("dist_to_goal")', 'std': array([5.], 
dtype=float32), 'mean': array([0.], dtype=float32)}, 'radius': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0, inf], 'attr': 'get_agent_data("radius")', 'std': array([1.], dtype=float32), 'mean': array([0.5], dtype=float32)}, 'heading_ego_frame': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [-3.141592653589793, 3.141592653589793], 'attr': 'get_agent_data("heading_ego_frame")', 'std': array([3.14], dtype=float32), 'mean': array([0.], dtype=float32)}, 'pref_speed': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0, inf], 'attr': 'get_agent_data("pref_speed")', 'std': array([1.], dtype=float32), 'mean': array([1.], dtype=float32)}, 'num_other_agents': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0, inf], 'attr': 'get_agent_data("num_other_agents_observed")', 'std': array([1.], dtype=float32), 'mean': array([1.], dtype=float32)}, 'other_agent_states': {'dtype': <class 'numpy.float32'>, 'size': 7, 'bounds': [-inf, inf], 'attr': 'get_agent_data("other_agent_states")', 'std': array([5., 5., 1., 1., 1., 5., 1.], dtype=float32), 'mean': array([0. , 0. , 0. , 0. , 0.5, 0. , 1. ], dtype=float32)}, 'other_agents_states': {'dtype': <class 'numpy.float32'>, 'size': (3, 7), 'bounds': [-inf, inf], 'attr': 'get_sensor_data("other_agents_states")', 'std': array([[5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.]], dtype=float32), 'mean': array([[0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ]], dtype=float32)}, 'laserscan': {'dtype': <class 'numpy.float32'>, 'size': (3, 512), 'bounds': [0.0, 6.0], 'attr': 'get_sensor_data("laserscan")', 'std': array([[5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.]], dtype=float32), 'mean': array([[5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.]], dtype=float32)}, 'is_learning': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0.0, 1.0], 'attr': 'get_agent_data_equiv("policy.str", "learning")'}, 'other_agents_states_encoded': {'dtype': <class 'numpy.float32'>, 'size': 100.0, 'bounds': [0.0, 1.0], 'attr': 'get_sensor_data("other_agents_states_encoded")'}}, 'MEAN_OBS': {'num_other_agents': array([1.], dtype=float32), 'dist_to_goal': array([0.], dtype=float32), 'heading_ego_frame': array([0.], dtype=float32), 'pref_speed': array([1.], dtype=float32), 'radius': array([0.5], dtype=float32), 'other_agents_states': array([[0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ]], dtype=float32)}, 'STD_OBS': {'num_other_agents': array([1.], dtype=float32), 'dist_to_goal': array([5.], dtype=float32), 'heading_ego_frame': array([3.14], dtype=float32), 'pref_speed': array([1.], dtype=float32), 'radius': array([1.], dtype=float32), 'other_agents_states': array([[5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.]], dtype=float32)}, 'AGENT_SORTING_METHOD': 'closest_first', 'game_grid': 0, 'game_ale': 1, 'game_collision_avoidance': 2, 'GAME_CHOICE': 2, 'USE_WANDB': False, 'WANDB_PROJECT_NAME': 'ga3c_cadrl', 'DEBUG': False, 'RANDOM_SEED_1000': 0, 'USE_IMAGE': False, 'NN_INPUT_AVG_VECTOR': array([1. , 0. , 0. , 1. , 0.5, 0. , 0. , 0. , 0. , 0.5, 0. , 1. , 0. ,
       0. , 0. , 0. , 0.5, 0. , 1. , 0. , 0. , 0. , 0. , 0.5, 0. , 1. ]), 'NN_INPUT_STD_VECTOR': array([1.  , 5.  , 3.14, 1.  , 1.  , 5.  , 5.  , 1.  , 1.  , 1.  , 5.  ,
       1.  , 5.  , 5.  , 1.  , 1.  , 1.  , 5.  , 1.  , 5.  , 5.  , 1.  ,
       1.  , 1.  , 5.  , 1.  ]), 'NN_INPUT_SIZE': 26, 'FIRST_STATE_INDEX': 1, 'HOST_AGENT_OBSERVATION_LENGTH': 4, 'OTHER_AGENT_OBSERVATION_LENGTH': 7, 'OTHER_AGENT_FULL_OBSERVATION_LENGTH': 7, 'HOST_AGENT_STATE_SIZE': 4, 'NUM_ACTIONS': 11, 'LOAD_RL_THEN_TRAIN_RL': 0, 'TRAIN_ONLY_REGRESSION': 1, 'LOAD_REGRESSION_THEN_TRAIN_RL': 2, 'NET_ARCH': 'NetworkVP_rnn', 'ALL_ARCHS': ['NetworkVP_rnn'], 'NORMALIZE_INPUT': True, 'USE_DROPOUT': False, 'USE_REGULARIZATION': True, 'AGENTS': 32, 'PREDICTORS': 2, 'TRAINERS': 2, 'DEVICE': '/cpu:0', 'DYNAMIC_SETTINGS': False, 'DYNAMIC_SETTINGS_STEP_WAIT': 20, 'DYNAMIC_SETTINGS_INITIAL_WAIT': 10, 'DISCOUNT': 0.97, 'TIME_MAX': 20, 'MAX_QUEUE_SIZE': 100, 'PREDICTION_BATCH_SIZE': 128, 'MIN_POLICY': 0.0, 'OPT_RMSPROP': 0, 'OPT_ADAM': 1, 'OPTIMIZER': 1, 'LEARNING_RATE_RL_START': 2e-05, 'LEARNING_RATE_RL_END': 2e-05, 'RMSPROP_DECAY': 0.99, 'RMSPROP_MOMENTUM': 0.0, 'RMSPROP_EPSILON': 0.1, 'BETA_START': 0.0001, 'BETA_END': 0.0001, 'USE_GRAD_CLIP': False, 'GRAD_CLIP_NORM': 40.0, 'LOG_EPSILON': 1e-06, 'TRAINING_MIN_BATCH_SIZE': 100, 'TENSORBOARD': True, 'TENSORBOARD_UPDATE_FREQUENCY': 100, 'SAVE_MODELS': True, 'SAVE_FREQUENCY': 50000, 'SPECIAL_EPISODES_TO_SAVE': [1490000, 1500000], 'PRINT_STATS_FREQUENCY': 1, 'STAT_ROLLING_MEAN_WINDOW': 1000, 'RESULTS_FILENAME': 'results.txt', 'NETWORK_NAME': 'network', 'TRAIN_VERSION': 2, 'LOAD_FROM_WANDB_RUN_ID': 'run-rnn', 'EPISODE_NUMBER_TO_LOAD': 0, 'EPISODES': 1500000, 'ANNEALING_EPISODE_COUNT': 1500000}

which includes MAX_QUEUE_SIZE as desired.

The gym_collision_avoidance/envs/__init__.py file is where the Config class is instantiated, with some hacking that allows users to choose which config class to use via 2 environment variables. For example, in the train.sh script, we have default values for these env variables (the path to the RL config.py file, and the name of the class within that file).
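As a rough illustration of that mechanism (this is not the actual code in gym_collision_avoidance): a loader can read a file path and a class name from the environment, import that file, and instantiate the class. The variable names `EXAMPLE_CONFIG_PATH` and `EXAMPLE_CONFIG_CLASS` and the function `load_config` are placeholders invented for this sketch.

```python
import importlib.util
import os


def load_config(default_path, default_class="Config"):
    """Instantiate the config class chosen via two environment variables."""
    # Placeholder variable names; the real ones are set in train.sh.
    path = os.environ.get("EXAMPLE_CONFIG_PATH", default_path)
    class_name = os.environ.get("EXAMPLE_CONFIG_CLASS", default_class)

    # Import the file at `path` as a module, wherever it lives on disk.
    spec = importlib.util.spec_from_file_location("selected_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    config_class = getattr(module, class_name)
    # Return an instance, not the module: attributes assigned in __init__
    # via self.X only exist once the class has been instantiated.
    return config_class()
```

With this shape, an unset or wrong path variable silently falls back to the default file, which matches the kind of failure suspected earlier in the thread.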

I'm pretty sure something has gone wrong in that __init__.py, and you could try to debug that file (the last line is Config = config_class(), so you could check that Config object has the right attributes). Right now it seems like your Config object is referring to the whole Config.py python file as an object.
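The difference between the module and an instance of a class inside it can be reproduced in isolation. The sketch below builds a throwaway in-memory module named `Config`; its contents are a stand-in, not the real Config.py.

```python
import types

# Stand-in source for a Config.py that sets its values in __init__.
SOURCE = """
class Train:
    def __init__(self):
        self.MAX_QUEUE_SIZE = 100  # Max size of the queue
"""

Config = types.ModuleType("Config")
exec(SOURCE, Config.__dict__)

# The module only knows the names defined at its top level.
print(hasattr(Config, "MAX_QUEUE_SIZE"))          # False
print(hasattr(Config, "Train"))                   # True
# Instance attributes appear only after the class is instantiated.
print(hasattr(Config.Train(), "MAX_QUEUE_SIZE"))  # True
```

This is why the line `self.MAX_QUEUE_SIZE = 100` can be present in Config.py while `Config.MAX_QUEUE_SIZE` still raises an AttributeError when `Config` refers to the module.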

BingHan0458 commented 3 years ago

Thank you very much! This bug has been solved; it was caused by the import of Config. But there is another error when running this command: ./train.sh TrainPhase1 as follows:

Entered virtualenv.
--------------------------------------------------------------------------------------------------------
Running GA3C-CADRL gym-collision-avoidance training script (TrainPhase1)
--------------------------------------------------------------------------------------------------------
[Server] Making model...
[Server] Loading Regression Model then training RL.
[NetworkVPCore] Loading checkpoint file: /home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/checkpoints/regression/wandb/run-rnn/checkpoints/network_00000000
Traceback (most recent call last):
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: not an sstable (bad magic number)
     [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Run.py", line 77, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 72, in __init__
    self.model.load(learning_method='regression')
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/NetworkVPCore.py", line 249, in load
    self.saver.restore(self.sess, filename)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: not an sstable (bad magic number)
     [[node save/RestoreV2 (defined at /home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/RestoreV2':
  File "Run.py", line 77, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 63, in __init__
    self.model = self.make_model()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 89, in make_model
    return globals()[Config.NET_ARCH](Config.DEVICE, Config.NETWORK_NAME, self.num_actions) # TODO can probably change Config.NETWORK_NAME to Config.NET_ARCH
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/NetworkVP_rnn.py", line 41, in __init__
    super(self.__class__, self).__init__(device, model_name, num_actions)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/NetworkVPCore.py", line 60, in __init__
    self.saver = tf.compat.v1.train.Saver({var.name: var for var in vars}, max_to_keep=0)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

I guess it is related to the checkpoint and its episode number. Is network_00000000 the already-trained network, or the network we still need to train? I don't have any solution for this. How can I solve it? Could you help me? Thank you very much!

mfe7 commented 3 years ago

network_00000000 has a network that was trained with regression only, so it provides a good starting point for training with RL. It does seem like this is an issue with loading that checkpoint, and maybe there are some clues here or here?
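Before digging into TensorFlow itself, it may help to look at what is actually on disk under that checkpoint prefix. The sketch below is only a diagnostic aid, and it rests on assumptions not confirmed in this thread. It lists every file whose name starts with the prefix, with its size, and flags files that look like Git LFS pointer stubs (small text files beginning with `version https://git-lfs.github.com/spec/v1`). A pointer stub in place of the real file is one way a checkpoint can become unreadable; whether this repo stores its checkpoints that way is not stated here. `inspect_checkpoint_files` is a hypothetical helper.

```python
import glob
import os

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def inspect_checkpoint_files(prefix):
    """List files starting with `prefix` as (path, size_in_bytes, looks_like_lfs_pointer)."""
    results = []
    for path in sorted(glob.glob(glob.escape(prefix) + "*")):
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        results.append((path, os.path.getsize(path), head == LFS_POINTER_PREFIX))
    return results


if __name__ == "__main__":
    # Path prefix copied from the log above; adjust to your checkout.
    prefix = ("/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/"
              "checkpoints/regression/wandb/run-rnn/checkpoints/network_00000000")
    for path, size, is_pointer in inspect_checkpoint_files(prefix):
        print(path, size, "LFS pointer?" if is_pointer else "")
```

An empty list, unexpectedly tiny files, or a flagged pointer would each suggest the checkpoint files were not fully downloaded, rather than a problem in the training code.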