mit-acl / rl_collision_avoidance

Training code for GA3C-CADRL algorithm (collision avoidance with deep RL)

No matching distribution found for tensorflow==1.15.2 (from gym-collision-avoidance===1.0.0) #4

Closed HanBing0802 closed 3 years ago

HanBing0802 commented 3 years ago

Hello, thanks for sharing your awesome work. I'm a beginner in this field and this is the first time I have worked with it. The problems below have troubled me for a long time. I hope you can help me, thank you very much!

  1. When I run ./install.sh, I get the following error:
     Obtaining file:///home/hb/catkin_ws/src/rl_collision_avoidance/gym-collision-avoidance
     Collecting tensorflow==1.15.2 (from gym-collision-avoidance===1.0.0)
     Could not find a version that satisfies the requirement tensorflow==1.15.2 (from gym-collision-avoidance===1.0.0) (from versions: 0.12.0rc0, 0.12.0rc1, 0.12.0, 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1, 1.8.0rc0, 1.8.0rc1, 1.8.0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc2, 1.9.0, 1.10.0rc0, 1.10.0rc1, 1.10.0, 1.10.1, 1.11.0rc0, 1.11.0rc1, 1.11.0rc2, 1.11.0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc2, 1.12.0, 1.12.2, 1.12.3, 1.13.0rc0, 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
     No matching distribution found for tensorflow==1.15.2 (from gym-collision-avoidance===1.0.0)
     I found that the error comes from the line python3 -m pip install -e $DIR in /rl_collision_avoidance/gym-collision-avoidance/install.sh. I am using Python 3.5 and trying to install tensorflow 1.15.2. I guess the problem is a version incompatibility. Is that true?

  2. When I delete the line python3 -m pip install -e $DIR and run the script again, I get another error: `Collecting git+https://github.com/openai/baselines.git Cloning https://github.com/openai/baselines.git to /tmp/pip-0g8mhuix-build Complete output from command python setup.py egg_info: running egg_info creating pip-egg-info/baselines.egg-info writing dependency_links to pip-egg-info/baselines.egg-info/dependency_links.txt writing requirements to pip-egg-info/baselines.egg-info/requires.txt writing pip-egg-info/baselines.egg-info/PKG-INFO writing top-level names to pip-egg-info/baselines.egg-info/top_level.txt writing manifest file 'pip-egg-info/baselines.egg-info/SOURCES.txt' warning: manifest_maker: standard file '-c' not found reading manifest file 'pip-egg-info/baselines.egg-info/SOURCES.txt' writing manifest file 'pip-egg-info/baselines.egg-info/SOURCES.txt' Traceback (most recent call last): File "<string>", line 1, in <module> File "/tmp/pip-0g8mhuix-build/setup.py", line 58, in <module> assert tf_pkg is not None, 'TensorFlow needed, of version above 1.4' AssertionError: TensorFlow needed, of version above 1.4

    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-0g8mhuix-build/`

mfe7 commented 3 years ago
  1. I think you're right that it is a compatibility issue between the Python and tensorflow versions. I think an earlier version of tensorflow should work fine (maybe try 1.14?). I would expect tensorflow 2 to have significant differences, so it wouldn't work without a fair number of changes. Also, a different Python version (maybe 3.6 or 3.7?) might have tf 1.15.2 available. I have used pyenv successfully in the past to manage various Python versions on my system.

  2. I am guessing that error is because deleting the line you mentioned would skip the step where tensorflow is installed. So then when the openai-baselines package is supposed to be installed, it is missing one of its dependencies (tensorflow).
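To confirm which releases pip can actually see for the current interpreter, the list after "from versions:" in the error can be checked directly. Below is a minimal Python sketch, assuming the error text has been copied into a string. The helper names `available_versions` and `newest_1x_release` are made up for illustration, and the sample is a shortened copy of the log above.

```python
import re


def available_versions(pip_error):
    """Extract the version list pip prints after 'from versions:'."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def newest_1x_release(versions):
    """Return the newest final (non-rc) 1.x release in the list, or None."""
    finals = [v for v in versions if re.fullmatch(r"1\.\d+\.\d+", v)]
    return max(
        finals,
        key=lambda v: tuple(int(part) for part in v.split(".")),
        default=None,
    )


# Shortened copy of the error from the first comment (hypothetical excerpt).
sample = (
    "Could not find a version that satisfies the requirement "
    "tensorflow==1.15.2 (from gym-collision-avoidance===1.0.0) "
    "(from versions: 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, "
    "2.0.0a0, 2.0.0b0, 2.0.0b1)"
)

versions = available_versions(sample)
print("1.15.2 offered:", "1.15.2" in versions)
print("newest 1.x offered:", newest_1x_release(versions))
```

If 1.15.2 is missing from the list, pip has no matching build for that Python and pip combination, which is consistent with the suggestion to try 1.14 or a different Python version.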

HanBing0802 commented 3 years ago

Hello, thank you very much. Your suggestion solved the problem. But during training, when I run ./train.sh TrainPhase1, I get the output below, and I don't know what is wrong. I hope you can help me, thank you again!

Entered virtualenv.

Running GA3C-CADRL gym-collision-avoidance training script (TrainPhase1)

[Server] Making model...
[Server] Loading Regression Model then training RL.
[NetworkVPCore] Loading checkpoint file: /home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/checkpoints/regression/wandb/run-rnn/checkpoints/network_00000000
Traceback (most recent call last):
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: not an sstable (bad magic number)
  [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Run.py", line 75, in <module>
    Server().main()
  File "/home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/Server.py", line 66, in __init__
    self.model.load(learning_method='regression')
  File "/home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/NetworkVPCore.py", line 246, in load
    self.saver.restore(self.sess, filename)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: not an sstable (bad magic number)
  [[node save/RestoreV2 (defined at /home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/RestoreV2':
  File "Run.py", line 75, in <module>
    Server().main()
  File "/home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/Server.py", line 57, in __init__
    self.model = self.make_model()
  File "/home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/Server.py", line 83, in make_model
    return globals()[Config.NET_ARCH](Config.DEVICE, Config.NETWORK_NAME, self.num_actions) # TODO can probably change Config.NETWORK_NAME to Config.NET_ARCH
  File "/home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/NetworkVP_rnn.py", line 37, in __init__
    super(self.__class__, self).__init__(device, model_name, num_actions)
  File "/home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/NetworkVPCore.py", line 57, in __init__
    self.saver = tf.compat.v1.train.Saver({var.name: var for var in vars}, max_to_keep=0)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/hb/.pyenv/versions/3.6.5/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

mfe7 commented 3 years ago

Hmm, it looks like this error has to do with the model file that is being loaded. I am guessing it is trying to load one of the files that was trained via regression, and then trying to begin RL from that starting point. According to that log, the file it's looking for is at /home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/checkpoints/regression/wandb/run-rnn/checkpoints/network_00000000. Does that file exist on your machine?
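One way to answer that is to list everything on disk that starts with the checkpoint prefix. Below is a sketch in Python. It assumes the checkpoint was written in TensorFlow's default V2 format, where a prefix such as network_00000000 normally corresponds to an .index file plus one or more .data-* shards rather than a single file. The helper names are hypothetical and not part of the repository.

```python
import glob
import os


def checkpoint_files(prefix):
    """Map every file whose path starts with the prefix to its size in bytes."""
    pattern = glob.escape(prefix) + "*"
    return {
        path: os.path.getsize(path)
        for path in sorted(glob.glob(pattern))
        if os.path.isfile(path)
    }


def describe_checkpoint(prefix):
    """Report which parts of a V2-style checkpoint exist for the prefix."""
    files = checkpoint_files(prefix)
    return {
        "files": files,
        # Assumption: V2 checkpoints store tensor metadata in <prefix>.index
        "has_index": (prefix + ".index") in files,
        # ... and tensor values in <prefix>.data-XXXXX-of-YYYYY shards.
        "has_data": any(p.startswith(prefix + ".data-") for p in files),
    }


if __name__ == "__main__":
    # Path copied from the log above; adjust to your own checkout.
    prefix = (
        "/home/hb/catkin_ws/src/rl_collision_avoidance/ga3c/GA3C/"
        "checkpoints/regression/wandb/run-rnn/checkpoints/network_00000000"
    )
    info = describe_checkpoint(prefix)
    for path, size in info["files"].items():
        print(size, "bytes", path)
    print("index present:", info["has_index"], "| data present:", info["has_data"])
```

An empty listing, a missing part, or a file that is only a few bytes long would each point to an incomplete or damaged download rather than a problem in the training code.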