uber-research / deep-neuroevolution

Building gym_tensorflow #10

Open Nostrademous opened 6 years ago

Nostrademous commented 6 years ago

I get the following errors from a fresh git clone after following the README instructions. Side note: I'm installing this on Ubuntu under Windows Subsystem for Linux.

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make clean
rm -rf gym_tensorflow.so
(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make
g++ -std=c++11 -shared -fPIC -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/external/nsync/public -L/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/core -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -DGOOGLE_CUDA=1 -Wl,-rpath=/build .//*.cpp .//ops/*.cpp -ltensorflow_framework -o gym_tensorflow.so
In file included from .//tf_env.cpp:22:0:
.//tf_env.cpp: In member function ‘virtual void EnvironmentMakeOp::Compute(tensorflow::OpKernelContext*)’:
.//tf_env.cpp:102:69: error: ‘MakeResourceHandleToOutput’ was not declared in this scope
                                     MakeTypeIndex<BaseEnvironment>()));
                                                                     ^
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h:1309:29: note: in definition of macro ‘OP_REQUIRES_OK’
     ::tensorflow::Status _s(STATUS);    \
                             ^
In file included from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/platform/mutex.h:25:0,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/dso_loader.h:29,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/default/stream_executor.h:25,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/stream_executor.h:24,
                 from .//ops/indexedmatmul.cpp:20:
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/platform/default/mutex.h:32:19: error: ‘tensorflow::tf_shared_lock’ has not been declared
 using tensorflow::tf_shared_lock;
                   ^
In file included from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/default/stream_executor.h:31:0,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/stream_executor.h:24,
                 from .//ops/indexedmatmul.cpp:20:
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/stream.h: In member function ‘bool perftools::gputools::Stream::InErrorState() const’:
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/stream.h:2005:5: error: ‘tf_shared_lock’ was not declared in this scope
     tf_shared_lock lock{mu_};
     ^
Makefile:45: recipe for target 'gym_tensorflow.so' failed
make: *** [gym_tensorflow.so] Error 1

NOTE: I do have slightly newer versions of some of the Python packages, but I don't think that's the cause of the errors I'm hitting. Here is the pip list anyway:

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ pip list
Package        Version
-------------- -----------
absl-py        0.2.0
appdirs        1.4.3
astor          0.6.2
bleach         1.5.0
click          6.7
gast           0.2.0
grpcio         1.11.0
gym            0.9.4
h5py           2.7.0
html5lib       0.9999999
Markdown       2.6.11
mujoco-py      0.5.7
numpy          1.14.3
packaging      16.8
pip            10.0.1
protobuf       3.5.2.post1
pyglet         1.2.4
PyOpenGL       3.1.0
pyparsing      2.2.0
redis          2.10.5
requests       2.14.2
setuptools     28.8.0
six            1.11.0
tensorboard    1.8.0
tensorflow     0.12.1
tensorflow-gpu 1.8.0
termcolor      1.1.0
Werkzeug       0.14.1
wheel          0.31.0
(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$

fps7806 commented 6 years ago

I wonder if it has anything to do with the two different tensorflow versions you have installed. Try pip uninstall tensorflow and keep tensorflow-gpu as is.
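One way to spot the situation described here is to look for more than one installed distribution that provides the tensorflow package. The sketch below is illustrative only: the helper name find_tensorflow_conflicts is hypothetical, and it works on (name, version) pairs such as those in the pip list output above rather than querying pip itself.

```python
# Hypothetical helper: flags environments where several distributions
# provide the same top-level `tensorflow` package at different versions.
TF_DISTRIBUTIONS = {"tensorflow", "tensorflow-gpu"}


def find_tensorflow_conflicts(installed):
    """Return the TensorFlow distributions found in `installed`.

    `installed` is an iterable of (name, version) pairs, e.g. parsed
    from `pip list`. More than one hit with differing versions suggests
    that the two packages may be overwriting each other's files.
    """
    hits = sorted(
        (name.lower(), version)
        for name, version in installed
        if name.lower() in TF_DISTRIBUTIONS
    )
    versions = {version for _, version in hits}
    return hits if len(hits) > 1 and len(versions) > 1 else []


# Values taken from the pip list output earlier in this thread.
packages = [
    ("tensorboard", "1.8.0"),
    ("tensorflow", "0.12.1"),
    ("tensorflow-gpu", "1.8.0"),
]

for name, version in find_tensorflow_conflicts(packages):
    print(f"conflicting install: {name}=={version}")
```

An empty result does not prove the environment is healthy; it only rules out this particular mix-up.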

Nostrademous commented 6 years ago

Tried that, no luck.

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'sysconfig'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'sysconfig'
g++ -std=c++11 -shared -fPIC -I -I/external/nsync/public -L -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -DGOOGLE_CUDA=1 -Wl,-rpath=/build .//*.cpp .//ops/*.cpp -ltensorflow_framework -o gym_tensorflow.so
.//tf_env.cpp:22:49: fatal error: tensorflow/core/framework/op_kernel.h: No such file or directory
compilation terminated.
.//ops/indexedmatmul.cpp:7:42: fatal error: tensorflow/core/framework/op.h: No such file or directory
compilation terminated.
Makefile:45: recipe for target 'gym_tensorflow.so' failed
make: *** [gym_tensorflow.so] Error 1

Nostrademous commented 6 years ago

I did get the gym to compile after reinstalling the latest version of tensorflow (1.8.0), matching my tensorflow-gpu version. The previous 0.12 version came from the top-level pip requirements in the repo, which I guess are obsolete.
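For context, the empty "-I -I/external/nsync/public -L" in the failed g++ command above suggests the Makefile derives its include and library paths from tf.sysconfig. The sketch below shows that derivation in Python under that assumption; tensorflow_build_flags is a hypothetical name, and a stand-in object replaces the real tensorflow module so the example runs without TensorFlow installed.

```python
from types import SimpleNamespace


def tensorflow_build_flags(tf):
    """Build the -I/-L flags seen in the g++ command lines above.

    `tf` is the imported tensorflow module. Raises a clear error when
    the module does not expose `tf.sysconfig`, instead of silently
    producing empty paths like `-I -I/external/nsync/public -L`.
    """
    sysconfig = getattr(tf, "sysconfig", None)
    if sysconfig is None:
        raise RuntimeError(
            "the imported tensorflow module has no sysconfig attribute; "
            "the installation may be broken or incomplete"
        )
    include, lib = sysconfig.get_include(), sysconfig.get_lib()
    return [
        "-I" + include,
        "-I" + include + "/external/nsync/public",
        "-L" + lib,
    ]


# Stand-in for a real `import tensorflow as tf`; the paths are made up.
fake_tf = SimpleNamespace(
    sysconfig=SimpleNamespace(
        get_include=lambda: "/env/site-packages/tensorflow/include",
        get_lib=lambda: "/env/site-packages/tensorflow",
    ),
)
print(" ".join(tensorflow_build_flags(fake_tf)))
```

With a working install, passing the real module (import tensorflow as tf) should yield paths like the ones in the successful g++ invocations in this thread.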

However, now that it compiles, I can't get ga.py or es.py to work because, apparently, even though I compiled the gym without ALE support, those files require it.

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation$ python es.py configurations/es_atari_config.json
/home/nostrademous/ML/env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
05/12/2018 09:29:42 AM {
    "episode_cutoff_mode": 5000,
    "game": "frostbite",
    "l2coeff": 0.005,
    "model": "ModelVirtualBN",
    "mutation_power": 0.02,
    "num_test_episodes": 200,
    "num_validation_episodes": 30,
    "optimizer": {
        "args": {
            "stepsize": 0.01
        },
        "type": "adam"
    },
    "population_size": 5000,
    "return_proc_mode": "centered_rank",
    "timesteps": 250000000.0
}
05/12/2018 09:29:42 AM Logging to: /tmp/tmp9g4m8tav
Traceback (most recent call last):
  File "es.py", line 293, in <module>
    main(**exp)
  File "es.py", line 148, in main
    worker = ConcurrentWorkers(make_env, Model, batch_size=64)
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/neuroevolution/concurrent_worker.py", line 135, in __init__
    ref_batch = gym_tensorflow.get_ref_batch(make_env_f, sess, 128)
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 18, in get_ref_batch
    env = make_env_f(1)
  File "es.py", line 147, in make_env
    return gym_tensorflow.make(game=exp["game"], batch_size=b)
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 11, in make
    return StackFramesWrapper(atari.AtariEnv(game, batch_size, *args, **kwargs))
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari/__init__.py", line 8, in __init__
    raise NotImplementedError("gym_tensorflow was not compiled with ALE support.")
NotImplementedError: gym_tensorflow was not compiled with ALE support.
(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation$

However, when I try to enable ALE in the Makefile and make the gym I get the following errors:

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make
/home/nostrademous/ML/env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/home/nostrademous/ML/env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
g++ -std=c++11 -shared -fPIC -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/external/nsync/public -L/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -DGOOGLE_CUDA=1 -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/controllers -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/os_dependent -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/environment -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/external -L/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/build -Wl,-rpath=/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/build .//*.cpp .//ops/*.cpp .//atari/*.cpp -ltensorflow_framework -lale -o gym_tensorflow.so
.//atari/tf_atari.cpp:3:29: fatal error: ale_interface.hpp: No such file or directory
compilation terminated.
Makefile:45: recipe for target 'gym_tensorflow.so' failed
make: *** [gym_tensorflow.so] Error 1

Alro10 commented 6 years ago

Hi everyone! I hit the same issue. I think it depends on which version of gcc is used to build gpu_implementation.

I found the following references:

https://github.com/Zardinality/TF-deformable-conv/issues/1

https://github.com/tensorflow/tensorflow/issues/15002

fps7806 commented 6 years ago

The experiments we included are for the Atari games, which require ALE support; you can follow these instructions to compile. We are in the process of adding MuJoCo support, but without ALE the only environment available is the hard maze.

ylddd commented 6 years ago

Hello, has your problem been solved? I am having the same trouble.

zhan0903 commented 6 years ago

Hi everyone, I ran into an issue: "g++: error: unrecognized command line option ‘-Wl’". Any help?

benjamin22-314 commented 6 years ago

> Hi everyone, I ran into an issue: "g++: error: unrecognized command line option ‘-Wl’". Any help?

I'm having the same issue. Did you work it out @zhan0903 ?

benjamin22-314 commented 6 years ago

> Hi everyone, I ran into an issue: "g++: error: unrecognized command line option ‘-Wl’". Any help?

Hi @zhan0903 , I think that issue is from a typo in the 'deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile'.

Line 30 is missing a ",". I think it should be FLAGS+= -Wl,-rpath=$(ALE)/build instead of FLAGS+= -Wl -rpath=$(ALE)/build

youshaox commented 5 years ago

Hi, FLAGS+= -Wl,-rpath=$(ALE)/build does not work for me. I am still encountering the same error. Have you solved this issue? @Nostrademous @fps7806 @ylddd

matthewzar commented 5 years ago

A slight adaptation of the changes suggested by @BenjaminPhillips22 fixed it on my Linux Mint instance: FLAGS+= -Wl,-rpath,$(ALE)/build

Notice there are no spaces, and 1 extra comma.
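To see why the comma matters: gcc documents that everything after "-Wl," is split at commas and each piece is handed to the linker as a separate argument, whereas a bare "-Wl" followed by a space carries nothing for the linker. The sketch below only models that splitting rule in Python; linker_args is a hypothetical helper and /path/to/ale/build is a placeholder for the expanded $(ALE)/build.

```python
def linker_args(driver_flag):
    """Model how the gcc driver expands one `-Wl,...` flag.

    Everything after `-Wl,` is split at commas and each piece becomes a
    separate linker argument. A bare `-Wl` is rejected here, loosely
    mirroring the "unrecognized command line option" error above.
    """
    prefix = "-Wl,"
    if not driver_flag.startswith(prefix):
        raise ValueError("unrecognized command line option %r" % driver_flag)
    return driver_flag[len(prefix):].split(",")


# One linker argument: -rpath=/path/to/ale/build
print(linker_args("-Wl,-rpath=/path/to/ale/build"))

# Two linker arguments: -rpath and /path/to/ale/build
print(linker_args("-Wl,-rpath,/path/to/ale/build"))
```

GNU ld accepts both the "-rpath=dir" and the "-rpath dir" spelling, so either corrected form should set the same runtime search path; the original "-Wl -rpath=..." fails earlier because "-Wl" arrives on its own.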

lisun-ai commented 5 years ago

> Hi everyone, I ran into an issue: "g++: error: unrecognized command line option ‘-Wl’". Any help?
>
> Hi @zhan0903 , I think that issue is from a typo in the 'deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile'.
>
> Line 30 is missing a ",". I think it should be FLAGS+= -Wl,-rpath=$(ALE)/build instead of FLAGS+= -Wl -rpath=$(ALE)/build

Compiled successfully on Ubuntu 16.04, thanks!

youshaox commented 5 years ago

@Nostrademous I have changed it to "FLAGS+= -Wl,-rpath=$(ALE)/build" and successfully built gym_tensorflow. But have you guys solved the "gym_tensorflow was not compiled with ALE support" error? I have been stuck here for a long time.

Error log:

Traceback (most recent call last):
  File "es.py", line 293, in <module>
    main(**exp)
  File "es.py", line 148, in main
    worker = ConcurrentWorkers(make_env, Model, batch_size=64)
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/neuroevolution/concurrent_worker.py", line 135, in __init__
    ref_batch = gym_tensorflow.get_ref_batch(make_env_f, sess, 128)
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 18, in get_ref_batch
    env = make_env_f(1)
  File "es.py", line 147, in make_env
    return gym_tensorflow.make(game=exp["game"], batch_size=b)
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 11, in make
    return StackFramesWrapper(atari.AtariEnv(game, batch_size, *args, **kwargs))
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari/__init__.py", line 8, in __init__
    raise NotImplementedError("gym_tensorflow was not compiled with ALE support.")
NotImplementedError: gym_tensorflow was not compiled with ALE support.

denis-xiao commented 5 years ago

@Nostrademous @youshaox I got the same "gym_tensorflow was not compiled with ALE support" error. Did you ever solve this problem?

fps7806 commented 5 years ago

> @Nostrademous @youshaox I got the same "gym_tensorflow was not compiled with ALE support" error. Did you ever solve this problem?

That can be solved if you enable the USE_ALE option: https://github.com/uber-research/deep-neuroevolution/blob/master/gpu_implementation/gym_tensorflow/Makefile#L2

Instructions to use ALE are here: https://github.com/uber-research/deep-neuroevolution/tree/master/gpu_implementation/gym_tensorflow/atari
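Before rebuilding, it can help to confirm that the pieces an ALE-enabled build needs are actually in place. The sketch below is a rough pre-flight check, not part of the repository: ale_build_problems is a hypothetical helper, and the header and library locations are assumptions inferred from the -I, -L and -lale flags in the g++ command earlier in this thread.

```python
import re
import tempfile
from pathlib import Path


def ale_build_problems(gym_dir):
    """List reasons why an ALE-enabled build under `gym_dir` may fail."""
    gym_dir = Path(gym_dir)
    ale = gym_dir / "atari-py" / "atari_py" / "ale_interface"
    problems = []

    makefile = gym_dir / "Makefile"
    text = makefile.read_text() if makefile.is_file() else ""
    if not re.search(r"^USE_ALE\s*:?=\s*1\s*$", text, flags=re.MULTILINE):
        problems.append("USE_ALE is not set to 1 in the Makefile")

    # Assumed locations, inferred from the -I/-L flags in the g++ command.
    if not (ale / "src" / "ale_interface.hpp").is_file():
        problems.append("ale_interface.hpp not found (atari-py not cloned?)")
    build = ale / "build"
    if not (build.is_dir() and any(build.glob("libale.*"))):
        problems.append("libale not found (atari-py not built?)")
    return problems


# Demo on a throwaway directory that only contains a Makefile.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Makefile").write_text("USE_SDL := 0\nUSE_ALE := 1\nUSE_GPU := 1\n")
    for problem in ale_build_problems(tmp):
        print("problem:", problem)
```

In a real checkout you would pass the path to gpu_implementation/gym_tensorflow; an empty list only means these three preconditions hold, not that the build will succeed.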

youshaox commented 5 years ago

I have already set USE_ALE=1 in the file "deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile":

USE_SDL := 0
USE_ALE := 1
USE_GPU := 1

Still, I get the above error.

Following the instructions in https://github.com/uber-research/deep-neuroevolution/tree/master/gpu_implementation/gym_tensorflow/atari:

  1. git clone https://github.com/fps7806/atari-py.git into the directory "deep-neuroevolution/gpu_implementation/gym_tensorflow".
  2. cd ./atari-py && make
  3. set USE_ALE := 1 in the file "deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile".
  4. cd ./gym_tensorflow && make
  5. python es.py configurations/es_atari_config.json

I still get the above error.

Error log:

2019-04-27 08:02:27.225223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8790 MB memory) -> physical GPU (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0, compute capability: 3.5)
Traceback (most recent call last):
  File "es.py", line 293, in <module>
    main(**exp)
  File "es.py", line 148, in main
    worker = ConcurrentWorkers(make_env, Model, batch_size=64)
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/neuroevolution/concurrent_worker.py", line 135, in __init__
    ref_batch = gym_tensorflow.get_ref_batch(make_env_f, sess, 128)
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 18, in get_ref_batch
    env = make_env_f(1)
  File "es.py", line 147, in make_env
    return gym_tensorflow.make(game=exp["game"], batch_size=b)
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 11, in make
    return StackFramesWrapper(atari.AtariEnv(game, batch_size, *args, **kwargs))
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari/__init__.py", line 8, in __init__
    raise NotImplementedError("gym_tensorflow was not compiled with ALE support.")
NotImplementedError: gym_tensorflow was not compiled with ALE support.

fps7806 commented 5 years ago

> I have already set USE_ALE=1 in the file "deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile":
>
> USE_SDL := 0
> USE_ALE := 1
> USE_GPU := 1
>
> Still, I get the above error.

Interesting. Can you try running cd ./gym_tensorflow && make clean && make?

vijnasu commented 4 years ago

Running python ga.py -c configurations/ga_atari_config.json -o out gives the following error. I tried most of the suggestions discussed above.

tensorflow.python.framework.errors_impl.NotFoundError: /home/administrator/Hands-on-Neuroevolution-with-Python/Chapter10/gym_tensorflow/gym_tensorflow.so: undefined symbol: _ZN10tensorflow11ResourceMgr8DoDeleteERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10typeindexS8
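One possible reading of this error, offered as a guess rather than a diagnosis: the missing symbol mentions std::__cxx11, which libstdc++ uses for its newer std::string ABI, while the g++ command lines earlier in this thread pass -D_GLIBCXX_USE_CXX11_ABI=0. A mismatch between the ABI of gym_tensorflow.so and that of the installed TensorFlow binary is a common cause of undefined-symbol errors like this. The sketch below shows the heuristic; uses_cxx11_abi is a hypothetical helper.

```python
def uses_cxx11_abi(mangled_symbol):
    """Heuristic: True if an Itanium-mangled name mentions std::__cxx11.

    libstdc++ places the newer std::string in the inline namespace
    std::__cxx11, which is mangled as `St7__cxx11`. A symbol containing
    it refers to code built with _GLIBCXX_USE_CXX11_ABI=1.
    """
    return "St7__cxx11" in mangled_symbol


# Symbol copied from the NotFoundError above (possibly truncated there).
symbol = (
    "_ZN10tensorflow11ResourceMgr8DoDeleteERKNSt7__cxx11"
    "12basic_stringIcSt11char_traitsIcESaIcEEESt10typeindexS8"
)

print(uses_cxx11_abi(symbol))
```

If the check is positive, comparing the ABI flag used to build the .so against what tf.sysconfig.get_compile_flags() reports (in TensorFlow versions that provide it) would be a reasonable next step.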

thisisjasleen commented 4 years ago

> Hi everyone, I ran into an issue: "g++: error: unrecognized command line option ‘-Wl’". Any help?
>
> Hi @zhan0903 , I think that issue is from a typo in the 'deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile'.
>
> Line 30 is missing a ",". I think it should be FLAGS+= -Wl,-rpath=$(ALE)/build instead of FLAGS+= -Wl -rpath=$(ALE)/build

I am still having some trouble with the same error. Does somebody know how to resolve it?