openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

Does not work on OS X 10.11 #95

Closed goodnamegood closed 6 years ago

goodnamegood commented 7 years ago

I followed every instruction, but the program does not seem to run at all on my computer. I pointed my browser to http://localhost:12345 but saw nothing meaningful...

tlbtlbtlb commented 7 years ago

It works for many people on OSX 10.11. Can you describe in more detail what is wrong?

goodnamegood commented 7 years ago

My computer does not have an NVIDIA GPU. When I follow the instructions, no error is reported, but no program appears to be running either.

tlbtlbtlb commented 7 years ago

No GPU is needed.

This video shows a walkthrough of how to run it: https://www.youtube.com/watch?v=XI-I9i_GzIw

goodnamegood commented 7 years ago

Thank you! So I guess simply following the README.md does not work? Do we still need to do some debugging?

tlbtlbtlb commented 7 years ago

Following the README works for many people. Can you post transcripts or screenshots explaining what happened?

goodnamegood commented 7 years ago

Thanks for your kind reply! Here is the output of running "python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise":

:~/git/universe-starter-agent$ python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise
Executing the following commands:
mkdir -p /tmp/pong
echo /Users/harrypann/miniconda3/envs/universe-starter-agent/bin/python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise > /tmp/pong/cmd.sh
kill $( lsof -i:12345 -t ) > /dev/null 2>&1
kill $( lsof -i:12222-12224 -t ) > /dev/null 2>&1
tmux kill-session -t a3c
tmux new-session -s a3c -n ps -d bash
tmux new-window -t a3c -n w-0 bash
tmux new-window -t a3c -n w-1 bash
tmux new-window -t a3c -n tb bash
tmux new-window -t a3c -n htop bash
sleep 1
tmux send-keys -t a3c:ps 'CUDA_VISIBLE_DEVICES= /Users/harrypann/miniconda3/envs/universe-starter-agent/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --visualise --job-name ps' Enter
tmux send-keys -t a3c:w-0 'CUDA_VISIBLE_DEVICES= /Users/harrypann/miniconda3/envs/universe-starter-agent/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --visualise --job-name worker --task 0 --remotes 1' Enter
tmux send-keys -t a3c:w-1 'CUDA_VISIBLE_DEVICES= /Users/harrypann/miniconda3/envs/universe-starter-agent/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --visualise --job-name worker --task 1 --remotes 1' Enter
tmux send-keys -t a3c:tb 'tensorboard --logdir /tmp/pong --port 12345' Enter
tmux send-keys -t a3c:htop htop Enter

Use tmux attach -t a3c to watch process output
Use tmux kill-session -t a3c to kill the job

tlbtlbtlb commented 7 years ago

That looks like it's working. Use tmux attach -t a3c to see progress within tmux (man tmux explains how to use it) and point your browser to http://localhost:15900 to see it playing in real time. Point your browser to http://localhost:12345 to see graphs of progress in Tensorboard. It takes several hours for it to learn to play pong, so don't expect much in the first few minutes.

goodnamegood commented 7 years ago

Thanks. I just switched from OS X to Ubuntu 16.04 and it works on Ubuntu, but unfortunately I still can't see anything on OS X when I try to visualize the training. Thank you!

yoshavit commented 7 years ago

Hi there, I'm having the same issue, but on OS X 10.10 with TensorFlow 1.0. The output is identical to what's shown above, but TensorBoard displays nothing (seemingly no training is being done). /tmp/pong/train_0 and /tmp/pong/train_1 are both empty and never get populated. I'm pretty sure the processes aren't running. Any ideas on what I could do or where I should look? Thanks! -Yo
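
A quick way to confirm the symptom described above (empty worker directories under the log dir) is to count the files in each one. This is only a minimal sketch: the helper name `summarize_log_dirs` is hypothetical, and the directory names `train_0` and `train_1` are taken from the comment above rather than from the agent's source.

```python
import os


def summarize_log_dirs(log_dir, names=("train_0", "train_1")):
    """Return {subdir: file count} for each expected worker directory.

    A directory that does not exist is reported as None.
    """
    summary = {}
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.isdir(path):
            summary[name] = None
            continue
        count = 0
        # Walk recursively so files in nested folders are counted too.
        for _root, _dirs, files in os.walk(path):
            count += len(files)
        summary[name] = count
    return summary


if __name__ == "__main__":
    # /tmp/pong is the --log-dir used in the commands above.
    for name, count in summarize_log_dirs("/tmp/pong").items():
        if count is None:
            print(name, "is missing")
        elif count == 0:
            print(name, "is empty; the worker has probably not written anything")
        else:
            print(name, "contains", count, "file(s)")
```

If a directory stays empty a few minutes after launch, attaching to the matching tmux window (w-0 or w-1) is the next place to look.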

yoshavit commented 7 years ago

Just found the problem, in one of the worker.py 'worker' instances.

E0512 14:47:28.171308000 140735128658688 server_chttp2.c:159] {"created":"@1494614848.171249000","description":"No address added out of total 1 resolved","file":"external/grpc/src/core/ext/transport/chttp2/server/insecure/server_chttp2.c","file_line":125,"referenced_errors":[{"created":"@1494614848.171245000","description":"Failed to add port to server","file":"external/grpc/src/core/lib/iomgr/tcp_server_posix.c","file_line":634,"referenced_errors":[{"created":"@1494614848.171237000","description":"Unable to configure socket","fd":10,"file":"external/grpc/src/core/lib/iomgr/tcp_server_posix.c","file_line":355,"referenced_errors":[{"created":"@1494614848.171224000","description":"OS Error","errno":48,"file":"external/grpc/src/core/lib/iomgr/tcp_server_posix.c","file_line":331,"os_error":"Address already in use","syscall":"bind"}]}],"target_address":"ipv6:[::]:12223"}]}
Traceback (most recent call last):
  File "worker.py", line 152, in <module>
    tf.app.run()
  File "/Users/yonadav/anaconda/envs/tensorflow3.5/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "worker.py", line 143, in main
    config=tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=2))
  File "/Users/yonadav/anaconda/envs/tensorflow3.5/lib/python3.5/site-packages/tensorflow/python/training/server_lib.py", line 144, in __init__
    self._server_def.SerializeToString(), status)
  File "/Users/yonadav/anaconda/envs/tensorflow3.5/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/yonadav/anaconda/envs/tensorflow3.5/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.UnknownError: Could not start gRPC server

This error is the same for both workers, apart from a few minor differences in the initial block (starting with "created" and ending with "12223", or "12224" for the second worker). In addition, when running just a single worker, the algorithm works and trains fine.

Any idea what could be causing this?
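
One way to read the nested gRPC error above is to follow `referenced_errors` down to the innermost entry, which carries the OS-level cause. The sketch below assumes the JSON part of the log line has been copied into a string; the helper name `innermost_error` is hypothetical, and the JSON is an abbreviated copy of the log above with the "created", "file", "file_line" and "fd" fields omitted.

```python
import json


def innermost_error(error):
    """Follow the first entry of "referenced_errors" until none are left."""
    while error.get("referenced_errors"):
        error = error["referenced_errors"][0]
    return error


# Abbreviated copy of the JSON from the log line above (some fields omitted).
LOG_JSON = """
{"description": "No address added out of total 1 resolved",
 "referenced_errors": [
   {"description": "Failed to add port to server",
    "referenced_errors": [
      {"description": "Unable to configure socket",
       "referenced_errors": [
         {"description": "OS Error", "errno": 48,
          "os_error": "Address already in use", "syscall": "bind"}]}],
    "target_address": "ipv6:[::]:12223"}]}
"""

if __name__ == "__main__":
    cause = innermost_error(json.loads(LOG_JSON))
    # Show the failing system call, its errno, and the OS message.
    print(cause["syscall"], cause["errno"], cause["os_error"])
```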

tlbtlbtlb commented 7 years ago

The real error is "Address already in use". Most likely, there are still worker processes running from a previous attempt. Kill them with killall python or with ps and kill.
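
Before relaunching, it can help to check whether the ports the launcher uses are still held by a stale process. The following is a minimal sketch, not part of the starter agent: the helper name `port_in_use` is hypothetical, the port numbers are the ones that appear in the lsof and tensorboard commands earlier in this thread, and the check binds on the IPv4 loopback address only, so it is an approximation of what the gRPC server does.

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # Any other failure is unexpected, so surface it.
    return False


if __name__ == "__main__":
    # 12222-12224: parameter server and workers; 12345: TensorBoard.
    for port in (12222, 12223, 12224, 12345):
        state = "in use" if port_in_use(port) else "free"
        print(port, state)
```

If a port is reported as in use while no training session should be running, a leftover worker from an earlier attempt is the likely owner, and killing it as described above should free the port.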

yoshavit commented 7 years ago

Thank you! It works :)