I tried to run cluster_demo.py on EC2. The instance starts fine but is terminated shortly afterwards. I get the following traceback in stdout.log:
sync initiated
log sync initiated
Running in docker
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 83526cf8e682
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
using seed 1
2018-05-31 09:18:27.844271 UTC | Setting seed to 1
using seed 1
/opt/conda/envs/rllab3/lib/python3.5/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
Traceback (most recent call last):
  File "/root/code/rllab/scripts/run_experiment_lite.py", line 137, in <module>
    run_experiment(sys.argv)
  File "/root/code/rllab/scripts/run_experiment_lite.py", line 120, in run_experiment
    method_call = cloudpickle.loads(base64.b64decode(args.args_data))
  File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 800, in _make_skel_func
    closure = _reconstruct_closure(closures) if closures else None
  File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 792, in _reconstruct_closure
    return tuple([_make_cell(v) for v in values])
TypeError: 'int' object is not iterable
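The failing line rebuilds the experiment call with cloudpickle.loads(base64.b64decode(args.args_data)), so the call was presumably pickled on the launching machine and is being unpickled inside the Docker container on EC2. One possible cause (an assumption, not a confirmed diagnosis) is that the two sides use different cloudpickle versions, so a payload written by one cannot be read by the other. A minimal sketch for printing the relevant package versions, meant to be run both locally and inside the container; the helper package_version is hypothetical and not part of rllab:

```python
import sys


def package_version(name):
    """Return the installed version string of `name`, or None if it cannot be found.

    Hypothetical helper: uses importlib.metadata on Python 3.8+ and falls back
    to setuptools' pkg_resources on older interpreters such as the Python 3.5
    used in the container.
    """
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    try:
        import pkg_resources  # fallback for older interpreters
        return pkg_resources.get_distribution(name).version
    except Exception:  # pkg_resources missing or package not installed
        return None


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for pkg in ("cloudpickle", "theano", "tensorflow"):
        print(pkg, package_version(pkg) or "not installed")
```

If the cloudpickle versions printed locally and in the container differ, aligning them would be a reasonable first thing to try.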
Any help? If additional information is necessary, I am ready to provide it.