submit-paper / Danzero_plus


learner.py Receiving FPS: 0.00, Consuming FPS: 0.00 #9

Open Cyclones-Y opened 8 months ago

Cyclones-Y commented 8 months ago

I am training the DMC model by running the actor_n and learner_n project directories:

  1. I created one container for the learner and one for the actor, connected them with a Docker network (172.15.15.1), and set up passwordless SSH between the containers.
  2. The IP settings are all correct, and the containers can communicate with each other.
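Being able to ping another container does not prove that the learner's listening ports are reachable. A quick check is to open a TCP connection from the actor container to the learner. This is a minimal sketch using only the standard library; the host and port you pass in are assumptions, since the thread does not say which address or ports learner.py binds to.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

Usage, run inside the actor container with hypothetical values substituted by the learner container's real IP and port: `print(can_reach("<learner-ip>", <learner-port>))`. A `False` here while ping works would suggest the learner is not listening on that address/port, or that the actor is configured with a different one.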

When I run the start.sh file in the learner_n directory inside the learner container, which contains:

`sshpass ssh root@172.15.15.5 "bash /yzm/Danzero_plus/actor_n/start.sh"`

`nohup /usr/bin/python -u /yzm/Danzero_plus/learner_n/learner.py > /yzm/Danzero_plus/learner_n/learner_out.log 2>&1 &`

The game.py and actor.py scripts start in the actor container, and game.py prints its messages normally, but the FPS reported by learner.py stays at 0. Why? This is the log of my actor.py:


> 2024-01-17 09:44:02.781942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
> start0
> 2024-01-17 09:44:03.506227: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
> 2024-01-17 09:44:03.506503: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
> 2024-01-17 09:44:03.506527: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
> 2024-01-17 09:44:03.507608: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
> 2024-01-17 09:44:03.507621: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
> 2024-01-17 09:44:03.507638: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
> Logging to /yzm/Client0/log
> start1
> 2024-01-17 09:44:04.010809: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
> 2024-01-17 09:44:04.011124: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
> 2024-01-17 09:44:04.011149: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
> 2024-01-17 09:44:04.012251: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
> 2024-01-17 09:44:04.012264: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
> 2024-01-17 09:44:04.012279: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
> Logging to /yzm/Client1/log
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
> start2
> 2024-01-17 09:44:04.514308: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
> 2024-01-17 09:44:04.514629: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
> 2024-01-17 09:44:04.514654: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
> 2024-01-17 09:44:04.515743: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
> 2024-01-17 09:44:04.515756: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
> 2024-01-17 09:44:04.515771: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
> Logging to /yzm/Client2/log
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
> start3
> 2024-01-17 09:44:05.017970: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
> 2024-01-17 09:44:05.018274: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
> 2024-01-17 09:44:05.018298: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
> 2024-01-17 09:44:05.019441: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
> 2024-01-17 09:44:05.019456: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
> 2024-01-17 09:44:05.019471: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
> Logging to /yzm/Client3/log
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
> /usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
>   self._target(*self._args, **self._kwargs)
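Assuming "Receiving FPS" and "Consuming FPS" are throughput counters of the kind common in actor-learner setups (samples received from actors and samples consumed by training, per second), a value of exactly 0.00 means no samples were counted at all during the window, which points at the actor-to-learner data path rather than at training speed. The hypothetical `RateMeter` below is not taken from learner.py; it only illustrates how such a counter reads 0.00 when nothing arrives.

```python
import time
from typing import Callable


class RateMeter:
    """Counts events and reports events per second since the previous report.

    Hypothetical helper for illustration only; not Danzero_plus code.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._count = 0
        self._last = clock()

    def add(self, n: int = 1) -> None:
        # Called whenever n samples are received (or consumed)
        self._count += n

    def rate(self) -> float:
        # Events per second over the window since the last call, then reset
        now = self._clock()
        elapsed = now - self._last
        fps = self._count / elapsed if elapsed > 0 else 0.0
        self._count = 0
        self._last = now
        return fps


if __name__ == "__main__":
    received = RateMeter()
    consumed = RateMeter()
    time.sleep(0.1)
    # No actor ever called received.add(...), so both rates are 0.00
    print(f"Receiving FPS: {received.rate():.2f}, Consuming FPS: {consumed.rate():.2f}")
```

Run as a script, this prints `Receiving FPS: 0.00, Consuming FPS: 0.00`. If the real counters work similarly, the actors are starting but their data never reaches the learner.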
StevenUST commented 8 months ago

I also ran into this problem. However, I am not running the game in Docker because I don't know how to create the guandan_actor:v5 image. How did you create it?

submit-paper commented 8 months ago

It seems the error results from "Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory". Maybe I can upload the actor image to this repository.
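To check from inside the actor container whether the CUDA driver library named in that message can be loaded at all, Python's standard `ctypes` module is enough. This is a sketch: it only tests whether the dynamic linker can open `libcuda.so.1`, not whether TensorFlow can actually use a GPU.

```python
import ctypes


def cuda_driver_loadable(name: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic linker can load the given shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened, the same
        # condition TensorFlow logs as "cannot open shared object file"
        return False


if __name__ == "__main__":
    print("libcuda.so.1 loadable:", cuda_driver_loadable())
```

If this prints `False`, the container was started without access to the host's NVIDIA driver (the log's `/dev/nvidia0 does not exist` line points the same way); with the NVIDIA Container Toolkit installed, that access is usually granted with `docker run --gpus all`. TensorFlow falls back to the CPU in this case, as the `StreamExecutor device (0): Host` lines show, so this may not be the only cause of the zero FPS.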

submit-paper commented 8 months ago

Well, the actor image is too large to upload to GitHub. I may upload it to a netdisk.

StevenUST commented 8 months ago

> Well, the actor image is too large to upload to GitHub. I may upload it to a netdisk.

Which netdisk?