submit-paper / Danzero_plus


Receiving 0 FPS in learner.py and the code problem in actor.py #11

Open StevenUST opened 7 months ago

StevenUST commented 7 months ago

I also get 0 FPS when I run learner.py. Here is the output after executing learner.py:

2024-01-21 12:31:43.273682: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From learner.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From learner.py:23: The name tf.logging.ERROR is deprecated. Please use tf.compat.v1.logging.ERROR instead.

2024-01-21 12:31:43.914627: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2419195000 Hz
2024-01-21 12:31:43.916878: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5086140 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-01-21 12:31:43.916907: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-01-21 12:31:43.919762: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2024-01-21 12:31:44.329877: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1068] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-01-21 12:31:44.330010: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50ec7f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-01-21 12:31:44.330033: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 4060 Laptop GPU, Compute Capability 8.9
2024-01-21 12:31:44.330774: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1068] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-01-21 12:31:44.330801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties: name: NVIDIA GeForce RTX 4060 Laptop GPU major: 8 minor: 9 memoryClockRate(GHz): 2.25 pciBusID: 0000:01:00.0
2024-01-21 12:31:44.330832: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2024-01-21 12:31:44.337407: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2024-01-21 12:31:44.357283: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2024-01-21 12:31:44.357703: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2024-01-21 12:31:44.358499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2024-01-21 12:31:44.359689: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2024-01-21 12:31:44.359824: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2024-01-21 12:31:44.360369: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1068] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-01-21 12:31:44.360835: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1068] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-01-21 12:31:44.360892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2024-01-21 12:31:44.360946: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2024-01-21 12:31:45.009591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-01-21 12:31:45.009647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2024-01-21 12:31:45.009669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2024-01-21 12:31:45.010572: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1068] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-01-21 12:31:45.010604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-01-21 12:31:45.011234: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1068] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node Your kernel may have been built without NUMA support.
2024-01-21 12:31:45.011351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5190 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 4060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9)
model build success
Logging to LEARNER-2024-01-21-12-31-45/log

In addition, I read the code in actor.py carefully and found that the actor sends data to the learner only when state is a str. However, when I log the type of state, it is either dict or int.

This is the segment of actor.py where the actor decides, based on the type of state, whether to send data to the learner:

while True:
    # Take actions until a reward is received
    state = deserialize(socket.recv())
    if not isinstance(state, (int, float, str)):
        # Any other type (e.g. dict): sample an action and send it back
        action_index = player.sample(state)
        socket.send(serialize(action_index).to_buffer())
    elif isinstance(state, str):
        # str message: the only branch that sends data to the learner
        socket.send(b'none')
        if state[0] == 'y':
            player.save_data(int(state[1]))
        else:
            player.save_data(-int(state[1]))
        player.send_data(state)
        player.update_weight()
    else:
        # int or float: acknowledge and store the value
        socket.send(b'none')
        player.save_data(state)
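
To check which branch each incoming message would take, the type dispatch in the loop can be pulled out into a small standalone function. This is only a sketch for debugging: the function name, the branch labels, and the sample messages are hypothetical and do not come from actor.py. Only the isinstance logic mirrors the segment above.

```python
def classify_state(state):
    """Return a label for the branch the actor loop would take for this message."""
    if isinstance(state, str):
        return "send_to_learner"   # the only branch that calls player.send_data
    if isinstance(state, (int, float)):
        return "save_only"         # stored via player.save_data, nothing is sent
    return "sample_action"         # any other type, e.g. dict


# Hypothetical sample messages, for illustration only
samples = [{"obs": [0, 1]}, 1, -0.5, "y3"]
branches = [classify_state(s) for s in samples]
print(branches)
```

If a count of these labels over a real run never includes "send_to_learner", the actor never reaches the branch that sends data, which would be consistent with the learner reporting 0 FPS.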

Cyclones-Y commented 7 months ago

Maybe you can print the information from game.py. My WeChat is 15677245625. I am studying this project on my own. If you would like to discuss it, please add me on WeChat.

freebooterish commented 4 months ago

I tested this on a single Ubuntu 22.04 machine without Docker. I ran the two steps separately and everything looked normal.

Step 1: Run Danzero_plus/actor_torch/restart.sh. Remember to add --ip 127.0.0.1 after actor.py, for example:

$ more restart.sh
#!/bin/bash
nohup /home/david/RL/Danzero_plus/actor_torch/danserver 100000 >/dev/null 2>&1 &
sleep 1s
nohup python -u /home/david/RL/Danzero_plus/actor_torch/actor.py --ip 127.0.0.1 > /home/david/actor_out.log 2>&1 &
sleep 1s
nohup python -u /home/david/RL/Danzero_plus/actor_torch/game.py > /home/david/gameout.log 2>&1 &

Step 2: python learner_torch/learner.py

Remember to add execute permissions to danserver, restart.sh, etc.
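
The execute-permission reminder can also be handled from Python. The sketch below is an illustration, not part of the repository: make_executable is a hypothetical helper, and it is demonstrated on a temporary file rather than on the real danserver or restart.sh paths.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Set the execute bits for user, group, and others (similar to chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Report whether the owner execute bit is now set
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


# Demonstration on a throwaway file; substitute the real paths to danserver and restart.sh.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
is_executable = make_executable(demo_path)
os.remove(demo_path)
print(is_executable)
```

Running chmod +x on the files from the shell achieves the same thing.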

freebooterish commented 1 month ago

What is your Linux version? Please change the execute permission of guandan_offline_v1006. If it reports an error, please print the error message.

On Thu, Aug 8, 2024 at 14:43, wangdabee @.*** wrote:

@StevenUST https://github.com/StevenUST "I read the code in actor.py carefully and I found if the actor sends data to learner, the type of state must be str. However, when I record the type of state, it is either dict or int."

I also ran into this problem. I can't run danserver. I am using guandan_offline_v10062/guandan_offline_v1006/ubuntu/guandan_offline_v1006.
