real-stanford / flingbot

[CoRL 2021 Best System Paper] This repository contains code for training and evaluating FlingBot in both simulation and real-world settings on a dual-UR5 robot arm setup, for Ubuntu 18.04.
https://flingbot.cs.columbia.edu/

ray.get([e.reset.remote() for e in envs]) does not work #5

Closed guningquan closed 2 years ago

guningquan commented 2 years ago

I set up the environment according to the instructions without any errors. But when I try to train the model, ray.get no longer works.

input: python run_sim.py --tasks flingbot-rect-train.hdf5 --num_processes 2 --log flingbot-train-from-scratch --action_primitives fling

The code is as follows (I printed some information in order to debug):

    #run_sim.py
    print(envs)
    observations = ray.get([e.reset.remote() for e in envs])

The terminal output:

2022-07-09 01:18:03,451 INFO services.py:1476 -- View the Ray dashboard at http://127.0.0.1:8265
SEEDING WITH 0
[Policy] Action primitives:
        fling
Replay Buffer path: flingbot-train-from-scratch/replay_buffer.hdf5
[Actor(SimEnv, d24c6cc9f6d506fd4331284101000000), Actor(SimEnv, 4b8d98d4a8025b3e5e0e3ccf01000000)]

The code does not run any further; it is stuck in an endless wait without producing any more output.

guningquan commented 2 years ago

I figured it out! The Blender version must be v3.2.1. If your Blender version is 2.79b, ray.get([e.reset.remote() for e in envs]) will not work.
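For anyone hitting the same hang, the installed version can be checked before launching training: blender --version prints a line beginning with "Blender" followed by the version number. The helper below and its sample strings are illustrative assumptions about that output format, not code from the repo.

```python
import re
import shutil
import subprocess


def blender_version(version_line):
    """Parse a 'Blender X.Y.Z' style line into a comparable version tuple."""
    m = re.search(r"Blender\s+(\d+)\.(\d+)(?:\.(\d+))?", version_line)
    if not m:
        raise ValueError(f"unrecognized version line: {version_line!r}")
    return tuple(int(g or 0) for g in m.groups())


# Sample strings (assumed formats) for the two versions discussed in this issue.
assert blender_version("Blender 2.79") < (3, 2, 1)
assert blender_version("Blender 3.2.1") >= (3, 2, 1)

# If blender is on PATH, check the live install as well.
if shutil.which("blender"):
    first_line = subprocess.run(
        ["blender", "--version"], capture_output=True, text=True
    ).stdout.splitlines()[0]
    print("installed:", blender_version(first_line))
```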

zcswdt commented 9 months ago

Hello, have you trained the model successfully? Approximately how much training is required to reach the results in the author's paper? Thanks.

guningquan commented 9 months ago

@zcswdt Yes, I have run it successfully. The author mentioned that it needs 15,000 training points to reach the reported performance.

zcswdt commented 9 months ago

Thank you very much for your reply. Does the author refer to i? I used the fling action primitive, but my trained model does not perform as well as the pretrained model the author provides. Could we connect on QQ to discuss? I can pay you a small fee. My QQ: 810190882. Thank you.

guningquan commented 9 months ago

The author noted the number of training episodes in the paper. Have you trained your model for enough episodes? You don't need to worry about paying anything; we can discuss it further. Please feel free to post a summary of your question here.

zcswdt commented 8 months ago

I successfully ran a training process, but I noticed that as the number of training iterations increased, my memory was gradually consumed until it was exhausted, causing the program to crash. I have 64 GB of RAM and I'm not sure what is causing this. My driver reports CUDA 11.4, but nvcc -V shows CUDA 10.0 (nvidia-smi reports the highest CUDA version the driver supports, while nvcc -V reports the installed toolkit, so the two can legitimately differ).
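A first step for a leak like this is to instrument the training loop with the standard-library tracemalloc module and watch whether Python-level allocations (e.g. a replay buffer or a growing log kept in RAM) climb every episode. Note that tracemalloc only sees Python objects, not GPU memory or C/C++ buffers inside Ray workers, so it narrows the search rather than settling it. The loop below is an illustrative stand-in, not the repo's training code.

```python
import tracemalloc

tracemalloc.start()

history = []   # a list that grows every episode: a classic leak shape
currents = []  # bytes currently allocated, sampled once per episode
for episode in range(3):  # stand-in for the training loop
    history.append([bytearray(1024) for _ in range(100)])
    current, _peak = tracemalloc.get_traced_memory()
    currents.append(current)
    print(f"episode {episode}: {current} bytes currently allocated")

tracemalloc.stop()
```

If the per-episode numbers climb steadily like this, tracemalloc.take_snapshot() with statistics("lineno") can point at the allocating lines; if they stay flat while system memory still grows, the leak is outside the Python heap.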

zcswdt commented 7 months ago

Hi, I'm reaching out to you again since you replied to me before. Could I ask you about this environment issue? Thank you; my out-of-memory problem is still unresolved.