nickgkan / 3d_diffuser_actor

Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"
https://3d-diffuser-actor.github.io/
MIT License

Questions about training details #9

Closed YiXiangChen1 closed 3 months ago

YiXiangChen1 commented 3 months ago

Hi, thanks for your great work! I'm curious about the specific training details for RLBench and CALVIN. Could you kindly share the number of GPUs used for these tasks, as well as the duration of training for each? Your insights would be invaluable to me, thanks a lot!

twke18 commented 3 months ago

Hi, thanks for your interest! Our graphics cards are 40GB A100s.

For CALVIN, we used 6 GPUs for our experiments. It took us 22 hours to train for 65,000 iterations.

For the PerAct setup on RLBench, we used 6/7 GPUs. It took us 6.5 days to train for 600,000 iterations.

For the GNFactor setup on RLBench, we used 4 GPUs. It took us 2.7 days to train for 600,000 iterations.
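
For a rough sense of scale, these wall-clock figures translate to the following iteration rates (a back-of-the-envelope calculation based only on the numbers above):

```python
# Rough throughput implied by the reported wall-clock times,
# assuming the iteration counts above are total optimizer steps.
calvin   = 65_000  / (22 * 3600)          # ~0.82 it/s on 6 GPUs
peract   = 600_000 / (6.5 * 24 * 3600)    # ~1.07 it/s on 6-7 GPUs
gnfactor = 600_000 / (2.7 * 24 * 3600)    # ~2.57 it/s on 4 GPUs
print(f"CALVIN {calvin:.2f} it/s, PerAct {peract:.2f} it/s, GNFactor {gnfactor:.2f} it/s")
```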

Hope this information helps.

YiXiangChen1 commented 3 months ago

Thanks for your quick reply! It's genuinely helpful!

fengxiuyaun commented 3 months ago

For CALVIN, we used 6 GPUs for our experiments. It took us 22 hours to train for 65,000 iterations.

Hi, thanks for your great work!

  1. You mention here that the model is trained for 65,000 iterations on CALVIN, but I see that train_trajectory_calvin.sh specifies 600,000, right? (https://github.com/nickgkan/3d_diffuser_actor/blob/e576b69daec958c9706865b662b79bb068170220/scripts/train_trajectory_calvin.sh#L41)
  2. People generally use 8 GPUs, but why 6 here? Do 8 cards not work as well? Thanks!
twke18 commented 3 months ago

Hi,

  1. We found that models can easily overfit to the training data on CALVIN. Thus, we evaluated the model trained for 65,000 iterations rather than 600,000. In other words, you can stop training earlier (see the sketch after this list).
  2. We found that using 6/7 GPUs is enough for our experiments. You can of course increase the batch size and use more GPUs.
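
To make point 1 concrete, "stopping earlier" just means capping the number of optimizer steps (or evaluating an intermediate checkpoint). A minimal sketch with placeholder names, not the repository's actual training loop:

```python
# Cap training at 65,000 iterations instead of 600,000 (placeholder names).
import itertools

MAX_ITERS = 65_000  # evaluate the checkpoint saved around here for CALVIN

def train(model, train_loader, optimizer, loss_fn):
    # Cycle the loader so the loop is bounded by iterations, not epochs.
    for step, batch in enumerate(itertools.cycle(train_loader)):
        if step >= MAX_ITERS:
            break  # stop early; longer training tends to overfit on CALVIN
        optimizer.zero_grad()
        loss = loss_fn(model(batch["inputs"]), batch["targets"])
        loss.backward()
        optimizer.step()
    return model
```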
fengxiuyaun commented 3 months ago

Thank you for the quick response. Could you add some ablation experiments? For example, why is len(gripper_history) == 3? In the model design, why is there such an attention mechanism between the gripper, action, context, and intrinsics? In the trajectory, why are there 20 interpolated points between two key poses? Some of these variables and parts of the model structure are not very clear.
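
(For context, the interpolated points densify the segment between two consecutive key poses. A generic sketch of such interpolation, linear in position and slerp in orientation; this is illustrative only, not the repository's implementation:)

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_keyposes(pos0, quat0, pos1, quat1, num_points=20):
    """Densify the segment between two key poses (illustrative only)."""
    t = np.linspace(0.0, 1.0, num_points)
    # Linear interpolation of positions.
    positions = (1 - t)[:, None] * np.asarray(pos0) + t[:, None] * np.asarray(pos1)
    # Spherical-linear interpolation of orientations (xyzw quaternions).
    slerp = Slerp([0.0, 1.0], Rotation.from_quat([quat0, quat1]))
    quats = slerp(t).as_quat()
    return positions, quats
```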

twke18 commented 3 months ago

Hi,

We found that on RLBench, including gripper history helps resolve ambiguity in predicting the target gripper pose. Take the stack_wine task for example; the table below shows the proprioception and target gripper position at certain key frames:

Key     Target gripper pose     Proprio gripper pose
T=0 [ 0.2668, -0.4018,  0.9723] [ 0.2785, -0.0082,  1.4719]
T=1 [ 0.3159, -0.4108,  0.9724] [ 0.2668, -0.4018,  0.9723]
T=2 [ 0.3160, -0.4108,  0.9961] [ 0.3159, -0.4108,  0.9724]
T=3 [ 0.3705, -0.1827,  0.8914] [ 0.3160, -0.4108,  0.9961]
T=4 [ 0.4064,  0.0127,  0.8920] [ 0.3705, -0.1827,  0.8914]

As you can see, for the second and third key frames, the target gripper poses are different even though both key frames have the same proprioception.

Due to limited computational resources, we were/are not able to ablate every aspect or thoroughly search the hyper-parameters. Most of the architectural design choices and hyper-parameters were our best guesses. On the other hand, we believe the code base is well written and clearly documented, so we would encourage you to try some different settings. You might get better performance!
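
(To illustrate what conditioning on a short gripper history looks like, here is a hypothetical sketch, not the model's actual proprioception encoding, using the key-frame numbers above:)

```python
import torch

def build_proprio_feature(gripper_history):
    """Flatten the last H gripper positions into one proprioception feature,
    so key frames with the same current pose but different histories differ.
    gripper_history: (B, H, 3) tensor of past gripper positions, e.g. H == 3."""
    B, H, _ = gripper_history.shape
    return gripper_history.reshape(B, H * 3)

history = torch.tensor([[[0.2668, -0.4018, 0.9723],    # proprio at an earlier key frame
                         [0.3159, -0.4108, 0.9724],
                         [0.3160, -0.4108, 0.9961]]])
print(build_proprio_feature(history).shape)             # torch.Size([1, 9])
```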

fengxiuyaun commented 3 months ago

Thanks. I tested the weights you released (https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_calvin.pth), and there are slight differences from the results in the paper. Is this normal?

           checkpoint number  task1   task2   task3   task4   task5   avg seq len
  paper    1000               92.2%   78.7%   63.9%   51.2%   41.2%   3.27
  test1    1000               90.6%   76.5%   61.8%   49.1%   39.3%   3.173
  test2    1000               91.3%   76.9%   61.3%   49.1%   38.1%   3.167
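
(For what it's worth, the avg seq len column is consistent with summing the five per-task success rates, i.e. the expected number of consecutively completed tasks in CALVIN's 5-task chains, so the small per-task drops fully account for the gap. A quick check:)

```python
# Sum of per-task success rates reproduces the avg seq len column.
rows = {
    "paper": [0.922, 0.787, 0.639, 0.512, 0.412],
    "test1": [0.906, 0.765, 0.618, 0.491, 0.393],
    "test2": [0.913, 0.769, 0.613, 0.491, 0.381],
}
for name, rates in rows.items():
    print(name, round(sum(rates), 3))   # -> 3.272, 3.173, 3.167
```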
twke18 commented 3 months ago

Hi, we have also observed this performance difference. Our conjecture is that the IK solver introduces some noise, resulting in varying performance. We reported the run with the highest performance in the paper.