zhouxian / act3d-chained-diffuser

A unified architecture for multimodal multi-task robotic policy learning.

How to jointly optimize action detector and trajectory diffuser? #11

Closed ManUtdMoon closed 8 months ago

ManUtdMoon commented 8 months ago

Dear authors,

Thank you for your inspiring work!

I noticed that in the ChainedDiffuser paper you mention "...train both the action detector and the trajectory diffuser jointly" and "we train the first 2 terms till convergence, and then add the 3rd term for joint optimization". However, I could not find any code for this joint optimization: the only model in main_trajectory.py is a DiffusionPlanner.

Would you please explain more about how Act3D and the DiffusionPlanner are actually trained jointly?

Regards, Dongjie

zhouxian commented 8 months ago

Hi, in our latest experiments we found that simply training the two models separately yields similar or even better performance, so joint training is no longer needed.
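To make "separately" concrete, here is a minimal sketch of the two independent training stages. The shapes, losses, and linear stand-in networks are illustrative only; the real Act3D and trajectory diffuser consume point clouds, language, and proprioception:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs; not the repo's actual models.
act3d = nn.Linear(32, 8)         # obs features -> keypose (pos + rot + gripper)
diffuser = nn.Linear(32 + 8, 8)  # obs features + goal keypose -> noise estimate

act3d_opt = torch.optim.Adam(act3d.parameters(), lr=1e-4)
diffuser_opt = torch.optim.Adam(diffuser.parameters(), lr=1e-4)

obs = torch.randn(4, 32)        # dummy observation features
gt_keypose = torch.randn(4, 8)  # ground-truth keypose extracted from demos
noise = torch.randn(4, 8)       # diffusion noise target

# Stage 1: train the keypose detector alone against ground truth.
act3d_opt.zero_grad()
nn.functional.mse_loss(act3d(obs), gt_keypose).backward()
act3d_opt.step()

# Stage 2: train the diffuser alone, conditioned on the *ground-truth*
# keypose -- no gradient ever flows between the two models.
diffuser_opt.zero_grad()
pred_noise = diffuser(torch.cat([obs, gt_keypose], dim=-1))
nn.functional.mse_loss(pred_noise, noise).backward()
diffuser_opt.step()
```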

ManUtdMoon commented 8 months ago

Hi Zhou, thank you for your quick reply!

I think separate training requires relabeling the keyposes. Could you please show me how that is done in the code? Thank you!

zhouxian commented 8 months ago

What do you mean by relabeling? We use the same keyframe-extraction strategy for both Act3D and ChainedDiffuser.

ManUtdMoon commented 8 months ago

The paper mentions that the goal gripper pose is not the ground truth but is predicted by the action detector. Therefore, during training, I think the target keypose should be relabeled by Act3D rather than taken from the ground truth. Is there something wrong with my understanding?

zhouxian commented 8 months ago

You can simply train them separately and use the ground-truth pose during training. At inference time, Act3D is used for keypose prediction.
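In other words, the chaining only happens at test time: the diffuser is trained conditioned on ground-truth keyposes, and at rollout Act3D's predicted keypose is substituted in. A minimal sketch of that inference loop (the act3d.predict, diffuser.sample, and env.step interfaces below are hypothetical, not this repo's actual API):

```python
# Hypothetical test-time chaining: Act3D's predicted keypose stands in for
# the ground-truth keypose the diffuser was conditioned on during training.
def rollout_to_next_keypose(obs, instruction, act3d, diffuser, env):
    keypose = act3d.predict(obs, instruction)  # predicted goal gripper pose
    traj = diffuser.sample(obs, keypose)       # denoise a trajectory toward it
    for action in traj:
        obs = env.step(action)                 # execute the dense actions
    return obs
```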

ManUtdMoon commented 8 months ago

Thank you for your answers! Have a nice day!