tonyzhaozh / aloha

MIT License

What if the policy network is used without the CVAE? #14

Closed ManUtdMoon closed 11 months ago

ManUtdMoon commented 1 year ago

Hey Tony,

Thank you for your great work!

I have a question about the choice of policy network. I noticed that, to account for the stochasticity of human demonstrations, you use a CVAE to model the action distribution, and that it outperforms the variant without the CVAE in the ablation study.

I wonder whether the policy without the CVAE is equivalent to a distributional policy. What if both the mean and the covariance are learned during training? In that case, would adding a policy entropy term to a distributional policy without the CVAE work?

Thank you for your time. I am looking forward to your reply :D

Regards, Dongjie

tonyzhaozh commented 11 months ago

Hi Dongjie,

Not sure if I completely understand your comment, but are you referring to a policy that outputs the mean and covariance of the action? That could work in theory, but with action chunking it's hard to predict both a sequence of actions and a full covariance matrix, as the latter would be too high-dimensional.
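
To give a rough sense of the dimensionality issue, here is a minimal sketch that counts the outputs a Gaussian policy head would need over a flattened action chunk. The chunk length (100) and action dimension (14) are assumed values for illustration and are not stated in this thread; `gaussian_param_counts` is a hypothetical helper.

```python
def gaussian_param_counts(chunk_size: int, action_dim: int) -> dict:
    """Count the outputs needed for a Gaussian over a flattened action chunk."""
    n = chunk_size * action_dim  # dimension of the flattened chunk
    return {
        "mean": n,                     # one mean per action entry
        "diagonal_cov": n,             # one variance per action entry
        "full_cov": n * (n + 1) // 2,  # free entries of a symmetric n x n matrix
    }


# Assumed, illustrative values: 100-step chunks of 14-dimensional actions.
counts = gaussian_param_counts(chunk_size=100, action_dim=14)
print(counts)
```

Under these assumptions the full covariance needs 980,700 free parameters against 1,400 for the mean, which is why a diagonal or otherwise structured covariance would be the more practical variant to try.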

In addition, we recently noticed that the CVAE might not matter much. You can try training a policy without the CVAE encoder and latent code, and it might perform similarly. We are still investigating this and may update the original paper.
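
As a minimal sketch of what dropping the encoder could mean for the training objective (not the repository's actual code): with a CVAE the loss would typically be a reconstruction term plus a weighted KL term on the latent, and without the encoder only the reconstruction term remains. The L1 reconstruction loss, the `kl_weight` value, and all function names here are assumptions for illustration.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between predicted and demonstrated actions."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def chunk_loss(pred, target, mu=None, logvar=None, kl_weight=10.0):
    """Reconstruction loss, plus a KL term only when an encoder output is given.

    kl_weight is an assumed hyperparameter value, chosen for illustration.
    """
    recon = l1_loss(pred, target)
    if mu is None or logvar is None:
        # No CVAE encoder: the objective is plain regression on the action chunk.
        return recon
    return recon + kl_weight * kl_to_standard_normal(mu, logvar)


# Without the encoder: reconstruction only.
loss_plain = chunk_loss([0.0, 1.0], [1.0, 1.0])
# With a (hypothetical) encoder output for a 1-D latent.
loss_cvae = chunk_loss([0.0, 1.0], [1.0, 1.0], mu=[1.0], logvar=[0.0])
```

In the encoder-free variant the decoder would simply receive no latent input (or a fixed one), so nothing else in the objective changes.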

Thanks, Tony

ManUtdMoon commented 11 months ago

Hi Tony,

Thank you for your reply; it answers my questions. Looking forward to your updates on more effective policy architectures.

Regards, Dongjie