notmahi / bet

Code and website for Behavior Transformers: Cloning k modes with one stone.
https://mahis.life/bet

Question Regarding Stable Video Generation from Franka Kitchen Data #8

Closed l067 closed 8 months ago

l067 commented 8 months ago

Thanks for your awesome work. I've been attempting to replicate your results with the code and have encountered a specific issue.

I noticed that the data provided for Franka Kitchen, as well as the Franka Kitchen environment itself, includes only qpos (position) data and lacks qvel (velocity) data. This absence seems to be why the videos I assemble from env.render() frames exhibit some degree of shakiness or trembling.

Could you please share how you managed to produce stable, tremble-free images for your video outputs? Any guidance or workaround to address this issue would be greatly appreciated.
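For reference, the dataset replay I'm doing looks roughly like the sketch below (the env constructor and file path are placeholders for my setup, not names from your repo):

```python
import numpy as np

env = make_kitchen_env()  # placeholder for however the Franka Kitchen env is constructed
qpos_frames = np.load("kitchen_demo_qpos.npy")  # placeholder path; (T, 30) qpos frames

frames = []
for qpos in qpos_frames:
    env.sim.data.qpos[: qpos.shape[0]] = qpos  # only positions are available in the data
    env.sim.data.qvel[:] = 0.0                 # no qvel in the dataset, so leave it at zero
    env.sim.forward()                          # propagate the overwritten state
    frames.append(env.render(mode="rgb_array"))  # exact render signature may differ
```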

notmahi commented 8 months ago

Hi @l067, the shakiness you see comes from the fact that the original paper (https://arxiv.org/abs/1910.11956) includes noisy qpos data (afaik they generated the data and then added noise to it). If you render that data directly, you will see the shakiness. The videos we posted are rendered from rollouts of policies trained on this data; the rollouts themselves don't have the extra added noise and thus don't look shaky.
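A minimal sketch of what "rendering from a policy rollout" means here (env and policy construction omitted; names, the horizon, and the render call are illustrative, not the exact script we use):

```python
frames = []
obs = env.reset()
for _ in range(280):  # illustrative episode horizon
    action = policy(obs)                         # trained policy; interface assumed
    obs, reward, done, info = env.step(action)
    frames.append(env.render(mode="rgb_array"))  # exact render signature may differ
    if done:
        break
```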

Does that answer your question?

l067 commented 8 months ago

Thank you for your response. I've attempted to generate videos using a trained policy, but the issue persists. I've noticed that after calling env.step(), the returned state carries information only in its first 30 dimensions, with the latter 30 being zeros. Could this be causing the shakiness in the generated videos? Additionally, are you using env.render() to render the videos?

jeffacce commented 8 months ago

Hi @l067, thanks for your interest in our work!

> the returned state carries information only in its first 30 dimensions, with the latter 30 being zeros.

The last 30 state dimensions hold the goal in the original paper's codebase. They are filled with zeros by default, and we use them as-is for unconditional rollouts.
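Concretely, the split looks like this (variable names are ours, not from the codebase):

```python
obs = env.reset()    # 60-dimensional observation
state = obs[:30]     # robot + object state (qpos-based)
goal = obs[30:]      # goal slots; zeros by default, i.e. unconditional rollout
```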

> Could this be causing the shakiness in the generated videos?

Hmm, that sounds like a separate problem. If the video is rendered from the dataset, it should show some noise; if it is a rollout from a trained policy, it should be much less noisy.

> Additionally, are you using env.render() to render the videos?

Yes.

Let us know if you have more questions; happy to help.

l067 commented 8 months ago

Thank you so much for your helpful response! My issue has been resolved!