Seeking advice on how to choose between ACT and DP algorithms

le-wei commented 3 months ago

Hello,

Thank you very much for the work you have done in bringing together the current excellent imitation learning collections for convenient use. Regarding the ACT algorithm and DP algorithm, besides the basic differences in the algorithms themselves, how should one choose between them for different tasks? Do they have specific types of tasks they are particularly suited for? I have just started using your project and am unsure how to select the appropriate algorithm. I would greatly appreciate any advice you can provide.

Thank you!

alexander-soare commented 3 months ago

Hi @le-wei, thanks for the props! Here are some of my answers (but I think that for some of these you'll get different answers depending on who you ask)

DP is going to be more compute heavy as it typically needs multiple forward passes through the model. ACT is much faster, even when outputting longer horizons. So ACT might be the way to go for faster control loops.
DP has some nice flexibility in conditioning via inpainting. We don't have this feature yet, but inpainting can be used to warm-start or constrain parts of temporally adjacent action chunks.
Word on the street is that DP might be better at handling multimodality. I'd take this with a grain of salt and try both.
Our DP has a 1D CNN backbone, which might make it harder to deal with sudden changes in dynamics, for example when making contact. The paper covers this and proposes a transformer backbone to mitigate the issue, but we don't have it implemented yet. If you have sharp transitions in dynamics around critical points of your trajectory (e.g. hanging keys on a hook), you might want ACT for its transformer backbone.
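
To make the compute point concrete, here is a rough back-of-the-envelope sketch, not a benchmark. It assumes blocking inference, the same cost per forward pass for both policies (which will not hold exactly, since the networks differ), and made-up numbers for the pass count and chunk sizes. The function name and all values are hypothetical and are not part of lerobot.

```python
def amortized_inference_ms(forward_ms: float,
                           passes_per_chunk: int,
                           actions_executed_per_chunk: int) -> float:
    """Average inference time paid per executed action, in milliseconds.

    Assumes the control loop blocks while the policy computes a new chunk,
    and that every chunk costs `passes_per_chunk` forward passes.
    """
    if passes_per_chunk < 1 or actions_executed_per_chunk < 1:
        raise ValueError("pass count and action count must both be >= 1")
    return forward_ms * passes_per_chunk / actions_executed_per_chunk


# Hypothetical numbers, for illustration only.
FORWARD_MS = 10.0

# ACT-style policy: one forward pass yields a whole chunk of actions.
act_cost = amortized_inference_ms(FORWARD_MS, passes_per_chunk=1,
                                  actions_executed_per_chunk=100)

# DP-style policy: several denoising passes per chunk, fewer actions executed.
dp_cost = amortized_inference_ms(FORWARD_MS, passes_per_chunk=10,
                                 actions_executed_per_chunk=8)

print(f"ACT-style: {act_cost:.2f} ms of inference per action")
print(f"DP-style:  {dp_cost:.2f} ms of inference per action")
```

With these invented numbers the DP-style policy spends far more inference time per executed action, which is why the control loop rate is worth checking before committing to it. Measure the real forward-pass time on your own hardware before drawing conclusions.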

le-wei commented 3 months ago

@alexander-soare Thank you very much for your response, it has been incredibly helpful. I can prioritize trying one approach in practice. Thank you again for your guidance.

In practice, we also found that ACT is indeed less flexible than DP. However, ACT learns certain details better than DP, such as control over angles and distances. Do you have any good suggestions or methods to improve the flexibility of ACT? Our attempts to enhance images or add small amounts of noise to the trajectories have not yielded satisfactory results.

alexander-soare commented 3 months ago

@le-wei thanks for the tidbits!

When you say "improve the flexibility of ACT", can I check what you mean exactly? By "enhance images and add small amounts of noise", I think you are referring to test-time generalization? We do have a new data augmentation feature which we have shown experimentally to improve results: https://github.com/huggingface/lerobot/pull/234

le-wei commented 3 months ago

"Thank you very much for your advice, I have learned a lot from your suggestions." @alexander-soare

le-wei commented 3 months ago

"Have you considered standardizing the data representation in ACT and DP? This way, the generated dataset can adapt to both strategies without requiring significant code modifications." ACT: DP:

alexander-soare commented 3 months ago

@le-wei it's not well documented yet, but this is part of the "standard" format. We allow "observation.image" if there is one image. If there is more than one image we require "observation.images.x", "observation.images.y" etc. It's an experiment and we will be more careful about documenting it once we think it's the way to go. Does that help answer your question?
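
As an illustration of the naming rule described above, here is a minimal sketch. The helper `image_keys` and the camera names are hypothetical and are not part of the lerobot API. Only the two key patterns, "observation.image" and "observation.images.<name>", come from the comment above.

```python
from typing import Iterable, List


def image_keys(camera_names: Iterable[str]) -> List[str]:
    """Return observation keys for the given cameras (hypothetical helper).

    One camera   -> ["observation.image"]
    Many cameras -> ["observation.images.<name>", ...]
    """
    names = list(camera_names)
    if not names:
        raise ValueError("at least one camera name is required")
    if len(names) == 1:
        # A single image uses the singular key, with no camera suffix.
        return ["observation.image"]
    return [f"observation.images.{name}" for name in names]


print(image_keys(["top"]))
print(image_keys(["top", "wrist"]))
```

A dataset writer that builds its keys through one helper like this could feed either policy without per-policy renaming, assuming both policies read keys following this convention.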

le-wei commented 3 months ago

Thank you very much for your patient explanation. @alexander-soare

alexander-soare commented 3 months ago

Great, well, I'll close this issue and you're more than welcome to reopen if it's not resolved or open another one.

Seeking advice on how to choose between ACT and DP algorithms #263