andrearosasco opened this issue 5 months ago
I'm also curious about why multi-head attention is enabled in the L1Head when it seems to be False for the DiffusionHead. Is this also based on the design decisions in the ALOHA/ACT paper?
Ah sorry I had missed this question when you posted it a while back!
Re the config differences: the ALOHA fine-tuning setup uses window_size = 1, thus adding proprio is no problem.
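To make the config point concrete, here's a rough sketch of that kind of fine-tuning config delta using ml_collections. The key names (window_size, observation_tokenizers, proprio, obs_keys, n_bins) are my illustrative assumptions, not necessarily the repo's exact schema:

```python
from ml_collections import ConfigDict

# Hypothetical sketch of the fine-tuning config delta; the key names
# below are illustrative assumptions, not the repo's exact schema.
finetune_config = ConfigDict()
finetune_config.window_size = 1  # single-step observation context
finetune_config.observation_tokenizers = ConfigDict()
# Register a low-dimensional tokenizer entry for the new proprio input.
finetune_config.observation_tokenizers.proprio = ConfigDict(
    {"obs_keys": ["proprio"], "n_bins": 256}
)
print(finetune_config)
```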
Re multi-head attention: I am assuming your question is regarding the use_map argument? Since we're using a single read-out token for the action head in both cases, the attention pooling shouldn't have much effect, so I'd expect this argument not to matter in practice.
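To illustrate why use_map has little effect here: in multi-head attention pooling (MAP), a learned probe token attends over the input tokens, and with exactly one input token the softmax assigns it weight 1, so the pooling reduces to a fixed learned transform of that token. (With multiple tokens the probe would weight them unevenly, which is where MAP actually matters.) Below is a minimal JAX/Flax sketch, not the repo's actual implementation; names like MAPHead and probe are mine:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class MAPHead(nn.Module):
    """Minimal multi-head attention pooling: a learned probe token
    attends over the input tokens. Illustrative only, not the repo's code."""
    num_heads: int = 8

    @nn.compact
    def __call__(self, x):  # x: (batch, num_tokens, dim)
        dim = x.shape[-1]
        probe = self.param("probe", nn.initializers.xavier_uniform(), (1, 1, dim))
        probe = jnp.tile(probe, (x.shape[0], 1, 1))
        # The probe is the query; the input tokens are keys/values.
        pooled = nn.MultiHeadDotProductAttention(num_heads=self.num_heads)(probe, x)
        return pooled[:, 0]


# With a single read-out token, softmax over one key is identically 1,
# so the pooled output is just a fixed linear function of that token;
# hence toggling use_map shouldn't change much in this setting.
readout = jnp.ones((2, 1, 64))  # (batch, 1 read-out token, dim)
head = MAPHead()
params = head.init(jax.random.PRNGKey(0), readout)
out = head.apply(params, readout)  # shape (2, 64)
```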
Do you have any insights about the differences in configuration between the pre-trained model and the ALOHA fine-tuned one? In particular, I was wondering …