Text conditioning with FiLM layers?

lucidrains / robotic-transformer-pytorch

Implementation of RT1 (Robotic Transformer) in Pytorch

MIT License


Status: Closed (Olimoyo closed this issue 1 year ago)

Olimoyo commented 1 year ago

Hi,

I noticed that text conditioning here is done using classifier-free guidance, whereas the original paper uses FiLM layers.

Was there a particular reason for this decision?

Thank you, Oliver
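
For readers unfamiliar with the term, here is a minimal sketch of what a FiLM (feature-wise linear modulation) layer does: a single pooled text embedding is projected to a per-channel scale and shift that modulate the image features. The class name `FiLM`, the tensor sizes, and the zero initialization (so the layer starts as an identity) are illustrative assumptions, not code from this repository or from the paper.

```python
import torch
from torch import nn


class FiLM(nn.Module):
    """Hypothetical FiLM layer: per-channel scale and shift from one text vector."""

    def __init__(self, cond_dim, num_channels):
        super().__init__()
        # One linear layer produces both the scale and the shift.
        self.to_scale_shift = nn.Linear(cond_dim, num_channels * 2)
        # Assumption for this sketch: zero init, so the layer is an identity at start.
        nn.init.zeros_(self.to_scale_shift.weight)
        nn.init.zeros_(self.to_scale_shift.bias)

    def forward(self, feats, cond):
        # feats: (batch, channels, height, width), cond: (batch, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        # Broadcast the per-channel values over the spatial dimensions.
        scale = scale[:, :, None, None]
        shift = shift[:, :, None, None]
        return feats * (1 + scale) + shift


feats = torch.randn(2, 64, 8, 8)   # made-up image feature map
text_embed = torch.randn(2, 512)   # made-up pooled text embedding
film = FiLM(cond_dim=512, num_channels=64)
out = film(feats, text_embed)
```

Note that the whole instruction is compressed into one vector per example here, so every spatial location receives the same modulation.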

lucidrains commented 1 year ago

yea, because i know cross attention on fine-grained tokens is better. this part i'm quite confident about
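
As a rough illustration of the alternative described in this reply, the sketch below lets image tokens cross-attend to per-token text embeddings, with the text optionally replaced by a null embedding so that conditioned and unconditioned outputs can be combined in the classifier-free guidance style. All names (`TextCrossAttention`, `null_text`, `guided`, `cond_scale`) and sizes are hypothetical and chosen for this example; this is not the repository's actual implementation.

```python
import torch
from torch import nn


class TextCrossAttention(nn.Module):
    """Hypothetical block: image tokens attend over individual text tokens."""

    def __init__(self, dim, text_dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Queries come from image tokens, keys and values from text tokens.
        self.attn = nn.MultiheadAttention(
            dim, heads, kdim=text_dim, vdim=text_dim, batch_first=True
        )
        # Learned stand-in for "no text", used when conditioning is dropped.
        self.null_text = nn.Parameter(torch.zeros(1, 1, text_dim))

    def forward(self, image_tokens, text_tokens, drop_text=False):
        if drop_text:
            batch = image_tokens.shape[0]
            text_tokens = self.null_text.expand(batch, -1, -1)
        attended, _ = self.attn(self.norm(image_tokens), text_tokens, text_tokens)
        return image_tokens + attended  # residual connection


def guided(cond_out, uncond_out, cond_scale=3.0):
    # Classifier-free guidance: extrapolate from the unconditioned output
    # toward the conditioned one. cond_scale=1 returns cond_out unchanged.
    return uncond_out + (cond_out - uncond_out) * cond_scale


block = TextCrossAttention(dim=128, text_dim=512)
image_tokens = torch.randn(2, 64, 128)  # made-up image tokens
text_tokens = torch.randn(2, 12, 512)   # made-up per-token text embeddings

with torch.no_grad():
    cond = block(image_tokens, text_tokens)
    uncond = block(image_tokens, text_tokens, drop_text=True)
    out = guided(cond, uncond, cond_scale=3.0)
```

Unlike a single pooled vector, each image token here can weight the individual text tokens differently, which is the "fine-grained" aspect mentioned above.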

Olimoyo commented 1 year ago

I see, thanks!