Multimodal data as input to the model

nicklashansen / tdmpc

Code for "Temporal Difference Learning for Model Predictive Control"

MIT License

346 stars 55 forks source link

Hi, congratulations on the amazing work!

I wanted to ask a question, the paper mentions that multimodal data [RGB + proprioception] can be used as input of the model

In the code, the observations are sent to an encoder that process them in different ways depending if it's pixels or another modality, nevertheless I'm not sure that any of those options apply to multimodal data containing both pixels and state information. Given the experiments you made in the paper, how would you recommend processing such data in the encoder?

nicklashansen / tdmpc

Multimodal data as input to the model #10