jannerm / diffuser

Code for the paper "Planning with Diffusion for Flexible Behavior Synthesis"
https://diffusion-planning.github.io
MIT License
813 stars, 125 forks

How is conditioning implemented #55

Closed. KenjiPcx closed this issue 6 months ago

KenjiPcx commented 6 months ago

Hi, I'm new to diffusion models in RL and am trying to figure out how Diffuser works, particularly when it comes to conditioning, and I was hoping to get some guidance. I read the paper, which proposes two ideas:

  1. Reinforcement Learning as Guided Sampling
  2. Goal-Conditioned RL as Inpainting

I think I see how goal-conditioned RL is implemented: the goal (starting states) is set during the sampling process. However, I can't find where the guided sampling part is implemented. Looking through the sampling code, I see that a `cond` argument is passed down to the temporal U-Net, but it is not used there.
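For reference, the inpainting-style conditioning described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's code: the function name `inpaint_states`, the trajectory layout (actions first, then states), and the noise update standing in for a trained denoiser are all assumptions made for the example.

```python
import numpy as np

def inpaint_states(x, conditions, action_dim):
    """Overwrite the state part of selected timesteps with known values.

    x          : array of shape (batch, horizon, action_dim + state_dim)
    conditions : dict mapping timestep index -> state array of shape (state_dim,)
    (Hypothetical helper; layout and name are assumptions for this sketch.)
    """
    x = x.copy()
    for t, state in conditions.items():
        x[:, t, action_dim:] = state
    return x

rng = np.random.default_rng(0)
batch, horizon, action_dim, state_dim = 2, 8, 2, 4
start = np.zeros(state_dim)   # known first state
goal = np.ones(state_dim)     # desired last state
conditions = {0: start, horizon - 1: goal}

# Start from pure noise, then re-impose the known states after every step.
x = rng.standard_normal((batch, horizon, action_dim + state_dim))
for _ in range(5):
    # Placeholder for one reverse-diffusion step; a trained model would go here.
    x = 0.9 * x + 0.1 * rng.standard_normal(x.shape)
    x = inpaint_states(x, conditions, action_dim)
```

Because the known states are written back after every denoising step, the rest of the trajectory has to be filled in around them, which is what makes this resemble image inpainting.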

Is it implemented elsewhere, or is this just a template showing how I could inject my own reward model into the temporal U-Net? Thanks in advance.
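For context, the guided sampling idea from the paper amounts to shifting the mean of each reverse-diffusion step along the gradient of a reward (or return) model before sampling. The sketch below illustrates that idea only. The quadratic reward, the scalar `sigma`, and the names `reward_grad` and `guided_step` are assumptions made for the example, not code from the repository.

```python
import numpy as np

def reward_grad(traj, target):
    """Gradient of a hypothetical reward J(traj) = -0.5 * ||traj - target||^2."""
    return target - traj

def guided_step(mu, sigma, target, scale, rng):
    """Sample from N(mu + scale * sigma^2 * grad J(mu), sigma^2 * I).

    mu is a stand-in for the mean predicted by a trained denoiser.
    """
    guided_mu = mu + scale * sigma**2 * reward_grad(mu, target)
    return guided_mu + sigma * rng.standard_normal(mu.shape)

rng = np.random.default_rng(0)
mu = np.zeros((4, 3))        # placeholder denoiser output
target = np.ones((4, 3))     # trajectory the hypothetical reward prefers

# The guided mean moves from mu toward the higher-reward region.
guided_mean = mu + 1.0 * 0.5**2 * reward_grad(mu, target)
sample = guided_step(mu, sigma=0.5, target=target, scale=1.0, rng=rng)
```

In a setup like this, the reward model sits outside the denoising network: the network predicts the mean as usual, and the guidance term adjusts that mean at sampling time.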

KenjiPcx commented 6 months ago

My bad, I just realized that different branches implement different code. I'm going to take a look at the kuka guidance implementation.