Hi, I'm new to diffusion models in RL and am trying to figure out how Diffuser works, particularly with conditioning, and was hoping I could get some guidance. I read the paper, and it proposes the ideas of:
Reinforcement Learning as Guided Sampling
Goal-Conditioned RL as Inpainting
I think I can see Goal-Conditioned RL being implemented by setting the goal/starting states during the sampling process, but I can't seem to find where the guided sampling part is implemented.
I was looking through the sampling code and see that a cond is passed down to the temporal U-Net, but it doesn't seem to be used.
Is it implemented elsewhere? Or is this just a template for how I could inject my own reward model into the temporal U-Net? Thanks in advance!
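For context, here is my current (possibly wrong) mental model of how guided sampling would work, as a minimal PyTorch sketch. All names here (`guided_sample_step`, `denoise_mean`, `value_model`, `guide_scale`) are my own placeholders, not the actual repo API:

```python
import torch

def guided_sample_step(x, t, denoise_mean, value_model, guide_scale=0.1):
    """One reverse-diffusion step with value-gradient guidance (my sketch,
    not the repo's implementation).

    x            : noisy trajectory tensor, shape (batch, horizon, dim)
    denoise_mean : function (x, t) -> posterior mean from the diffusion model
    value_model  : function x -> predicted return per trajectory, shape (batch,)
    """
    mean = denoise_mean(x, t)
    # Gradient of the predicted return w.r.t. the noisy trajectory
    with torch.enable_grad():
        x_in = x.detach().requires_grad_(True)
        value = value_model(x_in).sum()
        grad = torch.autograd.grad(value, x_in)[0]
    # Nudge the posterior mean toward trajectories with higher predicted return
    return mean + guide_scale * grad
```

Is this roughly the shape of what the paper means by guided sampling, and if so, where would this hook into the actual sampling loop?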