nv-tlabs / ASE

The implementation seems to be different from the method in the paper #70

Open zmccmzty opened 10 months ago

zmccmzty commented 10 months ago

It looks like the policy and the discriminator are trained together, at the same rate, with a single optimizer and a combined loss (https://github.com/nv-tlabs/ASE/blob/21257078f0c6bf75ee4f02626260d7cf2c48fee0/ase/learning/ase_agent.py#L280C1-L280C1). This seems to differ from the pseudocode in the paper, where the two are trained separately. Any idea what the reason for this is? Or am I missing something?
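To make the contrast concrete, here is a minimal toy sketch of the two update schemes. It is not taken from the ASE code: the scalar parameters, the quadratic losses, and the names `joint_step`, `separate_step`, and `disc_coef` are all hypothetical, and it assumes plain gradient descent and that the two losses share no parameters.

```python
# Toy illustration only: scalar "policy" and "discriminator" parameters with
# quadratic losses. None of these names come from the ASE code base.

def policy_loss_grad(theta):
    # Gradient of the assumed policy loss (theta - 1)^2 with respect to theta.
    return 2.0 * (theta - 1.0)

def disc_loss_grad(phi):
    # Gradient of the assumed discriminator loss (phi + 2)^2 with respect to phi.
    return 2.0 * (phi + 2.0)

def joint_step(theta, phi, lr, disc_coef=1.0):
    """One gradient-descent step on policy_loss + disc_coef * disc_loss."""
    # The losses share no parameters here, so the gradient of the combined
    # loss for each parameter is just the (weighted) gradient of its own term.
    theta_new = theta - lr * policy_loss_grad(theta)
    phi_new = phi - lr * disc_coef * disc_loss_grad(phi)
    return theta_new, phi_new

def separate_step(theta, phi, lr_policy, lr_disc):
    """One gradient-descent step per network, each with its own step size."""
    theta_new = theta - lr_policy * policy_loss_grad(theta)
    phi_new = phi - lr_disc * disc_loss_grad(phi)
    return theta_new, phi_new

joint = joint_step(0.0, 0.0, lr=0.1, disc_coef=5.0)
separate = separate_step(0.0, 0.0, lr_policy=0.1, lr_disc=0.5)
print(joint, separate)
```

In this toy setting the joint step matches a separate step whose discriminator step size is `lr * disc_coef`, so the loss weight acts like a relative learning rate. That equivalence relies on the assumptions above; with an adaptive optimizer such as Adam, or with shared parameters, it need not hold.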

Winston-Gu commented 10 months ago

I have the same question here ...