AWAC's weight computation

im-Kitsch commented 2 years ago

Hello,

I just found one inconsistency between the AWAC implementation in your code and the official implementation.

The actor is updated with log_pi * weights, where the weights are computed as exp(A/beta). In your code you use a softmax to compute the weights, but it looks like the weights * len(batch_sample) scaling is still missing. Compare the official version, the co-adaptation paper implementation, and your code here:

https://github.com/frt03/inference-based-rl/blob/8c93996a172f266ed402d8c0a82ecb9b4229bce0/pfrlx/algos/awac.py#L207
https://github.com/rail-berkeley/rlkit/blob/c81509d982b4d52a6239e7bfe7d2540e3d3cd986/rlkit/torch/sac/awac_trainer.py#L707
https://github.com/takuseno/d3rlpy/blob/8eb11db2d6f406cfab6d08adc4e0c08666dd063e/d3rlpy/algos/torch/awac_impl.py#L159

Please take a quick look at the marked line in each of these three files.
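To illustrate the difference, here is a minimal pure-Python sketch. It is not taken from any of the three repositories: the names `softmax` and `actor_loss`, the `scale_by_batch` flag, and the sample numbers are all made up for illustration. Softmax weights sum to 1 over the batch, so averaging log_pi * weights makes the loss a factor of len(batch) smaller than when the weights are first multiplied by the batch size.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_loss(log_pi, adv, beta, scale_by_batch):
    """Advantage-weighted actor loss, averaged over the batch.

    Hypothetical helper: weights = softmax(A / beta), optionally
    multiplied by the batch size before weighting log_pi.
    """
    n = len(adv)
    weights = softmax([a / beta for a in adv])
    if scale_by_batch:
        # After scaling, the weights have mean 1 instead of sum 1.
        weights = [n * w for w in weights]
    return -sum(lp * w for lp, w in zip(log_pi, weights)) / n


# Made-up batch of 4 samples.
log_pi = [-1.0, -2.0, -0.5, -1.5]
adv = [0.3, -0.1, 0.8, 0.0]
beta = 1.0

unscaled = actor_loss(log_pi, adv, beta, scale_by_batch=False)
scaled = actor_loss(log_pi, adv, beta, scale_by_batch=True)

# The ratio equals the batch size, up to floating-point rounding.
print(unscaled, scaled, scaled / unscaled)
```

With the scaling, each weight equals exp(A/beta) divided by the batch mean of exp(A/beta), so the loss magnitude no longer shrinks as the batch size grows.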

Thanks for your work. Best,

takuseno commented 2 years ago

@im-Kitsch Hi, thanks for reporting this! It seems I missed that part. I'll fix it.

takuseno commented 2 years ago

Fixed in the latest commit. https://github.com/takuseno/d3rlpy/commit/c0fd568c266dbc7c0b3d5e233870eb3cb8b50ac2

takuseno / d3rlpy

AWAC's weight computation #163