Eval statistics for SAC

amandlek commented 5 years ago

I have a question about how evaluation statistics are computed for SAC. From the code, it looks like the statistics are computed over only one training batch per epoch (https://github.com/vitchyr/rlkit/blob/master/rlkit/torch/sac/sac.py#L161). Is that right? I'd expect that measurement to be fairly high variance compared with averaging the statistics over all batches in the epoch. Could you confirm whether this is the case and, if so, why logging is implemented this way?

vitchyr commented 5 years ago

Yes, that's how it's implemented. It's mainly because I figured it'd be faster and would save memory. But on second thought, it's probably not too costly to keep track of all stats. Still, the current way has been good enough for me. Feel free to make a PR implementing this change.
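
For anyone picking this up, here is a minimal sketch of one way the per-epoch averaging could look. It is not rlkit code: `EpochStatsAccumulator`, `update`, `end_epoch`, and the `"QF Loss"` key are hypothetical names used only for illustration. It keeps a running sum and count per statistic, so memory stays proportional to the number of statistics rather than the number of batches.

```python
from collections import defaultdict


class EpochStatsAccumulator:
    """Hypothetical helper (not part of rlkit): running mean of scalar
    statistics over every training batch in an epoch."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def update(self, batch_stats):
        # batch_stats maps a statistic name to a scalar for one batch.
        for key, value in batch_stats.items():
            self._sums[key] += float(value)
            self._counts[key] += 1

    def end_epoch(self):
        # Return the per-statistic mean, then reset for the next epoch.
        means = {key: self._sums[key] / self._counts[key] for key in self._sums}
        self._sums.clear()
        self._counts.clear()
        return means


# Usage: call update() once per training batch, end_epoch() once per epoch.
acc = EpochStatsAccumulator()
for qf_loss in [1.0, 2.0, 3.0]:  # stand-in values for three batches
    acc.update({"QF Loss": qf_loss})
epoch_means = acc.end_epoch()
```

Tensor-valued statistics would need to be converted to Python floats before being passed to `update`, which is an assumption about how the trainer would call it.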

amandlek commented 5 years ago

Thanks for the quick reply!

rail-berkeley / rlkit, issue #51