Hey,
I am confused about the dual implementation in the QREPS agent.
The code I am talking about is in qreps_algorithm in the QREPS class.
To be specific:
As far as I understand, the last dimension of weights_td is always added, so the logsumexp operation then does nothing.
Could you help me understand this? Or are there differences between the version in the paper and the implementation visible here?
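To illustrate what I mean, here is a minimal sketch. It is not the repository code: the array `td` and the `logsumexp` helper are made up for illustration, and I am assuming the reduction runs over the newly added last dimension of size 1.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

# Hypothetical stand-in for weights_td: a batch of 4 values.
td = np.array([0.5, -1.0, 2.0, 0.0])

# Case 1: a trailing dimension of size 1 is added, giving shape (4, 1).
# Reducing over that dimension sums a single term per row, so
# log(exp(x)) = x and the values come back unchanged.
with_extra_dim = td[:, None]
reduced_last = logsumexp(with_extra_dim, axis=-1)

# Case 2: reducing over the batch dimension instead yields one scalar.
reduced_batch = logsumexp(td, axis=0)

print(reduced_last)   # same values as td
print(reduced_batch)  # a single number
```

If the implementation behaves like case 1, the logsumexp is an identity and the dual would not aggregate over samples, which is what confuses me.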
The current implementation also seems to perform well only with the fixed seed 0.
With any other seed, learning breaks down completely.
I hope you can guide me in understanding this.