Open thomashopkins32 opened 2 months ago
It might make the most sense to continuously update the advantage on the fly rather than try to compute all of it right before getting a sample.
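A minimal sketch of what "on the fly" could look like, assuming a GAE-style advantage (this is an assumption, and the formula used by our method may differ). The names `OnlineAdvantageBuffer`, `add`, and `batch_gae` are hypothetical, and the sketch covers a single episode with no terminal-state handling. Each time a transition completes, its TD residual is folded into every stored advantage with the appropriate discount, so nothing is left to compute when a sample is requested.

```python
from typing import List


class OnlineAdvantageBuffer:
    """Hypothetical buffer that keeps GAE-style advantages up to date per step."""

    def __init__(self, gamma: float = 0.99, lam: float = 0.95) -> None:
        self.gamma = gamma
        self.lam = lam
        self.advantages: List[float] = []

    def add(self, reward: float, value: float, next_value: float) -> None:
        # TD residual for the transition that just completed.
        delta = reward + self.gamma * next_value - value
        self.advantages.append(0.0)
        # Fold delta into every stored step, newest first; a step that is
        # d transitions older receives delta weighted by (gamma * lam) ** d.
        weight = 1.0
        for k in range(len(self.advantages) - 1, -1, -1):
            self.advantages[k] += weight * delta
            weight *= self.gamma * self.lam


def batch_gae(rewards, values, next_values, gamma, lam):
    """Reference: the usual backward pass done once, right before sampling."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Made-up numbers, for illustration only.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3]
next_values = [0.4, 0.3, 0.0]

buffer = OnlineAdvantageBuffer(gamma=0.99, lam=0.95)
for r, v, nv in zip(rewards, values, next_values):
    buffer.add(r, v, nv)
```

The trade-off in this sketch is that every `add` touches all stored steps, so the total work is quadratic in episode length, in exchange for sampling never having to wait on a full backward pass.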
Here is a screenshot of how the advantage is computed for our method:
Proposal:
`nn.Module`s should be a tool used by the agent, not the basis for the agent.
Pros:
Cons:
I think this refactoring will benefit the project in the long run. It will also make it easier to implement ICM (#14), because we will have access to the intermediate results of the visual processing module, which are needed to extract visual features.
Basically, the idea is to have every module of the agent able to access every other module. This was almost the case already; however, the "memory" was separate.
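A rough sketch of the shape this could take, not the actual implementation. The agent is a plain object that owns its modules and its memory, rather than subclassing `nn.Module`. All names here (`Agent`, `Transition`, `toy_vision`, `toy_policy`) are hypothetical, and plain Python callables stand in for the `nn.Module` instances so the sketch runs without PyTorch.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Vector = List[float]


@dataclass
class Transition:
    observation: Vector
    features: Vector  # intermediate output of the visual module
    action: int


@dataclass
class Agent:
    """Plain object that uses its modules as tools (hypothetical layout)."""

    vision: Callable[[Vector], Vector]  # stand-in for a visual nn.Module
    policy: Callable[[Vector], int]     # stand-in for a policy nn.Module
    memory: List[Transition] = field(default_factory=list)

    def act(self, observation: Vector) -> int:
        features = self.vision(observation)
        action = self.policy(features)
        # Memory lives on the agent, so any other module (for example a
        # curiosity module) can read the stored visual features later.
        self.memory.append(Transition(observation, features, action))
        return action

    def latest_features(self) -> Vector:
        return self.memory[-1].features


def toy_vision(observation: Vector) -> Vector:
    # Toy feature extractor: doubles each input value.
    return [2.0 * x for x in observation]


def toy_policy(features: Vector) -> int:
    # Toy policy: index of the largest feature.
    return max(range(len(features)), key=lambda i: features[i])


agent = Agent(vision=toy_vision, policy=toy_policy)
action = agent.act([1.0, 3.0, 2.0])
```

With this layout, something like ICM would only need a reference to the agent (or its memory) to get at the visual features, instead of re-running the visual module.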