I modified your code to remove D1 and D2, which gives the third architecture in Figure 2 of the original paper. However, the rewards that nav_a3c without D1D2 produces are very low, and the reward curve looks abnormal.
Yet in Figure 3 of "Learning to Navigate in Complex Environments", nav_a3c without D1D2 achieves a proper reward curve.
How can I solve this issue?
Best regards.