Open zdh2292390 opened 3 years ago
In the retraining code in policy_agent.py, why is there BFS teacher-guided training after the agent fails? This does not match the algorithm description. Does this mean BFS is an upper bound for the RL agent?