https://xiang578.com/post/reinforce-learnning-basic-imitation-learning.html
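My notes collection: Policy Gradient, PPO: Proximal Policy Optimization, Q-Learning, Actor Critic, Sparse Reward, Imitation Learning. Apprenticeship learning: the reward cannot be obtained from the environment. In some tasks, the reward is hard to define. Hand-designed rewards can lead to unexpected behavior. Learning from the expert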