Closed xiang578 closed 2 months ago
https://xiang578.com/post/reinforce-learnning-basic-sparse-reward.html
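Index of my notes: Policy Gradient, PPO: Proximal Policy Optimization, Q-Learning, Actor Critic, Sparse Reward, Imitation Learning. Reward Shaping: when the reward distribution is very sparse, the actor has a hard time learning, so the reward is deliberately designed to guide the model's learning. Curiosity Intr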
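
To make the reward shaping idea in the excerpt concrete, here is a minimal sketch. It assumes one particular form, potential-based shaping, which the post itself does not name; the 1-D chain environment, `GOAL`, `GAMMA`, and the `potential` function are all hypothetical and chosen only for illustration.

```python
GOAL = 5      # hypothetical goal position on a 1-D chain of states 0..5
GAMMA = 0.9   # hypothetical discount factor


def sparse_reward(next_state: int) -> float:
    """Original sparse signal: 1 only when the goal is reached, 0 elsewhere."""
    return 1.0 if next_state == GOAL else 0.0


def potential(state: int) -> float:
    """Hand-designed potential (an assumption): larger when closer to the goal."""
    return -float(abs(GOAL - state))


def shaped_reward(state: int, next_state: int) -> float:
    """Sparse reward plus the shaping term GAMMA * potential(s') - potential(s)."""
    shaping = GAMMA * potential(next_state) - potential(state)
    return sparse_reward(next_state) + shaping


if __name__ == "__main__":
    print("toward goal:", shaped_reward(0, 1))
    print("away from goal:", shaped_reward(1, 0))
```

With the sparse signal alone, every step far from the goal returns 0, so the actor gets no hint about direction. The shaping term gives a step toward the goal a higher reward than a step away from it, which is the kind of guidance the note describes.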