Closed: skjw1224 closed this 2 weeks ago
- Advantage Actor-Critic (A2C): 종우
- Deep Q-Network (DQN): 종우
- Quantile Regression DQN (QR-DQN): 혜인
- Deep Deterministic Policy Gradient (DDPG): 종우
- Trust Region Policy Optimization (TRPO): 종우
- Proximal Policy Optimization (PPO): 종우
- Soft Actor-Critic (SAC): 종우
- Globalized Dual Heuristic Programming (GDHP): 종우
- Iterative Linear Quadratic Regulator (iLQR): 혜인
- Stochastic Differential Dynamic Programming (SDDP): 혜인
- Policy learning by Weighting Exploration with the Returns (PoWER): 종우, 혜인
- Relative Entropy Policy Search (REPS): 종우, 혜인
- Policy Improvement with Path Integral (PI2): 종우, 혜인
Review Checklist
- https://stable-baselines3.readthedocs.io/en/master/index.html
- https://docs.ray.io/en/latest/rllib/index.html