skjw1224 / epelRL


Algorithm review #69

Closed skjw1224 closed 2 weeks ago

skjw1224 commented 4 months ago

Advantage Actor-Critic (A2C): 종우
Deep Q-Network (DQN): 종우
Quantile Regression DQN (QR-DQN): 혜인
Deep Deterministic Policy Gradient (DDPG): 종우
Trust Region Policy Optimization (TRPO): 종우
Proximal Policy Optimization (PPO): 종우
Soft Actor-Critic (SAC): 종우
Globalized Dual Heuristic Programming (GDHP): 종우
Iterative Linear Quadratic Regulator (iLQR): 혜인
Stochastic Differential Dynamic Programming (SDDP): 혜인
Policy learning by Weighting Exploration with the Returns (PoWER): 종우, 혜인
Relative Entropy Policy Search (REPS): 종우, 혜인
Policy Improvement with Path Integrals (PI2): 종우, 혜인

skjw1224 commented 4 months ago

Review Checklist

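  1. Check the equations in the code against the original papers.
  2. Check the hyperparameter sets used in other RL packages.
  3. Hyperparameter tuning and performance evaluation are only possible once the evaluation metrics and a tester function exist, so those come later.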
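
For item 3, a tester function could look roughly like the sketch below. This is only an illustration, not the epelRL implementation: `CountdownEnv`, `evaluate`, and the `reset`/`step` signatures are hypothetical names chosen for this example, and the metric (mean and standard deviation of the undiscounted episode return) is one possible choice among many.

```python
import statistics
from typing import Callable, Tuple


class CountdownEnv:
    """Hypothetical toy environment used only to exercise the tester.

    The state counts down from `horizon` to 0. Action 1 earns reward 1.0,
    any other action earns 0.0.
    """

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.state = horizon

    def reset(self) -> int:
        self.state = self.horizon
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        reward = 1.0 if action == 1 else 0.0
        self.state -= 1
        done = self.state <= 0
        return self.state, reward, done


def evaluate(
    policy: Callable[[int], int],
    env: CountdownEnv,
    n_episodes: int = 10,
    max_steps: int = 1000,
) -> Tuple[float, float]:
    """Run `policy` for `n_episodes` and return (mean, std) of episode returns."""
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        episode_return = 0.0
        for _ in range(max_steps):  # guard against episodes that never end
            state, reward, done = env.step(policy(state))
            episode_return += reward
            if done:
                break
        returns.append(episode_return)
    # Population standard deviation, so a single episode gives 0.0 instead of an error.
    return statistics.mean(returns), statistics.pstdev(returns)


mean_return, std_return = evaluate(lambda s: 1, CountdownEnv(horizon=5), n_episodes=3)
print(mean_return, std_return)
```

Keeping the tester independent of any particular algorithm means the same function can score all of the agents listed above, as long as each one exposes a state-to-action callable.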
skjw1224 commented 4 months ago

https://stable-baselines3.readthedocs.io/en/master/index.html
https://docs.ray.io/en/latest/rllib/index.html