luwo9 / bomberman_rl

Reinforcement learning for Bomberman: Machine Learning Essentials lecture 2024, final project

Strategies Overview #4

Open luwo9 opened 3 months ago

luwo9 commented 3 months ago

An overview of all techniques/strategies mentioned:

  1. Policy exploration/exploitation
    • $\epsilon$-greedy
    • Softmax
  2. Update Q function
    • SARSA
    • (k-step) temporal difference
  3. Update Q function based on batch
    • All games (only a few from the last game)
    • Look where Q differs from return
  4. Sparse Q
    • Exploit symmetries/ data augmentation
      • Equivalent state $\Rightarrow$ equivalent action
    • Feature engineering
    • Regression
      • Vector/Scalar output
      • OLS, RF, NN
      • SARSA etc.
  5. Reward shaping during training
    • $\tilde r_{t+1} = r_{t+1} + \psi(s_{t+1}) - \psi(s_t)$
    • First and last steps have no shaping.
    • Don't reward actions
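The two exploration policies from item 1 can be sketched as follows. This is a minimal illustration over a vector of Q-values; the function names and the `temperature` parameter are my own, not from the project code.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a uniformly random action,
    otherwise take the greedy (highest-Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_policy(q_values, temperature):
    """Sample an action with probability proportional to exp(Q / T).
    High T -> near-uniform exploration, low T -> near-greedy."""
    z = np.asarray(q_values) / temperature
    z -= z.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=probs))
```

Softmax has the advantage that clearly bad actions (very low Q) are almost never explored, while $\epsilon$-greedy explores all non-greedy actions with equal probability.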
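For item 2, a tabular sketch of the SARSA update and a k-step TD target, assuming a Q-table indexed by (state, action); the hyperparameter values are placeholders.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One-step SARSA: move Q(s, a) toward r + gamma * Q(s', a'),
    where a' is the action the current policy actually took in s'."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def k_step_td_target(rewards, q_bootstrap, gamma=0.99):
    """k-step temporal-difference target: discounted sum of the next k
    rewards plus a bootstrapped gamma^k * Q(s_{t+k}, a_{t+k})."""
    g = sum(gamma**i * r for i, r in enumerate(rewards))
    return g + gamma**len(rewards) * q_bootstrap
```

The same targets carry over to the regression setting in item 4: instead of updating a table entry, the regressor (OLS, random forest, or NN) is fit toward the SARSA or k-step target.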
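The symmetry augmentation in item 4 ("equivalent state $\Rightarrow$ equivalent action") could look like this for the four board rotations. The action encoding and the rotation direction are assumptions for illustration; the exact action permutation depends on how the project maps array axes to screen coordinates.

```python
import numpy as np

# Hypothetical action encoding: 0=UP, 1=RIGHT, 2=DOWN, 3=LEFT, 4=WAIT, 5=BOMB.
# One counterclockwise 90-degree rotation permutes the move actions
# (UP -> LEFT -> DOWN -> RIGHT -> UP); WAIT and BOMB are unchanged.
ROT90_ACTION = [3, 0, 1, 2, 4, 5]

def augment(board, action):
    """Yield the 4 rotations of a (board, action) training pair.
    Equivalent state => equivalent action, so one recorded experience
    becomes four training samples."""
    for k in range(4):
        yield np.rot90(board, k), action
        action = ROT90_ACTION[action]
```

Reflections double this again to 8 samples per experience, with a corresponding swap of the left/right (or up/down) actions.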
RuneRost commented 3 months ago

We discussed: rewarding events should be fine, as an event can be seen as a difference between the previous and current state (e.g. "bomb placed" is the difference between a bomb being present at coordinates x, y in state t and not being present at those coordinates in state t-1).
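That argument can be made concrete with the shaping formula from the list above (taken with discount $\gamma = 1$, as written there). The potential function `psi_bomb` and the dict-based toy state are my own illustration, not the project's state representation.

```python
def shaped_reward(r, s_prev, s_next, psi):
    """Potential-based shaping with gamma = 1:
    r~_{t+1} = r_{t+1} + psi(s_{t+1}) - psi(s_t)."""
    return r + psi(s_next) - psi(s_prev)

def psi_bomb(state):
    # Toy potential: 1 if our agent's bomb is on the board, else 0.
    return 1.0 if state.get("own_bomb") else 0.0

# Placing a bomb: no bomb in the previous state, a bomb in the next one.
r_tilde = shaped_reward(0.0, {"own_bomb": False}, {"own_bomb": True}, psi_bomb)
# r_tilde == 1.0: the "bomb placed" event emerges as a potential difference
```

So an event bonus is equivalent to a potential difference, which is exactly why rewarding events does not break the shaping scheme, whereas rewarding actions directly would.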