Open luwo9 opened 3 months ago
We discussed: rewarding events should be fine, since an event can be seen as a difference between the previous and the current state (e.g. "bomb placed" is the difference between a bomb being present at coordinates (x, y) in state t and not being present at (x, y) in state t-1).
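To make the "event = state difference" idea concrete, here is a minimal sketch. The state layout (a dict with a `"bombs"` set of coordinates), the event name `BOMB_PLACED`, and the reward table are all assumptions for illustration, not the project's actual data structures.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]
# Hypothetical state layout: only the bomb positions matter for this example.
State = Dict[str, Set[Coord]]


def events_from_transition(prev_state: State, curr_state: State) -> List[str]:
    """Derive events purely from the difference between two consecutive states."""
    events: List[str] = []
    # A bomb present at (x, y) in state t but not in state t-1 was just placed.
    new_bombs = curr_state["bombs"] - prev_state["bombs"]
    if new_bombs:
        events.append("BOMB_PLACED")
    return events


def reward_from_events(events: List[str], reward_table: Dict[str, float]) -> float:
    """Sum the rewards of all events; events without an entry contribute 0."""
    return sum(reward_table.get(event, 0.0) for event in events)


# Usage: state t-1 has no bomb, state t has one at (3, 4).
prev_state: State = {"bombs": set()}
curr_state: State = {"bombs": {(3, 4)}}
events = events_from_transition(prev_state, curr_state)
reward = reward_from_events(events, {"BOMB_PLACED": 0.5})
```

Because the event is computed only from the two states, the reward stays a function of the transition (state t-1, state t) and needs no extra information.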
An overview of all techniques/strategies mentioned: