If I understood correctly, Jakob suggested that we add the C, I, and A scores as three separate values in the observation for each attack step.
Mathias wanted the reward itself (or the value from which it is derived, i.e. the C, I, or A score) to be added to each attack step in the observation.
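To make the two options concrete, here is a minimal sketch of the per-attack-step observation under each proposal, as I understood them. Everything here is hypothetical: the `AttackStep` class, the step names, the score values, and especially the assumption that the reward is a weighted sum of the C, I, and A scores (we have not agreed on how the reward is derived).

```python
from dataclasses import dataclass


@dataclass
class AttackStep:
    """Hypothetical attack step carrying confidentiality, integrity, availability scores."""
    name: str
    c: float
    i: float
    a: float


def cia_observation(steps):
    """Option 1 (Jakob, as understood): three values (C, I, A) per attack step."""
    return [[s.c, s.i, s.a] for s in steps]


def reward_observation(steps, weights=(1.0, 1.0, 1.0)):
    """Option 2 (Mathias, as understood): one reward value per attack step.

    ASSUMPTION: the reward is a weighted sum of C, I, and A; this is only a
    placeholder for whatever derivation we actually settle on.
    """
    wc, wi, wa = weights
    return [[wc * s.c + wi * s.i + wa * s.a] for s in steps]


# Made-up example steps and scores, for illustration only.
steps = [
    AttackStep("hypothetical_step_a", 1.0, 0.0, 2.0),
    AttackStep("hypothetical_step_b", 0.0, 3.0, 0.0),
]
print(cia_observation(steps))     # [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
print(reward_observation(steps))  # [[3.0], [3.0]]
```

With these made-up numbers, the single reward value is the same for both steps, while the three-value layout still tells them apart; the trade-off is a wider observation (three features per step instead of one).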