When the battery is already at 100%, the capacity at the next time step is lower than the capacity at which it reached 100%, because of the degradation applied at the previous time step. As a result, the normalized soc is > 1.0. When the max input/output power is calculated from the power curve at a normalized soc > 1.0, the lookup returns the value as if soc << 1.0 (maximum output). This persists until the soc loss caused by the loss coefficient brings the soc back below the degraded capacity, which happens within a few time steps.
This is most obvious with a random-action agent, but an intelligent agent could learn the behavior over time and simply avoid sending large discharge actions when soc == 1.0.
To fix the bug, make sure this line always evaluates to 0.0 <= normalized_soc <= 1.0.
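
A minimal sketch of the kind of clamp this implies, assuming the normalized soc is computed as stored energy divided by the degraded capacity. The function name `clamped_normalized_soc` and the numbers are hypothetical and are not taken from the code base; the point is only that the ratio is limited to [0.0, 1.0] before it is used for the power curve lookup.

```python
def clamped_normalized_soc(stored_energy: float, degraded_capacity: float) -> float:
    """Return stored_energy / degraded_capacity limited to [0.0, 1.0].

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    if degraded_capacity <= 0.0:
        raise ValueError("degraded_capacity must be positive")
    raw = stored_energy / degraded_capacity
    # Clamp so the power curve is never queried outside its defined range.
    return min(max(raw, 0.0), 1.0)


# Battery was filled to its previous capacity, then the capacity degraded slightly.
stored_energy = 6.4        # kWh, hypothetical value
degraded_capacity = 6.39   # kWh, hypothetical value after degradation

raw_ratio = stored_energy / degraded_capacity  # slightly above 1.0 (the bug case)
safe_ratio = clamped_normalized_soc(stored_energy, degraded_capacity)  # limited to 1.0
```

With the clamp in place, a fully charged battery is treated as soc == 1.0 for the power curve lookup even while the stored energy still exceeds the degraded capacity.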