boyeon-kim closed 8 months ago
- I - nu
- I
reward design1 (check)
-I
reward design2 (penalty graded by the size of the difference)
-I - abs(max(0, np.sum(self.nus) - self.nu_total_max))
sum(self.nus) > self.nu_total_max
- I - nu
?- I
?
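A minimal sketch of how the reward design2 expression could be evaluated, assuming `I` is a scalar and `nus` is an array-like of the per-step `nu` values. The function name `reward_design2` and its standalone signature are hypothetical; in the issue the values live on `self`. Note that `abs(...)` around `max(0, x)` has no effect, since `max(0, x)` is never negative, so it is kept here only to mirror the expression as written.

```python
import numpy as np


def reward_design2(I, nus, nu_total_max):
    """Hypothetical helper: reward is -I minus a penalty that grows with the overshoot.

    The penalty is zero while sum(nus) <= nu_total_max and equals the
    excess sum(nus) - nu_total_max once the cap is exceeded.
    """
    excess = float(np.sum(nus)) - nu_total_max
    # abs() is redundant here (max(0, x) >= 0); kept to match the issue's expression.
    penalty = abs(max(0.0, excess))
    return -I - penalty


if __name__ == "__main__":
    # Under the cap: no penalty, reward is just -I.
    print(reward_design2(10.0, [0.2, 0.3], 1.0))
    # Over the cap: reward is reduced further by the amount of the overshoot.
    print(reward_design2(10.0, [0.8, 0.7], 1.0))
```

Compared with a flat penalty applied whenever `sum(self.nus) > self.nu_total_max`, this form penalizes a small overshoot only slightly and a large one more.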