mal-lang / mal-simulator

Apache License 2.0

Observations are -1 for defender after reset. #6

Closed kasanari closed 4 months ago

andrewbwm commented 4 months ago

Same solution as for #11

Fixed in 094d4a1794d174e56aef3cf98067eedf891dc48b.

The reset method now goes through the same function that generates observations, rewards, terminations, truncations, and infos, even though reset is only expected to make use of the observations and infos. One side effect: if a non-viable episode is given, reset will terminate it immediately, but that might be the desired behaviour anyway.
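A minimal sketch of the pattern described above, not the actual mal-simulator code. The class `ToyEnv`, the helper `_collect`, the `viable` flag, and the 0/1 observation encoding are all hypothetical names and choices made for illustration. The sketch assumes the shared function is the one `step` also uses, so `reset` returns a real observation for the defender rather than placeholder values.

```python
class ToyEnv:
    """Hypothetical two-agent environment used only to illustrate the pattern."""

    def __init__(self, num_nodes=3, viable=True):
        self.agents = ["attacker", "defender"]
        self.num_nodes = num_nodes
        self.viable = viable          # assumption: False models a non-viable episode
        self.compromised = set()
        self.terminations = {}

    def _collect(self):
        # Single code path that builds all five per-agent dictionaries.
        state = [1 if n in self.compromised else 0 for n in range(self.num_nodes)]
        observations = {agent: list(state) for agent in self.agents}
        breached = len(self.compromised)
        rewards = {"attacker": breached, "defender": -breached}
        done = not self.viable
        terminations = {agent: done for agent in self.agents}
        truncations = {agent: False for agent in self.agents}
        infos = {agent: {"terminated": done} for agent in self.agents}
        return observations, rewards, terminations, truncations, infos

    def reset(self):
        self.compromised = {0}  # attacker entry point
        # Reuse the shared function; only observations and infos are returned.
        observations, _, terminations, _, infos = self._collect()
        # A non-viable episode is therefore flagged as terminated right away.
        self.terminations = terminations
        return observations, infos

    def step(self, actions):
        target = actions.get("attacker")
        if target is not None:
            self.compromised.add(target)
        return self._collect()


env = ToyEnv()
observations, infos = env.reset()
print(observations["defender"])  # [1, 0, 0]
```

Because `reset` and `step` share `_collect`, the defender's first observation is built from the actual state, and a non-viable episode shows up as terminated immediately after `reset`.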