Open martyn-smith opened 2 years ago
We might see significantly higher learning stability if we force blue to have a learning phase against a static red policy, and vice versa. Easy enough to implement, possibly a new branch.
We might see significantly higher learning stability if we force blue to have a learning phase against a static red policy, and vice versa. Easy enough to implement, possibly a new branch.