martyn-smith / Eastmann-Adversarial

Implementations of the Tennessee Eastman process suitable for Adversarial Reinforcement Learning

add alternating blue/red learning phases #9

Open martyn-smith opened 2 years ago

martyn-smith commented 2 years ago

We might see significantly more stable learning if we alternate phases: blue learns against a static (frozen) red policy for a while, then red learns against a static blue policy, and so on. This should be easy enough to implement, possibly on a new branch.
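A rough sketch of what the alternating schedule could look like, not tied to the current code. Everything here is hypothetical: the `Agent` class, its `frozen` flag, `phase_schedule`, `train`, and the `run_episode` callback are placeholders for illustration, and the zero-sum reward (red gets the negative of blue's reward) is an assumption.

```python
from typing import Callable, Iterator, Tuple


class Agent:
    """Hypothetical stand-in for a blue or red policy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.frozen = False  # when True, learn() leaves the policy untouched
        self.updates = 0     # counts the updates that were actually applied

    def learn(self, reward: float) -> None:
        if self.frozen:
            return
        # A real agent would take a gradient step here.
        self.updates += 1


def phase_schedule(n_episodes: int, phase_len: int) -> Iterator[Tuple[int, str]]:
    """Yield (episode, learner), switching learner every phase_len episodes."""
    for episode in range(n_episodes):
        learner = "blue" if (episode // phase_len) % 2 == 0 else "red"
        yield episode, learner


def train(
    blue: Agent,
    red: Agent,
    n_episodes: int,
    phase_len: int,
    run_episode: Callable[[Agent, Agent], float],
) -> None:
    """Only one side learns per phase; the other plays a static policy."""
    for _, learner in phase_schedule(n_episodes, phase_len):
        blue.frozen = learner != "blue"
        red.frozen = learner != "red"
        blue_reward = run_episode(blue, red)
        blue.learn(blue_reward)
        red.learn(-blue_reward)  # assumed zero-sum


if __name__ == "__main__":
    blue, red = Agent("blue"), Agent("red")
    # Dummy episode that always returns the same reward for blue.
    train(blue, red, n_episodes=10, phase_len=3, run_episode=lambda b, r: 1.0)
    print(blue.updates, red.updates)
```

The phase length would probably need tuning: too short and we are back to simultaneous learning, too long and each side may overfit to the frozen opponent.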