martyn-smith / Eastmann-Adversarial

Implementations of the Tennessee Eastmann process suitable for Adversarial Reinforcement Learning

integrate Digital Twin / runahead comparator as value function #13

Closed martyn-smith closed 2 years ago

martyn-smith commented 2 years ago

As above.

Typically, reinforcement learning must learn a "value function": the expected value of being in state s at time t, accounting for all (discounted) future rewards. This function is usually learned empirically, from experience.

We, however, have a twin that can - in the absence of a policy - perfectly predict future rewards. We've been looking for a place to integrate such a thing, and the value function is it.

Question: what state vector should the DT be fed?
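A minimal sketch of the idea, assuming a twin that exposes `reset_to(state)` and `step()` (both names hypothetical, not this repository's actual API): estimate V(s) by rolling the twin forward open-loop and summing discounted rewards. The `ToyTwin` dynamics here are purely illustrative.

```python
class ToyTwin:
    """Illustrative stand-in for the digital twin: the state decays toward
    a setpoint and reward is the negative squared error (not the real TE model)."""

    def reset_to(self, state):
        self.x = state

    def step(self, action=None):
        # Open-loop (no policy): the twin just runs its own dynamics.
        self.x *= 0.9
        return self.x, -self.x ** 2


def twin_value(twin, state, horizon=50, gamma=0.99):
    """Discounted return predicted by the twin from `state`,
    used as a drop-in value-function estimate."""
    twin.reset_to(state)
    value = 0.0
    for t in range(horizon):
        _, reward = twin.step()
        value += gamma ** t * reward
    return value
```

The open question above still applies: whatever state vector the twin is fed must be rich enough for `reset_to` to reproduce the plant's trajectory.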

martyn-smith commented 2 years ago

In branch "TwinBased", currently in bugfix/testing.

martyn-smith commented 2 years ago

Now of comparable maturity to the other branches; closing.