Hi,
I am currently working with "agents/tf_agents/bandits/" and am wondering whether, and where, the classic contextual bandit off-policy evaluation procedures are available in TensorFlow. Specifically, I mean the following:
The evaluation procedures that Vowpal Wabbit already uses, described here: https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/python_Contextual_bandits_and_Vowpal_Wabbit.html
Or, even better, the methods available in the Open Bandit Pipeline package: https://github.com/st-tech/zr-obp
Before I start thinking about how to integrate the methods from obp into the TensorFlow environment, I would like to know whether and where these methods can be found in TF-Agents.
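To make the request concrete, here is a minimal sketch of one such estimator, inverse propensity scoring (IPS), written in plain NumPy. It is not code from TF-Agents, Vowpal Wabbit, or obp. The function name `ips_estimate` and its arguments are hypothetical and only illustrate the kind of procedure I am looking for.

```python
import numpy as np


def ips_estimate(rewards, logged_actions, logging_propensities, target_action_probs):
    """Hypothetical helper: IPS estimate of a target policy's value.

    rewards:              shape (n,), observed reward for each logged round
    logged_actions:       shape (n,), index of the action the logging policy took
    logging_propensities: shape (n,), probability the logging policy gave that action
    target_action_probs:  shape (n, k), target policy's action distribution per round
    """
    rewards = np.asarray(rewards, dtype=float)
    logged_actions = np.asarray(logged_actions, dtype=int)
    logging_propensities = np.asarray(logging_propensities, dtype=float)
    target_action_probs = np.asarray(target_action_probs, dtype=float)

    n = rewards.shape[0]
    # Probability the target policy assigns to the action that was actually logged.
    target_probs = target_action_probs[np.arange(n), logged_actions]
    # Importance weights correct for the mismatch between the two policies.
    weights = target_probs / logging_propensities
    return float(np.mean(weights * rewards))


# Toy data: uniform logging policy over 2 actions, target policy always picks action 0.
value = ips_estimate(
    rewards=[1.0, 0.0, 1.0, 1.0],
    logged_actions=[0, 1, 1, 0],
    logging_propensities=[0.5, 0.5, 0.5, 0.5],
    target_action_probs=[[1.0, 0.0]] * 4,
)
print(value)
```

Ideally I would like something equivalent that operates on the logged trajectories TF-Agents already produces, along with the direct method and doubly robust variants.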