Contextual Bandit Off-Policy Evaluation

vitorkrasniqi commented 1 year ago

Hi,

I am currently working with "agents/tf_agents/bandits/". I am wondering whether, and where, the classic contextual bandit off-policy evaluation procedures are available in TensorFlow. Specifically, I mean the following off-policy evaluation procedures:

Direct Method
Inverse Probability Weighting (IPW)
Doubly Robust (DR) / also known as Augmented IPW

These are the evaluation procedures that vowpal_wabbit already uses; they are described here: https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/python_Contextual_bandits_and_Vowpal_Wabbit.html

Or, even more desirable, the methods found in the Open Bandit Pipeline package: https://github.com/st-tech/zr-obp

Before I start thinking about how to integrate the methods from obp into the TensorFlow environment, I would like to know whether and where these methods can be found in TF-Agents.
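
For reference, here is roughly what I mean by the three estimators listed above, as a minimal NumPy sketch. This is not TF-Agents, vowpal_wabbit, or obp code; the function names and argument names (q_hat, pi_e, pscores) are my own hypothetical choices. It assumes logged data with one chosen action per round, the logging policy's probability of that action (pscores), an evaluation policy given as a matrix of action probabilities pi_e with shape (n_rounds, n_actions), and a reward model's predictions q_hat of the same shape.

```python
import numpy as np


def direct_method(q_hat, pi_e):
    """Direct Method: expected model-predicted reward under the evaluation policy."""
    # For each round, sum_a pi_e(a | x) * q_hat(x, a), then average over rounds.
    return float(np.mean(np.sum(pi_e * q_hat, axis=1)))


def ipw(actions, rewards, pscores, pi_e):
    """Inverse Probability Weighting: reweight logged rewards by pi_e / pi_logging."""
    idx = np.arange(len(actions))
    # Importance weight of the logged action in each round.
    weights = pi_e[idx, actions] / pscores
    return float(np.mean(weights * rewards))


def doubly_robust(actions, rewards, pscores, q_hat, pi_e):
    """Doubly Robust: Direct Method baseline plus an IPW correction on the residual."""
    idx = np.arange(len(actions))
    weights = pi_e[idx, actions] / pscores
    baseline = np.sum(pi_e * q_hat, axis=1)
    # Correction uses the model's prediction for the action that was actually logged.
    correction = weights * (rewards - q_hat[idx, actions])
    return float(np.mean(baseline + correction))


if __name__ == "__main__":
    # Tiny made-up log: 4 rounds, 2 actions, uniform logging policy.
    actions = np.array([0, 1, 1, 0])
    rewards = np.array([1.0, 0.0, 1.0, 0.0])
    pscores = np.full(4, 0.5)
    # Evaluation policy that always picks action 1 (hypothetical).
    pi_e = np.tile(np.array([0.0, 1.0]), (4, 1))
    # Placeholder reward model that predicts 0.5 everywhere (hypothetical).
    q_hat = np.full((4, 2), 0.5)

    print("DM :", direct_method(q_hat, pi_e))
    print("IPW:", ipw(actions, rewards, pscores, pi_e))
    print("DR :", doubly_robust(actions, rewards, pscores, q_hat, pi_e))
```

The sketch takes the evaluation policy as a plain probability matrix so that it does not depend on any particular policy class. To use it with a TF-Agents policy, one would first have to extract per-action probabilities from the policy, which I have not worked out here.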

vitorkrasniqi commented 1 year ago

It is currently not available.

SamanthaSHan commented 1 year ago

Did you end up implementing it yourself? Curious whether you found any solutions to this.

tensorflow / agents

Contextual Bandit Off-Policy Evaluation #791