Hi,
I am looking to use tf-agents to develop a multi-armed bandit for advertising.
For each observation, I don't have the rewards for the other arms, because only a single arm (ad) is shown for that observation.
Is tf-agents able to handle this situation? I went through all the Environments, and all of them seem to assume that a reward is available for every observation-arm combination. The MovieLens example handles this sparsity using SVD.
Will I need to use similar methods to estimate the rewards for the other arms? Or is there something in tf-agents that I am missing?
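To make the setting concrete, here is a minimal numpy sketch of disjoint LinUCB (one ridge-regression model per arm) under bandit feedback. It is not tf-agents code and does not use any tf-agents API. The class, the `simulate` helper, the 3-ad/3-feature setup and `true_theta` are all hypothetical, chosen only to show that the update step needs just the reward of the arm that was actually shown.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (illustrative sketch).

    Only the arm that was actually shown is updated, so the learner never
    needs rewards for arms that were not displayed.
    """

    def __init__(self, num_arms, context_dim, alpha=1.0):
        self.alpha = alpha  # width of the exploration bonus
        # Per-arm sufficient statistics: A[a] = I + sum(x x^T), b[a] = sum(r x)
        self.A = np.stack([np.eye(context_dim) for _ in range(num_arms)])
        self.b = np.zeros((num_arms, context_dim))

    def select_arm(self, x):
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a  # ridge estimate for this arm
            # Estimated reward plus an upper-confidence exploration bonus
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Bandit feedback: only the displayed arm's statistics change.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def simulate(num_rounds=2000, seed=0):
    """Hypothetical toy ad problem: 3 ads, 3-dim user context.

    The expected reward of ad a is true_theta[a] @ x, but the learner
    only ever observes the reward of the ad it chose.
    """
    rng = np.random.default_rng(seed)
    true_theta = np.eye(3)  # hypothetical ground truth, hidden from the learner
    bandit = LinUCB(num_arms=3, context_dim=3, alpha=1.0)
    picked_best = []
    for _ in range(num_rounds):
        x = rng.normal(size=3)
        arm = bandit.select_arm(x)
        reward = true_theta[arm] @ x  # reward observed for the shown ad only
        bandit.update(arm, x, reward)
        picked_best.append(arm == int(np.argmax(true_theta @ x)))
    return bandit, np.array(picked_best)
```

In this toy setup the full `true_theta` table is used only to generate the reward of the chosen ad and to score the choices afterwards; the learner itself never reads it.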