Using `tf-agents` for Bandits with sparse data

Hi, I am looking to use tf-agents to develop a multi-armed bandit for advertising.

For each observation, I don't have the rewards for the other arms, because only a single arm (ad) is shown for that observation.
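To make the setting concrete, here is a minimal framework-agnostic sketch (plain Python, not the tf-agents API) of a contextual bandit that learns from exactly this kind of feedback. It assumes an epsilon-greedy policy with one linear reward model per arm, trained by online SGD. Only the model of the arm that was actually shown is updated, so the other arms' rewards are never needed. The class, the `simulate` helper and the click-through rates are all hypothetical.

```python
import random


class EpsilonGreedyLinearBandit:
    """Hypothetical sketch: one linear reward model per arm, trained online.

    Only the model of the arm that was actually shown gets updated, so the
    rewards of the arms that were not shown are never required.
    """

    def __init__(self, num_arms, num_features, epsilon=0.1, lr=0.05, seed=0):
        self.weights = [[0.0] * num_features for _ in range(num_arms)]
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def predict(self, arm, context):
        # Linear estimate of the expected reward of `arm` in this context.
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def choose(self, context):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        scores = [self.predict(a, context) for a in range(len(self.weights))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, context, reward):
        # One SGD step on squared error, applied to the chosen arm only.
        error = reward - self.predict(arm, context)
        w = self.weights[arm]
        for i, x in enumerate(context):
            w[i] += self.lr * error * x


def simulate(steps=2000, seed=1):
    rng = random.Random(seed)
    # Hypothetical click probabilities: rows are user segments, columns are ads.
    ctr = [[0.8, 0.2], [0.1, 0.7]]
    bandit = EpsilonGreedyLinearBandit(num_arms=2, num_features=2)
    for _ in range(steps):
        segment = rng.randrange(2)
        context = [1.0 if i == segment else 0.0 for i in range(2)]
        arm = bandit.choose(context)
        # Only the reward of the ad actually shown is observed.
        reward = 1.0 if rng.random() < ctr[segment][arm] else 0.0
        bandit.update(arm, context, reward)
    return bandit
```

After `bandit = simulate()`, `bandit.predict(arm, context)` should rank ad 0 highest for the first segment and ad 1 highest for the second, even though each step revealed only one ad's reward. Exploration (here epsilon-greedy) is what keeps the estimates of rarely shown arms from going stale.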

Is tf-agents able to handle such situations? I went through all the environments, and all of them seem to assume that a reward is available for every observation-arm combination. The MovieLens example handles this sparsity by using SVD.
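For logged data where only the shown arm's reward exists, one commonly used alternative to imputing the missing rewards is the "replay" method from the contextual-bandit literature. It targets offline evaluation rather than building a full-information environment. Below is a minimal sketch, not tf-agents code. It assumes the logging policy chose ads uniformly at random, which is the condition under which the replay estimate is unbiased. Each logged event is assumed to be a `(context, shown_arm, reward)` tuple. `replay_evaluate`, `segment_policy` and the log itself are hypothetical.

```python
import math


def replay_evaluate(policy, logged_events):
    """Replay method: keep only the events where the policy picks the logged arm.

    Assumes arms were logged uniformly at random; each event is a
    (context, shown_arm, reward) tuple holding only the shown arm's reward.
    """
    total_reward = 0.0
    matched = 0
    for context, shown_arm, reward in logged_events:
        if policy(context) == shown_arm:
            total_reward += reward
            matched += 1
    # Average reward over the matched events (NaN if nothing matched).
    return total_reward / matched if matched else math.nan


# Hypothetical log: only the shown ad's reward is recorded for each event.
logged_events = [
    ([1.0, 0.0], 0, 1.0),
    ([1.0, 0.0], 1, 0.0),
    ([0.0, 1.0], 1, 1.0),
    ([0.0, 1.0], 0, 0.0),
    ([1.0, 0.0], 1, 1.0),
    ([0.0, 1.0], 1, 0.0),
]


def segment_policy(context):
    # Hypothetical policy: ad 0 for the first user segment, ad 1 otherwise.
    return 0 if context[0] == 1.0 else 1


# Three events match the policy's choice and two of them were clicks: 2/3.
estimate = replay_evaluate(segment_policy, logged_events)
```

The trade-off is that most logged events are discarded (roughly a fraction 1/K of them is kept with K ads), so this needs a lot of randomly logged traffic. An imputation approach like the SVD one instead uses every event but relies on the reward model being accurate.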

Will I need to use similar methods to estimate the rewards for the other arms, or is there something in tf-agents that I am missing?

tensorflow/agents issue #778