ukaratay opened this issue 7 years ago
The line below causes the problem. For now I have worked around it by clipping the probability to the range [0, 1]:
var prob = ((GenericTopSlotExplorerState)dp.InteractData.ExplorerState).Probabilities[action - 1];
// Clip the logged probability into [0, 1]: values above 1 become 1, values below 0 become 0.
var label = new ContextualBanditLabel(action, -dp.Reward, (prob > 1) ? 1 : ((prob < 0) ? 0 : prob));