My planned formulation of the utility was as follows (the current implementation does not follow it):
$U(p + a_i) = \mathrm{softmax}_i\left(d_{OB}(i) + d_{EB}(i) + d_{S2R}(i)\right)$, where $d_{OB}(i) = OB_{a_i} - OB_p$, and $d_{EB}(i)$ and $d_{S2R}(i)$ are defined analogously.
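Here $OB_x$ is the number of observed-behavior (OB) sentences in document $x$, as defined by Chaparro et al.; $EB_x$ and $S2R_x$ count expected-behavior (EB) and steps-to-reproduce (S2R) sentences in the same way.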
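A minimal sketch of the planned formula, assuming the OB/EB/S2R sentence counts for the post and for each candidate answer have already been obtained (e.g. from a Chaparro-style sentence classifier, which is not shown here). The names `Counts` and `utilities` are hypothetical, and the softmax is taken over the candidate answers $a_i$.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container: number of OB, EB and S2R sentences in one document."""
    ob: int
    eb: int
    s2r: int


def utilities(post: Counts, answers: list[Counts]) -> list[float]:
    """Softmax over candidates of d_OB(i) + d_EB(i) + d_S2R(i)."""
    # d_X(i) = X_{a_i} - X_p for X in {OB, EB, S2R}, summed per candidate answer
    scores = [
        (a.ob - post.ob) + (a.eb - post.eb) + (a.s2r - post.s2r)
        for a in answers
    ]
    # Numerically stable softmax: subtract the max score before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Usage: an answer that adds OB, EB and S2R sentences gets most of the mass
post = Counts(ob=1, eb=0, s2r=0)
print(utilities(post, [Counts(1, 0, 0), Counts(2, 1, 3)]))
```

With these counts the raw scores are 0 and 5, so nearly all of the utility goes to the second answer. Note that under this formulation an answer's utility depends on the other candidates, since the softmax normalizes across them.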
Currently, the utility calculation is not part of the learning process. Is this what we want?
We could make the evaluation more limited, e.g., evaluate only whether the correct answer is chosen given the question and the post, i.e., p(a|p,q), and argue that utility is a separate concern that will be evaluated only through a user study.
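One way to make this restricted evaluation concrete is top-1 accuracy. The sketch below assumes the model outputs a probability p(a_i|p,q) for each candidate answer and that exactly one candidate per example is correct; the function name `top1_accuracy` and the input layout are hypothetical.

```python
def top1_accuracy(probs: list[list[float]], gold: list[int]) -> float:
    """Fraction of examples whose highest-probability candidate is the gold answer.

    probs[k][i] is the model's p(a_i | p, q) for example k;
    gold[k] is the index of the correct answer for example k.
    """
    correct = 0
    for row, g in zip(probs, gold):
        # argmax over candidates (first index wins on ties)
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == g)
    return correct / len(gold)


probs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
print(top1_accuracy(probs, gold=[1, 2, 2]))  # 2 of 3 examples correct
```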
Things to look into: