SandyaS72 opened 9 years ago
TL;DR: Particular formalism aside, loss tells you how badly your model does at making a decision for a particular sample x of a random variable; risk tells you how badly your model does overall when the samples are drawn from p, a particular random variable in your model class. (Hence R = E(L) (rough notation) makes sense as a first choice.)
My understanding is as follows:
- **Sample space** (X): where realizations (samples) of the model's random variables live, i.e. the codomain of each random variable in your model.
- **Model** (P): a parameterized set of random variables, each of which has X as its codomain. (Recall: a random variable is a map from the "great sample space in the sky" -> X.)
- **Action space** (A): the set of all possible answers to the particular question you are asking about a particular element of X, which is itself a realization of a particular random variable in your model.
- **Decision rule class** (F = { f : X -> A }): the set of all ways your protocol could decide how to assign an action to a particular element of X.
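To make these four objects concrete, here is a minimal sketch in a toy binary setting. All names (`bernoulli`, `f_identity`, etc.) are illustrative choices of mine, not standard notation:

```python
import random

# Toy instantiation: X = {0, 1}, P = {Bernoulli(theta) : theta in [0, 1]},
# A = {0, 1}, and F = all maps {0, 1} -> {0, 1}.

X = [0, 1]   # sample space: possible realizations
A = [0, 1]   # action space: possible answers

def bernoulli(theta):
    """One random variable from the model class P, parameterized by theta."""
    return lambda: 1 if random.random() < theta else 0

def f_identity(x):
    """A decision rule f : X -> A; here, 'answer with whatever you observed'."""
    return x

rv = bernoulli(0.7)   # pick one random variable from the model
x = rv()              # draw one realization x in X
a = f_identity(x)     # the protocol's action for this sample
```

The point of separating the four objects is that the protocol only ever sees x, while the question "how good is the protocol?" is asked relative to the whole random variable rv.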
With those in mind, we then have:
Loss functional (L : X \times A \times ... -> I \subset R): a function that assesses how "good" or "bad" your decision protocol performs for a particular realization of a random variable from your model, given the true best action. For example, if your protocol chooses a particular decision rule f \in F, you probably want L(x, a) = 0 <=> f(x) = a. (Aside: it seems to me that the decision rule class should be part of the domain of L, since the particular choice of decision rule f matters for the "loss" experienced by the protocol for a particular x \in X; the protocol is going to use f(x) as its estimate of a.)
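The aside can be sketched directly: a 0-1 loss whose domain includes the decision rule, so that L(x, a, f) = 0 exactly when f(x) = a. The function name is my own invention:

```python
def zero_one_loss(x, a, f):
    """0 if the rule's action f(x) matches the true best action a, else 1.

    Takes the decision rule f as an explicit argument, as the aside suggests.
    """
    return 0 if f(x) == a else 1

identity = lambda x: x
zero_one_loss(1, 1, identity)   # -> 0: the rule's action matches the truth
zero_one_loss(0, 1, identity)   # -> 1: the rule's action is wrong
```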
Risk functional (R : P \times ... -> I \subset R): a function that assesses how "good" or "bad" your entire protocol is at handling samples drawn from a particular random variable in your model set. For example, an interesting finding might be that if P = {all SBM RVs} U {all ER RVs}, you can prove that, for your protocol, R(x) <= R(y) whenever x \in {all SBM RVs} and y \in {all ER RVs}. Or, if P = {all SBM RVs}, you might prove that, for your protocol, R(x) <= R(y) whenever x's and y's beta parameters satisfy some relation. This is one end goal of decision theory: to show that a particular way of making decisions works "better overall", by some metric, when the randomness of its input is of a particular "kind". To do this, you have to be able to judge how good your protocol is not just on a particular sample, but on the entire random variable.
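As a concrete (hedged) sketch of the standard choice R(p) = E_{x ~ p}[L(x, f(x))], here is a Monte Carlo estimate of the risk of one rule against one random variable in the model. The names are mine; the "true best action" for a sample x is taken to be x itself, i.e. an estimation setup:

```python
import random

def loss(x, a):
    """0-1 loss: 0 if the action a matches the sample x, else 1."""
    return 0 if x == a else 1

def risk(draw, f, n=100_000, seed=0):
    """Monte Carlo estimate of E[L(x, f(x))] over n draws from `draw`."""
    rng = random.Random(seed)
    return sum(loss(x, f(x)) for x in (draw(rng) for _ in range(n))) / n

def bernoulli(theta):
    """A Bernoulli(theta) random variable from the model."""
    return lambda rng: 1 if rng.random() < theta else 0

always_one = lambda x: 1   # a constant decision rule

# For p = Bernoulli(0.9) and the rule "always answer 1", the exact risk is
# P(x = 0) = 0.1, so the estimate should land close to 0.1.
est = risk(bernoulli(0.9), always_one)
```

The same `risk` function evaluated at different members of P is exactly the kind of object you would compare to prove statements like R(x) <= R(y) above.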
Some reasonable risk functionals I can think of off the top of my head are: the maximum of the loss function, the minimum of the loss function, the median of the loss function, the variance of the loss function, the interquartile range of the loss function, and the mode of the loss function.
Something like the maximum of the loss function makes sense to me if you own a bunch of investments and, for whatever reason, you can't afford to lose more than $x under any circumstances: the worst-case loss is the only summary that guarantees that.
I think it can literally be any quantity as long as it makes some reasonable sense within the context of the problem.
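Several of the summaries listed above can be computed from the same batch of simulated losses. This is only an illustrative sketch (the stand-in loss distribution and all names are mine), using only the standard library:

```python
import random
import statistics

rng = random.Random(42)
# Stand-in loss values for samples drawn from one RV in the model:
losses = [abs(rng.gauss(0, 1)) for _ in range(10_000)]

risks = {
    "expected loss": statistics.fmean(losses),     # the usual R = E[L]
    "worst case":    max(losses),                  # minimax-style risk
    "median loss":   statistics.median(losses),
    "loss variance": statistics.variance(losses),
    # interquartile range: Q3 - Q1 from the three quartile cut points
    "IQR of loss":   (lambda q: q[2] - q[0])(statistics.quantiles(losses, n=4)),
}
```

Each entry is a different functional of the same loss distribution, so each defines a different notion of "how badly the protocol does overall" for that random variable.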
Can someone explain what a risk functional is again and give a few examples? I know we talked about the expected value of the loss function, but what else can it be defined as?