nicholas-denis opened 1 month ago
The informative plots are mostly complete; however, I have not yet settled on the right measure of difference between data distributions. The current candidates are total variation (TV) distance and Wasserstein distance. Neither should be too numerically demanding to compute.
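A minimal sketch of how both candidate distances could be estimated from samples (the gamma shapes and bin count below are illustrative, not the experiment's actual settings):

```python
import numpy as np
from scipy.stats import gamma, wasserstein_distance

rng = np.random.default_rng(0)

# Samples from two candidate data distributions (shapes are illustrative).
x_a = gamma(a=2.0).rvs(size=10_000, random_state=rng)
x_b = gamma(a=5.0).rvs(size=10_000, random_state=rng)

# 1-Wasserstein distance, computed directly from the samples.
w1 = wasserstein_distance(x_a, x_b)

# TV distance estimated on a shared histogram:
# TV(P, Q) = 0.5 * sum_i |p_i - q_i| over bins.
bins = np.histogram_bin_edges(np.concatenate([x_a, x_b]), bins=100)
p, _ = np.histogram(x_a, bins=bins)
q, _ = np.histogram(x_b, bins=bins)
tv = 0.5 * np.abs(p / p.sum() - q / q.sum()).sum()

print(f"W1 = {w1:.3f}, TV = {tv:.3f}")
```

The histogram-based TV estimate is bin-sensitive, which is one practical argument for Wasserstein here; for named families like gamma vs. gamma, closed-form or quadrature-based distances are also an option.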
Please see the repo wiki for notation and other information.
In this experiment we will have P_ell = P_u, but NOT equal to the initial distribution used to generate training data for f.
Train the ML model on one distribution (for example, one gamma distribution) and then do PPI with P_u = P_ell = some other gamma distribution.
Look at how PPI performs as a function of how "far" the two distributions are from each other.
We want several initial distributions (for example, different gamma distributions) and, for each initial distribution, something like 100 different PPI distributions P_ell = P_u.
Run three sets of experiments:

1. x univariate, y = f(x) for some simple linear function f.
2. x a vector, say 10-dimensional, and y = f(x) for some simple linear function f.
3. x a vector and y = f(x) for a non-linear function f.
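The univariate sweep could be wired up roughly as follows (one trial per distribution pair, recording the train-vs-PPI distance alongside the PPI error; all names, shapes, and the y = 3x + noise model are illustrative assumptions):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)

def run_trial(a_train, a_ppi, n_train=500, n_l=200, n_u=10_000):
    """One trial: fit a linear f on gamma(a_train) data, then do PPI mean
    estimation under P_ell = P_u = gamma(a_ppi)."""
    # Train f on the initial distribution (univariate, y = 3x + noise).
    x_tr = rng.gamma(shape=a_train, size=n_train)
    y_tr = 3.0 * x_tr + rng.normal(scale=0.5, size=n_train)
    f = np.poly1d(np.polyfit(x_tr, y_tr, deg=1))

    # Labeled and unlabeled data from the (different) PPI distribution.
    x_l = rng.gamma(shape=a_ppi, size=n_l)
    y_l = 3.0 * x_l + rng.normal(scale=0.5, size=n_l)
    x_u = rng.gamma(shape=a_ppi, size=n_u)

    # PPI mean estimate and its error; E[Y] = 3 * a_ppi for unit scale.
    ppi_est = f(x_u).mean() + (y_l - f(x_l)).mean()
    dist = wasserstein_distance(x_tr, x_u)   # "how far" train vs. PPI dist.
    return dist, abs(ppi_est - 3.0 * a_ppi)

# One initial distribution, a grid of PPI distributions of increasing shift.
results = [run_trial(2.0, a) for a in np.linspace(2.0, 8.0, 10)]
```

Plotting error against distance over this grid (and repeating for several `a_train` values, then for the 10-dimensional and non-linear variants) would produce exactly the "PPI performance vs. distribution distance" curves described above.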