nicholas-denis / ppi-testing


Experiment 1: data drift between model training distribution and PPI inference distribution #2

Open nicholas-denis opened 1 month ago

nicholas-denis commented 1 month ago

Please see the repo wiki for notation and other information.

In this experiment the labeled and unlabeled PPI distributions will be equal (P_ell = P_u), but NOT equal to the initial distribution used to generate the training data for f.

Train the ML model on data drawn from one distribution (e.g., one gamma distribution), then do PPI with P_u = P_ell set to some other gamma distribution.

Look at how PPI performs as a function of how "far" the two distributions differ.

We want several initial distributions (e.g., different gamma distributions) and, for each initial distribution, on the order of 100 different PPI distributions P_ell = P_u.

Run three sets of experiments (see the sketch below for the first):

1. x is univariate and y = f(x) for some simple linear function f.
2. x is a vector, say 10-dimensional, and y = f(x) for some simple linear function f.
3. x is a vector and y = f(x) for a non-linear function f.
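As a concrete starting point, here is a minimal sketch of the univariate linear case, assuming the classical PPI mean estimator theta_PP = mean(f_hat(X_u)) + mean(Y_ell - f_hat(X_ell)). The gamma parameters, sample sizes, and the specific linear f below are placeholder choices, not values fixed by this issue:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Training distribution: X_train ~ Gamma(shape=2, scale=1), y = f(x) = 3x + 1.
def f_true(x):
    return 3.0 * x + 1.0

x_train = rng.gamma(shape=2.0, scale=1.0, size=1_000)
y_train = f_true(x_train)

# 2. Fit the ML model f_hat on the training data (ordinary least squares).
coef = np.polyfit(x_train, y_train, deg=1)
def f_hat(x):
    return np.polyval(coef, x)

# 3. PPI data: P_ell = P_u = a *different* gamma distribution.
x_ell = rng.gamma(shape=5.0, scale=0.5, size=100)    # small labeled set
y_ell = f_true(x_ell)                                # gold labels
x_u = rng.gamma(shape=5.0, scale=0.5, size=10_000)   # large unlabeled set

# 4. PPI point estimate of E[Y] under P_u: model mean plus rectifier.
theta_pp = f_hat(x_u).mean() + (y_ell - f_hat(x_ell)).mean()
theta_classical = y_ell.mean()  # labeled-data-only baseline
print(f"PPI: {theta_pp:.3f}, classical: {theta_classical:.3f}")
```

Sweeping the PPI gamma parameters away from the training parameters then traces out PPI error as a function of distributional distance, matching the experiment design above.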

Aspiire commented 1 month ago

Informative plots have mostly been completed; however, I have not yet settled on the right measure of difference between the data distributions. The current candidates are TV distance and Wasserstein distance. Computing either should not be too numerically demanding.
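For reference, both candidates are cheap in the univariate gamma case. A sketch assuming SciPy is available; the gamma parameters and the integration grid are placeholders:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gamma, wasserstein_distance

p = gamma(a=2.0, scale=1.0)   # e.g., the training distribution
q = gamma(a=5.0, scale=0.5)   # e.g., the PPI distribution (P_ell = P_u)

# 1-Wasserstein distance, estimated from large samples.
w1 = wasserstein_distance(p.rvs(size=100_000, random_state=0),
                          q.rvs(size=100_000, random_state=1))

# Total variation distance: (1/2) * integral of |p(x) - q(x)| dx,
# approximated by quadrature on a grid covering both supports.
xs = np.linspace(0.0, 30.0, 100_000)
tv = 0.5 * trapezoid(np.abs(p.pdf(xs) - q.pdf(xs)), xs)

print(f"W1 ~ {w1:.4f}, TV ~ {tv:.4f}")
```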