yossi-cohen / preferential-attachment

universal estimator exp-1 #11

Open yossi-cohen opened 3 years ago

yossi-cohen commented 3 years ago

Let f(d) be a one-dimensional function that returns samples drawn from a univariate distribution (e.g., log-normal).
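For concreteness, a minimal sketch of such an f, assuming the log-normal case with d playing the role of the shape parameter sigma (the actual parameterization is not stated in this issue):

```python
import numpy as np

def f(d, size):
    # Assumption: d is the log-normal shape parameter (sigma);
    # the experiments may parameterize the distribution differently.
    return np.random.lognormal(mean=0.0, sigma=d, size=size)

sample = f(d=0.92, size=256)  # an input sample, as in step 1 below
```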

  1. Generate an input sample (256 observations) using f, e.g., sample = f(d=0.92, size=256).
  2. estimator(f, sample) is a function that learns the parameter d of f from the sample (a sketch follows this list):
    • Repeat:
      • Generate synthetic samples using f(d), where d is drawn from ~uniform(search_space).
      • Train a DNN model on the synthetic samples and predict the parameter d_pred on the input sample.
      • Record d_pred, pred_params, and test_params for each iteration.
      • Narrow search_space around d_pred.
      • Stop when search_space is small enough (e.g., 1/128 of the original width).
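A minimal sketch of this loop, reusing the f from the snippet above and using scikit-learn's MLPRegressor as a stand-in for the DNN; `features`, `shrink`, and `min_width_frac` are hypothetical names, and the actual architecture and schedule are not specified in this issue:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # stand-in for the DNN

def features(samples):
    # Sorted observations as a fixed-length feature vector (an assumption;
    # the issue does not say how samples are fed to the DNN).
    return np.sort(samples, axis=-1)

def estimator(f, sample, lo=0.5, hi=1.5, n_train=1000, n_test=100,
              shrink=0.5, min_width_frac=1 / 128):
    size = sample.shape[-1]
    init_width = hi - lo
    history = []
    while (hi - lo) > init_width * min_width_frac:
        # Draw training/test parameters uniformly from the current search space.
        train_params = np.random.uniform(lo, hi, n_train)
        test_params = np.random.uniform(lo, hi, n_test)
        X_train = features(np.stack([f(d, size) for d in train_params]))
        X_test = features(np.stack([f(d, size) for d in test_params]))

        model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
        model.fit(X_train, train_params)

        pred_params = model.predict(X_test)                  # for the error study
        d_pred = model.predict(features(sample)[None, :])[0]
        history.append((d_pred, pred_params, test_params))   # record per iteration

        # Narrow the search space around d_pred.
        half = (hi - lo) * shrink / 2
        lo, hi = max(d_pred - half, 1e-6), d_pred + half
    return d_pred, history
```

Usage: `d_hat, history = estimator(f, f(d=0.92, size=256))`.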

Research question: Does the prediction error on the test set get lower when the parameter sampling space is smaller?

Specifically, let e = (pred_params - test_params).

Does STD(e) decrease as search_space gets smaller?

Plot a graph of STD(e) per iteration (a sketch follows below):
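A sketch of the plot, reusing the `history` recorded by the estimator sketch above (hypothetical names):

```python
import numpy as np
import matplotlib.pyplot as plt

# history: list of (d_pred, pred_params, test_params) per iteration
stds = [np.std(pred - test) for _, pred, test in history]
plt.plot(stds, marker='o')
plt.xlabel('iteration (search space shrinking)')
plt.ylabel('STD(pred_params - test_params)')
plt.show()
```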

Results:

There are two experiments; both were run with d_true = 0.92.

exp-1a

  1. The error sigma_1 keeps decreasing as we narrow the search space.
  2. abs(d_pred - d_true) stabilizes (fewer fluctuations) as we narrow the search space.
  3. The model predictions pred_params in the last iteration are WITHIN the (narrowest) search space (checked mechanically below). search_space: [0.8314, 0.8396]; pred_params: [0.8357, 0.836, 0.8354, 0.8353, 0.836, 0.836, 0.8359, 0.836, 0.8352, 0.8359, 0.836, 0.835, 0.8359, 0.8334, 0.8355, 0.8348, 0.836, 0.8352, 0.8358, 0.8342, 0.8359, 0.836, 0.8355, 0.8353, ...
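The within-range claim (and the out-of-range claim in exp-1b below) can be checked mechanically, e.g.:

```python
import numpy as np

lo, hi = 0.8314, 0.8396                       # narrowest search space (exp-1a)
preds = np.array([0.8357, 0.836, 0.8354])     # first few pred_params from above
print(np.all((preds >= lo) & (preds <= hi)))  # True: all within the search space
```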

[plot-1a: universal-estimator-exp-1a]

exp-1b

  1. sigma_1 DOES NOT decrease as we narrow the search space.
  2. The model predictions (pred_params) in the last iteration are OUTSIDE the (narrowest) search space. search_space: [0.9162, 0.925]; pred_params: [0.8993, 0.8095, 0.9517, 0.8373, 0.9313, 0.8635, 0.8682, 0.9157, 0.7621, 0.9063, 0.7607, 0.8484, 0.8684, 0.9252, 0.9151, 0.8094, 0.8165, 0.8161, 0.8547, 0.9161, 0.9251, 0.9072, 0.8799, 0.9236, ...

[plot-1b: universal-estimator-exp-1b]

Discussion: based on the post fisher-information-of-log-normal-distribution:

The Fisher information for the log-normal distribution is: I(θ, n) = n / (2θ²)

For (θ = 0.92, n = 256) we get I(θ = 0.92, n = 256) ≈ 151. Thus the Cramér–Rao bound on the variance of an unbiased estimator is var = 1/151 ≈ 0.0066225, and STD = sqrt(var) ≈ 0.081378.

This holds for a single log-normal sample with n=256 observations.

For N samples, the Cramér–Rao bound is 1/I(θ, N·n) = 2θ² / (N·n). For (θ = 0.92, n = 256, N = 1000) we get STD ≈ 0.002571.
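A quick numeric check of both bounds (plain arithmetic, no assumptions beyond the formulas above):

```python
import math

theta, n, N = 0.92, 256, 1000

I_single = n / (2 * theta**2)              # ~151.23 (rounded to 151 above)
std_single = math.sqrt(1 / I_single)       # ~0.0813, one sample of n observations
std_N = math.sqrt(2 * theta**2 / (N * n))  # ~0.002571, N samples
print(I_single, std_single, std_N)
```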

As can be seen in exp-1a, we get STD = 0.0010, which is below the bound!?

yossigil commented 3 years ago

research questions?