yossi-cohen / preferential-attachment


Accuracy of prediction as function of focus point. #16

Open yossigil opened 3 years ago

yossigil commented 3 years ago

Repeat the experiment in #15, but this time compute the relative error of the prediction against the Cramér-Rao bound.

yossi-cohen commented 3 years ago

issue-16

yossigil commented 3 years ago

At low values of theta, very close to zero, there is a singularity: the bound goes to zero as theta approaches zero.

I presume that focusing on this area will improve the results. But, of course, there is a limit on how much we can improve them; we cannot drive the error arbitrarily close to zero. For example, if the machine accuracy is 10E-23, it should be clear that even with floating point, very close to zero we will fail.

So the conjecture is that once we focus on, say, the interval [0.1, 0.3], we will get better results and better predictions for these values.

It may be a good time to continue with the more general setting, in which the entire range of theta is mapped to the [-1, 1] interval with the arctan transformation.
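
For concreteness, a minimal sketch of one possible such mapping (only an assumption, not an agreed-on choice; the exact transformation is discussed further below):

import numpy as np

# One possible mapping (an assumption): theta in [0, inf) is sent to
# (4/pi)*arctan(theta) - 1, which lies in [-1, 1).
def theta_to_unit(theta):
    return 4.0 / np.pi * np.arctan(theta) - 1.0

def unit_to_theta(u):
    # inverse of the mapping above
    return np.tan((u + 1.0) * np.pi / 4.0)

theta = np.array([0.01, 0.1, 0.3, 1.0, 10.0])
u = theta_to_unit(theta)
print(u)                 # values in [-1, 1)
print(unit_to_theta(u))  # recovers the original thetas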

yossi-cohen commented 3 years ago

For [0.0, 1.0] I get: issue-16-2

For [0.1, 1.0] I get: issue-16-3

For [0.0, 0.3] I get: issue-16-4

For [0.01, 0.3] (10,000 training samples) I get: issue-16-5

For [0.01, 0.3] (now with 100,000 training samples) I get more or less the same: issue-16-5-1

yossigil commented 3 years ago
  • First, here are the corrected graphs (as you asked).

Two corrections to make the graphs look clearer:

  1. In the ordinary, non-log graph, the Y-scale is so large that you cannot see the horizontal line. Trim Ymax, perhaps manually, or use two scales for the Y axis.

  2. In the logarithmic graph, the scale is OK (naturally), but the origin is not: the Y axis does not cross the X axis at x = 0.

(Try to avoid Python's automatic selection of min, max, and divisions. Two scales for the Y axis are possible; see the sketch below.)
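
A minimal matplotlib sketch of both suggestions, with made-up curves purely for illustration (the CR-bound formula is the one used later in this issue):

import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0.01, 1.0, 200)
abs_err = 0.05 + 0.5 * theta**2            # made-up error curve, for illustration only
crb = np.sqrt(2 * theta**2 / 256)          # the CR bound as computed later in this issue

fig, ax = plt.subplots()
ax.plot(theta, abs_err, color="blue", label="abs(error)")
ax.set_ylim(0, 0.6)                        # trim Ymax manually instead of auto-scaling
ax.set_xlim(left=0)                        # make the Y axis cross the X axis at x = 0
ax.set_xlabel("theta")
ax.set_ylabel("abs(error)")

ax2 = ax.twinx()                           # a second Y scale for the (much smaller) CR bound
ax2.plot(theta, crb, color="red", label="CR bound")
ax2.set_ylim(0, 0.1)
ax2.set_ylabel("CR bound")

fig.legend(loc="upper left")
plt.show()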

For [0.0, 1.0] I get:

issue-16-2

Due to scale, only the logarithmic graphs are useful in what follows; I ignore the non-log graphs. Furthermore, a graph with 10,000 points is a subgraph of the graph with 100,000 points, so there is no need to examine the one with 10,000 points if you have one with 100,000 points.

For [0.1, 1.0] I get:

issue-16-3

For [0.0, 0.3] I get:

issue-16-4

For [0.01, 0.3] (10,000 training samples) I get:

issue-16-5

For [0.01, 0.3] (now with 100,000 training samples) I get more or less the same:

issue-16-5-1

  • Regarding the range of theta mapped to the [-1, 1] interval with the arctan transformation: I would first like to talk with you to understand how exactly to perform the experiment (the domain of the lognormal is [0, ∞)).

yossi-cohen commented 3 years ago

After the corrections:

For [0.01, 1.0] (1,000 training samples): issue-16_0 01-1

For [0.01, 0.3] (1,000 training samples): issue-16_0 01-0 3

yossigil commented 3 years ago

I observe some points:

So, to better answer the question of this issue:

yossigil commented 3 years ago

https://www.wisdom.weizmann.ac.il/~yonina/YoninaEldar/Info/sing-fim.pdf

yossigil commented 3 years ago

Try this in Wolfram Alpha: there is a singularity of log-normal at the point theta=0; this is to be expected; don't worry about it.

p[t_, x_] := E^(-(Log[x]/t)^2/2) / (x t Sqrt[2 Pi])
Plot[{p[0.24, x], p[0.25, x], p[0.2, x], p[0.3, x], p[0.4, x], p[0.5, x], p[0.92, x]},
 {x, 0.1, 2}, Filling -> None, PlotRange -> Full,
 PlotLabels -> Placed[{0.24, 0.25, 0.2}, Right]]

The singularity means that as theta approaches zero, the distribution is so peaked that all values are near 1. In particular, if you make theta as small as 0.1, the values it computes are very, very close to 1. How close? Probably closer to 1 than the machine accuracy can distinguish.

This means that when theta is 0.1, we have no chance of learning, not because our algorithm is incorrect, but because the accuracy of the underlying machine may fail us.
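
A quick way to probe how close the samples get to 1 as theta shrinks (a sketch with arbitrarily chosen theta values; scipy's shape parameter s plays the role of theta):

import numpy as np
from scipy.stats import lognorm

eps = np.finfo(float).eps                  # double-precision machine accuracy, ~2.2e-16
for theta in (0.1, 1e-8, 1e-17):           # arbitrary probe values, not from the experiment
    x = lognorm.rvs(s=theta, size=1000, random_state=0)
    dev = np.abs(x - 1).max()
    print(f"theta={theta:g}: max |x - 1| = {dev:.3g}  (below eps: {dev < eps})")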

yossi-cohen commented 3 years ago

You wrote:

Notwithstanding, the results of focusing on the smaller range [0, 0.3] look buggy to me; as I read it, there is a range in [0.2, 0.3] where we consistently beat the CR bound.

I do not see it. Looking at the log scale (right plot), the blue curve is above the CRB.

issue-16_0 01-0 3

yossi-cohen commented 3 years ago

You wrote:

Double-check that the CR bound is computed correctly; shouldn't it be proportional to theta^-2? Or perhaps to theta^-1? The red curve suggests something else.

The CR bound is computed as follows:

import numpy as np

n = 256  # number of observations per training sample
# true_params holds the theta values selected in the range (see below)
CR_bound = np.sqrt(2 * np.square(true_params) / n)

where true_params are the 1000 theta parameters selected in the range.

for theta = 0.2: CR_bound = sqrt(2*0.2^2/n) = 0.0176
for theta = 0.3: CR_bound = sqrt(2*0.3^2/n) = 0.0265
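
The same arithmetic, as a quick check:

import numpy as np

n = 256
for theta in (0.2, 0.3):
    print(theta, np.sqrt(2 * theta**2 / n))  # ~0.0177 and ~0.0265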

Looking at the red line of the left plot, it seems correct.

issue-16_0 01-0 3

yossi-cohen commented 3 years ago

You wrote:

Double check that you are using the MSE, and not the MAE; MAE seems to show on the legend?

It is neither. It is, as the legend says, abs(error), meaning np.abs(pred_params - true_params), a vector of length 1000 (MAE and MSE each produce a single number).
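
For illustration, with made-up vectors of length 3 instead of 1000:

import numpy as np

true_params = np.array([0.10, 0.20, 0.30])         # made-up thetas, for illustration
pred_params = np.array([0.12, 0.19, 0.33])         # made-up predictions

abs_error = np.abs(pred_params - true_params)      # per-theta vector: what the plot shows
mae = abs_error.mean()                             # mean absolute error: a single number
mse = np.mean((pred_params - true_params) ** 2)    # mean squared error: a single number

print(abs_error)   # [0.02 0.01 0.03]
print(mae, mse)    # ~0.02 and ~0.00047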

yossi-cohen commented 3 years ago

Try this in Wolfram Alpha: there is a singularity of log-normal at the point theta=0; this is to be expected; don't worry about it.

p[t_, x_] := E^(-(Log[x]/t)^2/2) / (x t Sqrt[2 Pi]) ...

I'd appreciate your help here (I'm not so familiar with Wolfram Alpha).

That aside, you say:

The singularity means that as theta approaches zero, the distribution is so peaked that all values are near 1. In particular, if you make theta as small as 0.1, the values it computes are very, very close to 1. How close? Probably closer to 1 than the machine accuracy can distinguish.

This means that when theta is 0.1, we have no chance of learning, not because our algorithm is incorrect, but because the accuracy of the underlying machine may fail us.

Here is the output of log-normal for 0.1 (100 observations):

Am I missing something? It doesn't seem to be very close to 1.0

from scipy.stats import lognorm
print( lognorm.rvs(s=0.1, size=100) )
[0.9484 0.9406 1.0933 0.9608 1.0525 1.1123 0.9732 0.8586 1.0642 1.0095
 0.9354 0.8667 1.041  0.9297 0.8484 1.0583 0.8145 1.053  0.9049 0.9288
 1.0114 0.9745 1.2416 0.94   1.0392 0.945  1.0864 1.0106 0.8829 1.191
 0.8916 1.0301 0.9635 0.9984 0.883  1.0546 1.0995 1.0282 0.9179 1.1257
 1.0623 1.0214 0.9586 0.8509 1.109  1.045  0.9269 0.9331 0.9369 0.9452
 1.0244 1.0966 1.0511 1.1618 1.0519 1.0157 0.8823 1.1352 0.9891 1.1091
 0.8304 0.9855 1.1583 0.8517 1.12   1.0481 0.9968 1.0782 1.0912 0.9106
 0.9245 1.0486 0.8928 0.9985 0.9715 0.9756 0.9114 0.9802 0.8717 1.073
 1.0991 1.0253 0.9546 0.9529 1.1074 0.9471 0.8943 1.1018 1.0033 0.8268
 0.9466 0.978  0.9371 0.9665 0.9393 0.9242 1.0208 1.009  1.2042 1.0049]

and here is a histogram of those values: lognormal-0 1

Here is the histogram with theta=0.01: lognormal-0 01

and with theta=0.001: lognormal-0 001
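
For reference, a sketch that reproduces histograms like these (bin count chosen arbitrarily; 100 observations, as in the sample above):

import matplotlib.pyplot as plt
from scipy.stats import lognorm

for theta in (0.1, 0.01, 0.001):        # the three shape values shown above
    x = lognorm.rvs(s=theta, size=100)
    plt.figure()
    plt.hist(x, bins=20)                # bin count chosen arbitrarily
    plt.title(f"lognormal samples, s = {theta}")
    plt.xlabel("x")
    plt.ylabel("count")
plt.show()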

yossi-cohen commented 3 years ago

Limiting y to between 0 and 5: issue-16_0 01-1-limit-5

yossigil commented 3 years ago

To conclude this issue: