raacampbell / notBoxPlot

plots things that are not boxplots
GNU Lesser General Public License v3.0
57 stars 22 forks source link

Error in conf. intervals? #6

Closed decastrof closed 7 years ago

decastrof commented 7 years ago

Comparing the confidence intervals returned by notBoxPlot and those from 'coefCI' applied to a model fitted with 'fitlm' they don't match. Not at all. Any suggestions? I include my data and code below. The groups have normal distribution (lilliefors test).

clear all resp= [0.084 -0.1 -0.202 -0.267 0.025 -0.58 -0.782 -1.149 0.021 -0.962]'; treat= {'D','D','D','D','D','F','F','F','F','F'}'; D= table(resp,treat); M2= fitlm(D, 'resp ~ 1 + treat'); coefCI(M2,0.05) ans = -0.43756 0.25356 -1.0871 -0.10971

x =[ 1 1 1 1 1 2 2 2 2 2]; notBoxPlot(D.resp,x)

raacampbell commented 7 years ago

I think difference is because notBoxPlot calculates the 95% CI of each group separately whereas your model bases each CI on the variance of all the groups together.

Here's your model with the data as you sent me:

Linear regression model:
    resp ~ 1 + treat

Estimated Coefficients:
                   Estimate      SE        tStat       pValue 
                   ________    _______    ________    ________

    (Intercept)     -0.092     0.14985    -0.61395      0.5563
    treat_F        -0.5984     0.21192     -2.8237    0.022366

Number of observations: 10, Error degrees of freedom: 8
Root Mean Squared Error: 0.335

I re-define resp to make a couple of values in only the second group larger:

 resp= [0.084 -0.1 -0.202 -0.267 0.025 -0.58 -0.782 -1.149 1.021 -2.962]';

Now re-fit the model:


Linear regression model:
    resp ~ 1 + treat

Estimated Coefficients:
                   Estimate      SE        tStat     pValue 
                   ________    _______    _______    _______

    (Intercept)     -0.092     0.45297    -0.2031    0.84413
    treat_F        -0.7984      0.6406    -1.2463     0.2479

Number of observations: 10, Error degrees of freedom: 8
Root Mean Squared Error: 1.01

The SE for the intercept term has also gone up. So if I fit your original model with only the first variable I get:

Linear regression model:
    resp ~ 1 + treat

Estimated Coefficients:
                   Estimate       SE        tStat     pValue 
                   ________    ________    _______    _______

    (Intercept)    -0.092      0.066126    -1.3913    0.23653

Number of observations: 5, Error degrees of freedom: 4

The CI of that is:

>> coefCI(M3,0.05) 

ans =

   -0.2756    0.0916

Which is the same as:

>> [mean(A)-tInterval_Calc(A),mean(A)+tInterval_Calc(A)]
ans =
   -0.2756    0.0916

That's what notBoxPlot uses if you ask for a CI based on the t interval. By default it uses the normal distribution which gives you:

>> [mean(A)-SEM_calc(A),mean(A)+SEM_calc(A)]            
ans =
   -0.2216    0.0376

Not sure what is the best way forward. Clearly it depends on whether the user wants pooled sum of squares or not. In this case what are you trying to achieve?

raacampbell commented 7 years ago

I have a thought. What if notBoxPlot has a new call style that looks like this:

>> F=fitlm(T,'Var1 ~ 1+Var2');
>> notBoxPlot(F) 

The raw data are in F.Variables, so everything is possible with just the fit object and it will be really obvious that notBoxPlot is doing something different to what it normally does.

decastrof commented 7 years ago

Thanks Rob. Very thorough. Actually, your suggestion would be perfect. To have a call style that matches the model already fitted, so the figure shows the same info as the analysis. FdC

raacampbell commented 7 years ago

I'll see if I can find some time soon to do this.

raacampbell commented 7 years ago

Right!

This commit: ea0ad26c5926b37d59f99de5d7d79b6aa3326acc on the dev branch should do what you need. Can you see if it works? Either checkout the dev branch or download the zip from this page. Take a look at the examples by running NBP.linearModelExamples and then see if the new notBoxPlot call scheme works for your data.

If it's good I'll tidy it up and merge it into the main branch.

EDIT: in case it's not obvious, the blue 1SD bar becomes 1.96 SDs when you supply a LinearModel

decastrof commented 7 years ago

It works perfectly with my data. Except if the predictor in the linear model is a cell array. In that case I get an error: Error using double Conversion to double from cell is not possible. Error in notBoxPlot (line 188) x=double(x);

This can be avoided making the predictor categorical: X.Predictor= categorical(X.Predictor);

Thanks! Great work

raacampbell commented 7 years ago

I'll look into the best way to fix this. I want to make sure that coercing the independent variable to categorical doesn't have some other unintended problem relating to the model fit or what the user intended to do.