haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

coefficients() method in LinearModel is not returning the last coefficient #701

Closed oritush closed 2 years ago

oritush commented 2 years ago

Describe the bug: The smile.regression.LinearModel#coefficients method, called from the Scala API, returns n-1 coefficients instead of n.

Expected behavior: In the t-test matrix, the first column holds the complete array of coefficients (plus the intercept). coefficients() should return those same values, minus the intercept.

Actual behavior: The last coefficient is dropped. The issue is in this code:

public double[] coefficients() {
    return bias ? Arrays.copyOfRange(w, 1, w.length - 1) : w;
}

This happens because the "to" argument of Arrays.copyOfRange is exclusive, so passing w.length - 1 also cuts off the last element of w.
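To make the off-by-one concrete, here is a minimal standalone sketch. It assumes the intercept sits at index 0 of the weight array (which is what the output order further down suggests); the class and method names are made up for illustration and are not part of Smile, and the "corrected" variant is only one possible fix, not necessarily the one applied in the library.

```java
import java.util.Arrays;

// Hypothetical standalone class; only Arrays.copyOfRange is a real API here.
public class CoefficientsSketch {

    // Mirrors the code quoted in the report: the upper bound is exclusive,
    // so w.length - 1 drops the last weight in addition to the intercept.
    static double[] reported(double[] w, boolean bias) {
        return bias ? Arrays.copyOfRange(w, 1, w.length - 1) : w;
    }

    // Assumed correction: an exclusive upper bound of w.length keeps
    // every weight after the intercept at index 0.
    static double[] corrected(double[] w, boolean bias) {
        return bias ? Arrays.copyOfRange(w, 1, w.length) : w;
    }

    public static void main(String[] args) {
        // Rounded estimates from the model summary: intercept, then V1..V6.
        double[] w = {2946.8564, 0.2635, 0.0365, 0.0112, -1.7370, -1.4188, 0.2313};
        System.out.println(reported(w, true).length);   // prints 5
        System.out.println(corrected(w, true).length);  // prints 6
    }
}
```

With seven weights (intercept plus six predictors), the reported version yields five values and the assumed correction yields six, which matches the size=5 seen in the output of the snippet.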

Code snippet: This is based on your example code; I just added the prints at the end:

import smile.data.formula.Formula
import smile.data.vector.DoubleVector
import smile.regression._

object LinearR {
  def main(args: Array[String]): Unit = {

    val x = Array(Array(234.289, 235.6, 159.0, 107.608, 1947, 60.323), Array(259.426, 232.5, 145.6, 108.632, 1948, 61.122), Array(258.054, 368.2, 161.6, 109.773, 1949, 60.171), Array(284.599, 335.1, 165.0, 110.929, 1950, 61.187), Array(328.975, 209.9, 309.9, 112.075, 1951, 63.221), Array(346.999, 193.2, 359.4, 113.270, 1952, 63.639), Array(365.385, 187.0, 354.7, 115.094, 1953, 64.989), Array(363.112, 357.8, 335.0, 116.219, 1954, 63.761), Array(397.469, 290.4, 304.8, 117.388, 1955, 66.019), Array(419.180, 282.2, 285.7, 118.734, 1956, 67.857), Array(442.769, 293.6, 279.8, 120.445, 1957, 68.169), Array(444.546, 468.1, 263.7, 121.950, 1958, 66.513), Array(482.704, 381.3, 255.2, 123.366, 1959, 68.655), Array(502.601, 393.1, 251.4, 125.368, 1960, 69.564), Array(518.173, 480.6, 257.2, 127.852, 1961, 69.331), Array(554.894, 400.7, 282.7, 130.081, 1962, 70.551))

    val y = Array(83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9)

    val data = smile.data.DataFrame.of(x).merge(DoubleVector.of("yyyy", y))
    val formula = Formula.lhs("yyyy")
    val lmp = lm(formula, data)
    println(lmp)

    println("***********************************************************************************************")
    val arr2 = lmp.ttest().map(_(0))
    println(s"********* coefficients in ttest, size=${arr2.size}, ${arr2.mkString(",")}")
    println("***********************************************************************************************")
    println(s"********* intercept() function=${lmp.intercept()} ")
    val arr3: Array[Double] = lmp.coefficients().toArray
    println(s"********* coefficients() function: size=${lmp.coefficients().size},  ${arr3.mkString(",")}")
    println("***********************************************************************************************")
  }
}

The output is:

Linear Model:

Residuals:
       Min          1Q      Median          3Q         Max
   -2.0086     -0.4860      0.1222      0.8777      1.5503

Coefficients:
                  Estimate Std. Error    t value   Pr(>|t|)
Intercept        2946.8564  5647.9766     0.5218     0.6144 
V1                  0.2635     0.1082     2.4367     0.0376 *
V2                  0.0365     0.0302     1.2062     0.2585 
V3                  0.0112     0.0155     0.7222     0.4885 
V4                 -1.7370     0.6738    -2.5779     0.0298 *
V5                 -1.4188     2.9446    -0.4818     0.6414 
V6                  0.2313     1.3039     0.1774     0.8631 
---------------------------------------------------------------------
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.1946 on 9 degrees of freedom
Multiple R-squared: 0.9926,    Adjusted R-squared: 0.9877
F-statistic: 202.5094 on 7 and 9 DF,  p-value: 4.426e-09

***********************************************************************************************
********* coefficients in ttest, size=7, 2946.856360185241,0.26352724693255436,0.03648291369200313,0.011161050494472769,-1.7370298379331879,-1.418798526698675,0.2312878507642465
***********************************************************************************************
********* intercept() function=2946.856360185241 
********* coefficients() function: size=5,  0.26352724693255436,0.03648291369200313,0.011161050494472769,-1.7370298379331879,-1.418798526698675
***********************************************************************************************

haifengl commented 2 years ago

It is by design and the javadoc states so clearly.

oritush commented 2 years ago

The javadoc says "Returns the linear coefficients without intercept." But besides the intercept, the last coefficient is also missing, as seen in the example I sent.

haifengl commented 2 years ago

Fixed. Thanks!