yanboliang / spark-vlbfgs

Vector-free L-BFGS implementation for Spark MLlib
Apache License 2.0

Add VLogisticRegression intercept support #12

Closed WeichenXu123 closed 7 years ago

WeichenXu123 commented 7 years ago

Add VLogisticRegression intercept support.

The intercept implementation is similar to the one in Spark MLlib. The key points are as follows:

  1. During training, store the intercept value in the last element of the coefficients DV.
  2. In VBinomialLogisticCostFun, when calculating margins, fetch the intercept value from the last element of the coefficients DV and add it to the margin.
  3. When aggregating feature vectors into the gradient, append a virtual column with value 1 and aggregate it with the same per-instance multipliers as the feature columns.
  4. The logistic regression intercept uses a computed value as its initial value:
    initial_intercept = \log(P(1) / P(0)) = \log(count_1 / count_0)
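
The four points above can be illustrated with a minimal single-machine sketch in plain Python. It is not the distributed code from this PR: the function names (`initial_intercept`, `margin`, `gradient`) are hypothetical, the coefficients are an ordinary list rather than a DV, and the multiplier is assumed to be `sigmoid(margin) - label`, the usual binomial logistic loss form.

```python
import math

def initial_intercept(labels):
    # Point 4: log(count_1 / count_0), assuming both classes are present.
    count_1 = sum(1 for y in labels if y == 1)
    count_0 = len(labels) - count_1
    return math.log(count_1 / count_0)

def margin(coefficients, features):
    # Points 1 and 2: the intercept is the last element of the coefficients
    # and is added to the weighted sum of the features.
    *weights, intercept = coefficients
    return sum(w * x for w, x in zip(weights, features)) + intercept

def gradient(coefficients, rows, labels):
    # Point 3: each row is extended with a virtual column of value 1, so the
    # intercept's gradient entry accumulates the bare multipliers.
    grad = [0.0] * len(coefficients)
    for features, y in zip(rows, labels):
        m = margin(coefficients, features)
        multiplier = 1.0 / (1.0 + math.exp(-m)) - y  # assumed loss form
        for j, x in enumerate(list(features) + [1.0]):
            grad[j] += multiplier * x
    return grad
```

With all feature weights at zero, starting the intercept at `log(count_1 / count_0)` makes the predicted probability equal to the observed fraction of positive labels, so the intercept's gradient entry is already zero at the start. This is one way to see why the computed initial value is a reasonable starting point.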

Note: Standardization is not applied to the intercept, and the L2 regularization term does not include the intercept.
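
As a rough illustration of the regularization half of this note, the sketch below computes an L2 penalty and its gradient while skipping the last (intercept) element. The function name `l2_penalty_and_gradient`, the `fit_intercept` flag, and the `0.5 * reg_param * ||w||^2` form are assumptions chosen for illustration, not code taken from this PR; standardization is not shown.

```python
def l2_penalty_and_gradient(coefficients, reg_param, fit_intercept=True):
    # When an intercept is fitted it occupies the last slot and is excluded
    # from both the penalty and the penalty's gradient.
    n_weights = len(coefficients) - 1 if fit_intercept else len(coefficients)
    weights = coefficients[:n_weights]
    penalty = 0.5 * reg_param * sum(w * w for w in weights)
    # Gradient is reg_param * w for the weights and 0 for the intercept slot.
    grad = [reg_param * w for w in weights]
    grad += [0.0] * (len(coefficients) - n_weights)
    return penalty, grad
```

However large the intercept is, it contributes nothing to the penalty, so regularization shrinks only the feature weights.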

API change: Add a blockCoords: (Int, Int) parameter to the function f in VUtils.blockMatrixHorzZipVec and VUtils.blockMatrixHorzZipVec.