haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

QRDecomposition is slow #86

Closed lukehutch closed 8 years ago

lukehutch commented 8 years ago

I discovered that OLS regression on 120k data points with 1200 dimensions per point takes an extremely long time (I'm not sure how long; it has been running for many minutes with one core at 100% -- much longer than expected). The culprit is QRDecomposition, which has O(n^2) time complexity.

I don't know if this can be improved -- the following doc may help: https://www.math.kth.se/na/SF2524/matber15/qrmethod.pdf

Or maybe OLS can be performed in some other way that doesn't use QRDecomposition? Or is there an approximate / SGD-based way of performing approximate OLS regression?

If not, QRDecomposition can be parallelized to some degree, that would help.
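For what it's worth, here is a minimal sketch of the kind of SGD-style approximate least-squares fit mentioned above, in plain Java (an illustration only, not smile's API; the learning rate, epoch count, and random sampling are arbitrary choices):

    import java.util.Random;

    public class SgdOls {
        // Fit w (intercept stored in the last slot) by stochastic gradient descent
        // on the squared error 0.5 * (w.x + b - y)^2.
        public static double[] fit(double[][] x, double[] y, double lr, int epochs) {
            int n = x.length, p = x[0].length;
            double[] w = new double[p + 1];
            Random rnd = new Random(0);
            for (int e = 0; e < epochs; e++) {
                for (int t = 0; t < n; t++) {
                    int i = rnd.nextInt(n);                 // pick a random sample
                    double pred = w[p];                     // intercept
                    for (int j = 0; j < p; j++) pred += w[j] * x[i][j];
                    double err = pred - y[i];
                    for (int j = 0; j < p; j++) w[j] -= lr * err * x[i][j];
                    w[p] -= lr * err;                       // intercept update
                }
            }
            return w;
        }
    }

This trades exactness for speed and would typically need feature scaling and learning-rate tuning to converge well.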

lukehutch commented 8 years ago

This might be useful as another, faster least squares regression method: http://papers.nips.cc/paper/5105-new-subsampling-algorithms-for-fast-least-squares-regression.pdf

haifengl commented 8 years ago

120k x 1200 is a big matrix, which takes a lot of memory. How much RAM did you allocate to the JVM? Maybe it is a GC problem. BTW, did you try this data in R? Thanks!

lukehutch commented 8 years ago

I allocated 30GB to the JVM, and I have 32GB of RAM. It's not a GC issue as far as I can tell: I stopped the code in the debugger, and it's running in QRDecomposition.java, which doesn't appear to allocate new objects in its loops.

I didn't try this in R, but I have successfully run linear regression on this dataset in scikit-learn (Python / C++), and it returns a result pretty quickly (I think it takes 10 or 20 seconds).

lukehutch commented 8 years ago

The QRDecomposition is O(n^2 · m), which gives ~120,000 · 120,000 · 1,200 = 17.3T operations.

haifengl commented 8 years ago

Thanks! It is not a memory issue given your settings. Can you share your data for me to debug?

lukehutch commented 8 years ago

Sure, how should I send it to you? I can send you a file in LibSVM format.

haifengl commented 8 years ago

Our QR implementation is a standard algorithm. I am afraid that we may have run into numerical stability problems, for example NaN values. scikit-learn probably uses LAPACK for QR.
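One quick sanity check along those lines (a generic sketch, not smile code) is to scan the design matrix and response for NaN or infinite values before fitting:

    // Returns true if any entry of the design matrix or response is NaN or infinite.
    static boolean hasBadValues(double[][] x, double[] y) {
        for (double v : y) {
            if (Double.isNaN(v) || Double.isInfinite(v)) return true;
        }
        for (double[] row : x) {
            for (double v : row) {
                if (Double.isNaN(v) || Double.isInfinite(v)) return true;
            }
        }
        return false;
    }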

haifengl commented 8 years ago

Can you send the link to my email haifeng.hli@gmail.com? Thanks!

haifengl commented 8 years ago

Thanks, Luke, for sharing the data. I ran OLS on it and very quickly got an error that the matrix is rank deficient:

smile> ols(x,y)
java.lang.RuntimeException: Matrix is rank deficient.
  at smile.math.matrix.QRDecomposition.solve(QRDecomposition.java:272)
  at smile.regression.OLS.<init>(OLS.java:174)
  at smile.regression.Operators$$anonfun$ols$1.apply(Operators.scala:78)
  at smile.regression.Operators$$anonfun$ols$1.apply(Operators.scala:78)
  at smile.util.package$time$.apply(package.scala:49)
  at smile.regression.Operators$class.ols(Operators.scala:77)
  at smile.regression.package$.ols(package.scala:30)
  ... 42 elided

haifengl commented 8 years ago

I think this is the right response. Does scikit-learn return a result without any warnings?

haifengl commented 8 years ago

I tried this data on R:

lm(y ~ x)

Call: lm(formula = y ~ x)

Coefficients: (Intercept) xV2 xV3 xV4 xV5 xV6
-1.687e-01 4.095e-01 -9.897e-02 -1.193e-01 1.357e-01 5.209e-01
xV7 xV8 xV9 xV10 xV11 xV12
4.587e-01 6.723e-02 1.514e-01 2.371e+01 NA NA
xV13 xV14 xV15 xV16 xV17 xV18
NA NA -1.943e-02 6.006e+00 7.137e-02 -3.794e-02
xV19 xV20 xV21 xV22 xV23 xV24
1.277e-01 -1.528e-01 -5.212e-01 NA NA NA
xV25 xV26 xV27 xV28 xV29 xV30
NA -1.252e-01 4.160e-01 -4.567e-01 1.170e+00 2.685e-01
xV31 xV32 xV33 xV34 xV35 xV36
-9.798e-02 2.610e-01 -5.236e-01 -1.917e+00 -1.232e+01 -1.115e+03
xV37 xV38 xV39 xV40 xV41 xV42
NA NA 1.788e+00 5.770e-01 2.120e-01 2.736e-02
xV43 xV44 xV45 xV46 xV47 xV48
-6.194e-02 -3.168e-01 -2.081e+00 -1.096e-01 3.247e+00 NA
xV49 xV50 xV51 xV52 xV53 xV54
NA NA 2.172e+00 1.996e-01 1.203e-01 -1.242e+00
xV55 xV56 xV57 xV58 xV59 xV60
-1.187e+00 -2.550e+00 -1.243e+00 5.093e+01 NA NA
xV61 xV62 xV63 xV64 xV65 xV66
NA NA -6.792e-02 -2.219e-02 -3.570e-01 -1.781e-01
xV67 xV68 xV69 xV70 xV71 xV72
NA NA NA NA NA NA
xV73 xV74 xV75 xV76 xV77 xV78
NA -3.220e-01 -6.726e-03 2.905e-01 -1.963e-03 1.769e+00
xV79 xV80 xV81 xV82 xV83 xV84
NA NA NA NA NA NA
xV85 xV86 xV87 xV88 xV89 xV90
NA NA NA NA 2.085e-01 -7.914e-01
xV91 xV92 xV93 xV94 xV95 xV96
-8.538e-02 -1.785e+00 NA NA NA NA
xV97 xV98 xV99 xV100 xV101 xV102
NA NA NA NA NA -1.906e-01
xV103 xV104 xV105 xV106 xV107 xV108
-3.237e-02 -7.539e-02 -3.532e-01 4.279e-01 NA NA
xV109 xV110 xV111 xV112 xV113 xV114
NA NA 5.013e-01 7.254e-01 -1.434e+00 -1.580e-01
xV115 xV116 xV117 xV118 xV119 xV120
-4.744e-01 -6.724e-01 -7.261e-01 -8.577e-01 4.295e-01 1.070e+00
xV121 xV122 xV123 xV124 xV125 xV126
7.081e+01 6.526e+00 -5.940e-01 4.334e-01 -7.251e-02 8.535e-02
xV127 xV128 xV129 xV130 xV131 xV132
1.656e-01 4.407e-01 1.001e+00 1.412e+00 1.515e+00 1.643e+00
xV133 xV134 xV135 xV136 xV137 xV138
2.796e+00 NA 1.022e-01 2.268e-02 3.109e-01 4.767e-01
xV139 xV140 xV141 xV142 xV143 xV144
1.287e-01 6.654e-03 -8.654e-02 -1.037e+00 7.940e+00 NA
xV145 xV146 xV147 xV148 xV149 xV150
NA NA -7.921e-01 -7.592e-01 -8.487e-01 -5.844e-01
xV151 xV152 xV153 xV154 xV155 xV156
-5.882e-01 -3.923e-01 -1.601e+00 -1.720e+00 3.941e+01 NA
xV157 xV158 xV159 xV160 xV161 xV162
NA -4.680e-01 -7.047e-01 -5.855e-01 9.691e-02 2.125e+00
xV163 xV164 xV165 xV166 xV167 xV168
7.015e-01 7.910e-01 8.878e-01 1.374e+00 1.204e+00 2.363e+00
xV169 xV170 xV171 xV172 xV173 xV174
-5.009e+00 2.151e+02 7.348e-01 4.341e-01 1.340e-01 1.373e-01
xV175 xV176 xV177 xV178 xV179 xV180
-5.297e-01 -1.394e-01 -3.315e-01 -1.476e+00 -5.088e+00 NA
xV181
NA

haifengl commented 8 years ago

Although R finishes the fit without any errors or warnings, many coefficients are NA. The model is essentially useless.

lukehutch commented 8 years ago

Hi Haifeng, thanks for looking into this (sorry for the delayed response). I'm not seeing the "java.lang.RuntimeException: Matrix is rank deficient." message with OLS. With Lasso I see:

Jun 13, 2016 6:38:48 PM smile.regression.LASSO <init>
INFO: LASSO: primal and dual objective function value after   0 iterations: NaN NaN

Jun 13, 2016 6:39:21 PM smile.math.Math solve
INFO: BCG: the error after  10 iterations: NaN
Jun 13, 2016 6:39:54 PM smile.math.Math solve
INFO: BCG: the error after  20 iterations: NaN
Jun 13, 2016 6:40:23 PM smile.math.Math solve
INFO: BCG: the error after  30 iterations: NaN

I don't understand how the data could be rank-deficient for OLS, given that there are many more data points than dimensions, and the data points are mostly unrelated to each other. What am I missing?

haifengl commented 8 years ago

A matrix can be rank deficient because of collinear columns (or something as simple as a column of zeros), even if there are more rows than columns. For OLS, we always have more samples than dimensions (that is why we do least squares), but rank deficiency is a common reason for a poor OLS fit. The easy way to verify is to calculate the singular values (both R and MATLAB can do it, and smile supports that too). BTW, which versions of smile and Java, and which hardware, do you use? I am using the master branch (but there have been no changes to these basic methods for a long time). Thanks!
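For illustration, a LAPACK-style check of the numerical rank from the singular values might look like this (a generic helper, not smile's API; the tolerance rule is an assumption):

    // Estimate the numerical rank from singular values sorted in descending order.
    // tol follows the common max(m, n) * eps * sigma_max convention.
    static int numericalRank(double[] singularValues, int m, int n) {
        double eps = Math.ulp(1.0);                       // machine epsilon, ~2.2e-16
        double tol = Math.max(m, n) * eps * singularValues[0];
        int rank = 0;
        for (double s : singularValues) {
            if (s > tol) rank++;
        }
        return rank;
    }

If the returned rank is smaller than the number of columns, the matrix is numerically rank deficient.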

lukehutch commented 8 years ago

I performed an SVD in NumPy. The singular values are all non-zero, although the smallest 7 values are 1.64e-13, very close to zero -- which I assume is what you mean about the data being rank deficient?

It is possible for some of the columns (some of the instance features) to be collinear, but I didn't know that would cause a problem with regression. Do I need to remove exact-duplicate features from the feature vector before performing regression? Or, since I thought only duplicate rows would create a problem with rank deficiency, do I need to find and remove exact-duplicate rows first? If so, isn't it a problem for SMILE that it can't automatically handle exact-duplicate training examples without the numerical methods failing?

Incidentally, NumPy uses LAPACK for SVD, and is significantly faster than SMILE (though I have never waited long enough for SMILE to complete, so I don't know how long it will take; I always kill it after many minutes, since I don't know whether it is going to take minutes, hours, or years, depending on how the implementation scales with the array dimensions).

haifengl commented 8 years ago

1.64e-13 is very close to zero, which does indicate rank deficiency. You will never get exactly zero singular values in floating-point computation.

Duplicate rows are not a problem. For least squares, collinearity is about columns.

R also uses LAPACK and returns a model with NA parameters, which means the model cannot be applied. Did you observe something similar with NumPy?

Smile returns quickly on the data you emailed me. I don't know why it is so slow for you. Can you provide more information about your system, such as the JVM, smile version, hardware, etc.? Thanks!

P.S. Smile is surely slower than LAPACK for this task. LAPACK is written in Fortran and highly optimized after years of tuning.

lukehutch commented 8 years ago

Sorry, I forgot to answer the question about my system. I am using a 64-bit Java 8 JRE on a 12-core (hyperthreaded) Xeon E5-1650 v3 @ 3.50GHz. I am using the latest release of SMILE, 1.1.0.

I looked at the data, and there are 7 columns that are all zero, which explains the rank deficiency. However, I don't understand how having linearly-dependent columns should stop regression from working (assuming that by columns, we're both talking about data dimensions, and by rows, we're talking about instances). In fact, I have no problems using either sklearn.linear_model.LinearRegression or sklearn.linear_model.Lasso to get reasonable regression results. Both methods take 30-35 seconds. With the same exact data, I am still getting NaN from SMILE using the corresponding methods. I haven't tried R.

Here is the latest version of my data, which might be slightly different from the version I sent you previously (it may contain a few thousand more training examples). Again the first column is y (the regression target), and the remainder of the columns are the feature matrix X, with features in the columns and data instances in the rows.

https://drive.google.com/file/d/0B3BBfApbFAuZUS05YlpNclJoUXc/view?usp=sharing

As far as SVD speed: np.linalg.svd takes 90 seconds to perform the SVD with full_matrices=False, so NumPy SVD is 3x slower than scikit-learn regression. Yes, I understand that LAPACK should be faster, but based on these numbers alone it is clear that scikit-learn is not performing SVD via NumPy to implement LinearRegression, since LinearRegression finishes 3x faster than the SVD. I am timing SMILE's SVD on this data; it has been running a while, so I will leave it running overnight as I leave work and report back. (Progress so far in the outermost loop indicates it may take 40-50 minutes to complete.)

haifengl commented 8 years ago

What is the output of "java -version" on your machine? I am afraid that you are not using the Oracle JDK.

lukehutch commented 8 years ago

I'm not; I'm using a Google-internal patched version of OpenJDK 8. The actual build number won't mean anything outside that context. Assume that it is (more or less) equivalent to OpenJDK.

How does the JRE version affect the issues I described?

haifengl commented 8 years ago

Well, that explains a lot. The OpenJDK JVM misses many optimizations. After removing the zero columns, I can fit the model quickly on my laptop.

haifengl commented 8 years ago

val data = readTable("data", Some((new NumericAttribute("y"), 0)))
val (x, y) = data.unzipDouble

// Print out the summary of the data. Obviously many constant columns.
data.summary

// Column min and max
val min = Math.colMin(x)
val max = Math.colMax(x)

// Filter out constant columns
val cols = (0 until min.length).filter { i => min(i) != max(i) }

// Hopefully not rank deficient
val x2 = x.col(cols: _*)

val start = System.currentTimeMillis
val model = ols(x2, y)
val end = System.currentTimeMillis

// Elapsed time in seconds
(end - start) / 1000.0
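For Java users, roughly the same constant-column filter could be written directly against a double[][] (a hypothetical helper for illustration, not part of the script above):

    // Return a copy of x that keeps only columns whose min and max differ,
    // i.e. drops constant (including all-zero) columns.
    static double[][] dropConstantColumns(double[][] x) {
        int n = x.length, p = x[0].length;
        java.util.List<Integer> keep = new java.util.ArrayList<>();
        for (int j = 0; j < p; j++) {
            double min = x[0][j], max = x[0][j];
            for (int i = 1; i < n; i++) {
                min = Math.min(min, x[i][j]);
                max = Math.max(max, x[i][j]);
            }
            if (min != max) keep.add(j);
        }
        double[][] x2 = new double[n][keep.size()];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < keep.size(); k++) x2[i][k] = x[i][keep.get(k)];
        }
        return x2;
    }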

haifengl commented 8 years ago

The above is the script I used to fit your data. Here is the model summary:

smile> println(model)
Linear Model:

Residuals:
       Min         1Q     Median         3Q        Max
  -11.0241    -0.5573    -0.0010     0.5443    20.1205

Coefficients: Estimate Std. Error t value Pr(>|t|) Intercept 1.4630 0.0575 25.4472 0.0000 Var 1 230.9277 108.2488 2.1333 0.0329 Var 2 0.0186 0.0236 0.7860 0.4318 Var 3 4.5291 0.2927 15.4722 0.0000 ** Var 4 0.0946 0.0106 8.9606 0.0000 Var 5 0.4202 0.0166 25.3531 0.0000 Var 6 -0.2238 0.0210 -10.6375 0.0000 Var 7 0.5990 0.0526 11.3951 0.0000 Var 8 0.0970 0.0246 3.9448 0.0001 Var 9 -0.0258 0.0189 -1.3646 0.1724 Var 10 -0.1041 0.0110 -9.4750 0.0000 Var 11 -0.1278 0.0116 -11.0109 0.0000 Var 12 -0.1323 0.0180 -7.3606 0.0000 Var 13 0.0134 0.0108 1.2426 0.2140 Var 14 -0.1563 0.0091 -17.2506 0.0000 Var 15 -0.0613 0.0090 -6.8372 0.0000 Var 16 -0.0436 0.0110 -3.9687 0.0001 Var 17 -0.0240 0.0110 -2.1884 0.0286 * Var 18 0.0039 0.0103 0.3782 0.7053 Var 19 0.0407 0.0120 3.4028 0.0007 Var 20 0.1061 0.0134 7.9049 0.0000 Var 21 0.0768 0.0157 4.8871 0.0000 Var 22 0.1590 0.0231 6.8770 0.0000 Var 23 0.1866 0.0342 5.4571 0.0000 Var 24 0.2983 0.0479 6.2275 0.0000 Var 25 0.3623 0.0771 4.7010 0.0000 * Var 26 0.3121 0.1118 2.7910 0.0053 Var 27 0.4906 0.2022 2.4264 0.0153 Var 28 1.3943 0.3153 4.4217 0.0000 Var 29 4.0306 5.4185 0.7439 0.4570 Var 30 -2.4711 0.4292 -5.7576 0.0000 * Var 31 0.3044 0.1881 1.6183 0.1056 Var 32 -0.8267 0.6034 -1.3699 0.1707 Var 33 0.9129 0.1837 4.9684 0.0000 Var 34 -0.2590 0.1460 -1.7741 0.0760 . Var 35 1.9595 0.3186 6.1507 0.0000 Var 36 0.2317 0.1276 1.8167 0.0693 . Var 37 0.0137 0.1910 0.0719 0.9427 Var 38 -0.2433 0.1975 -1.2315 0.2181 Var 39 0.0493 0.2362 0.2087 0.8347 Var 40 -0.0459 0.1871 -0.2454 0.8061 Var 41 0.3964 0.1910 2.0753 0.0380 * Var 42 0.0472 0.1351 0.3493 0.7269 Var 43 0.7308 0.1307 5.5927 0.0000 Var 44 -0.4392 0.1029 -4.2701 0.0000 Var 45 0.1889 0.1211 1.5592 0.1189 Var 46 0.4101 0.1855 2.2107 0.0271 Var 47 0.6162 0.2025 3.0422 0.0023 Var 48 0.0153 0.1266 0.1209 0.9038 Var 49 -0.0529 0.1354 -0.3904 0.6962 Var 50 0.1421 0.1263 1.1256 0.2603 Var 51 0.4159 0.1354 3.0715 0.0021 Var 52 0.0498 0.1262 0.3945 0.6932 Var 53 -0.1372 0.1495 -0.9173 0.3590 Var 54 0.0119 0.1607 0.0739 0.9411 Var 55 0.2754 0.1737 1.5851 0.1129 Var 56 -0.1507 0.1439 -1.0473 0.2950 Var 57 -0.8649 0.3184 -2.7164 0.0066 Var 58 0.1269 0.2017 0.6294 0.5291 Var 59 -0.1398 0.2252 -0.6209 0.5347 Var 60 -0.0274 0.1282 -0.2136 0.8308 Var 61 -0.7757 0.2187 -3.5469 0.0004 Var 62 -0.4186 0.1377 -3.0409 0.0024 Var 63 -0.2772 0.2431 -1.1403 0.2542 Var 64 -0.3090 0.2418 -1.2778 0.2013 Var 65 -0.5493 0.3094 -1.7756 0.0758 . Var 66 -0.4370 0.3173 -1.3775 0.1684 Var 67 18.4904 11.2558 1.6427 0.1004 Var 68 0.2496 0.0392 6.3617 0.0000 ** Var 69 -0.2817 0.0505 -5.5754 0.0000 Var 70 0.5768 0.0780 7.3919 0.0000 Var 71 48.0317 8.6565 5.5486 0.0000 Var 72 0.2642 0.0484 5.4636 0.0000 * Var 73 -0.0553 0.0206 -2.6823 0.0073 * Var 74 -0.0891 0.0182 -4.9116 0.0000 Var 75 0.1692 0.0276 6.1364 0.0000 Var 76 -0.0494 0.0144 -3.4413 0.0006 Var 77 0.0170 0.0115 1.4810 0.1386 Var 78 -0.1324 0.0105 -12.6250 0.0000 Var 79 -0.1510 0.0190 -7.9306 0.0000 Var 80 0.0788 0.0215 3.6670 0.0002 Var 81 0.2407 0.0223 10.7775 0.0000 Var 82 -0.0989 0.0141 -7.0159 0.0000 Var 83 0.1531 0.0173 8.8619 0.0000 Var 84 0.0939 0.0120 7.8522 0.0000 Var 85 0.0296 0.0117 2.5265 0.0115 Var 86 0.0494 0.0099 4.9773 0.0000 ** Var 87 0.0654 0.0120 5.4343 0.0000 Var 88 0.1052 0.0139 7.5438 0.0000 Var 89 0.0952 0.0131 7.2738 0.0000 Var 90 0.0286 0.0146 1.9536 0.0508 . Var 91 0.1620 0.0204 7.9450 0.0000 * Var 92 0.0601 0.0188 3.1878 0.0014 Var 93 0.0622 0.0204 3.0470 0.0023 Var 94 0.1048 0.0324 3.2308 0.0012 Var 95 -0.0966 0.0551 -1.7539 0.0795 . 
Var 96 -0.0180 0.0821 -0.2188 0.8268 Var 97 -0.1836 0.1612 -1.1394 0.2545 Var 98 -0.3031 0.2209 -1.3721 0.1700 Var 99 -0.8797 0.3451 -2.5487 0.0108 Var 100 -9.1354 1.8460 -4.9487 0.0000 Var 101 1.5603 0.0298 52.3572 0.0000 * Var 102 2.2920 0.1395 16.4259 0.0000 Var 103 0.1980 0.0187 10.5782 0.0000 Var 104 0.2153 0.0255 8.4461 0.0000 Var 105 0.4177 0.0172 24.2909 0.0000 Var 106 0.2372 0.0173 13.7261 0.0000 Var 107 0.1776 0.0138 12.8671 0.0000 Var 108 3.0448 0.1111 27.4046 0.0000 Var 109 -0.1091 0.0426 -2.5630 0.0104 Var 110 -0.5512 0.1100 -5.0130 0.0000 ** Var 111 -0.0864 0.0184 -4.6837 0.0000 Var 112 0.1986 0.0177 11.2308 0.0000 Var 113 -0.0755 0.0167 -4.5089 0.0000 Var 114 0.0537 0.0151 3.5641 0.0004 Var 115 0.0725 0.0121 5.9919 0.0000 Var 116 -0.0887 0.0124 -7.1465 0.0000 Var 117 -0.0458 0.0137 -3.3576 0.0008 Var 118 -0.0959 0.0131 -7.3153 0.0000 * Var 119 -0.0368 0.0121 -3.0443 0.0023 * Var 120 -0.0661 0.0124 -5.3215 0.0000 Var 121 -0.0171 0.0127 -1.3530 0.1760 Var 122 0.1453 0.0130 11.1819 0.0000 * Var 123 -0.0517 0.0160 -3.2402 0.0012 * Var 124 -0.1903 0.0194 -9.8177 0.0000 Var 125 -0.2861 0.0253 -11.2913 0.0000 Var 126 -0.3542 0.0373 -9.4951 0.0000 Var 127 -0.6925 0.0590 -11.7327 0.0000 Var 128 -0.6490 0.1162 -5.5866 0.0000 Var 129 -0.6553 0.1349 -4.8577 0.0000 Var 130 1.5980 1.7018 0.9390 0.3477 Var 131 2.4145 0.4818 5.0117 0.0000 Var 132 0.7843 0.0125 62.6378 0.0000 Var 133 3.0597 0.0649 47.1689 0.0000 Var 134 -0.0293 0.0148 -1.9760 0.0482 * Var 135 0.1119 0.0122 9.1996 0.0000 Var 136 0.2394 0.0122 19.6213 0.0000 Var 137 0.1659 0.0605 2.7405 0.0061 Var 138 441.4116 1872.3619 0.2358 0.8136 Var 139 -0.4209 0.0770 -5.4684 0.0000 ** Var 140 0.1779 0.0180 9.8676 0.0000 Var 141 0.1103 0.0267 4.1280 0.0000 Var 142 -0.0197 0.0248 -0.7966 0.4257 Var 143 -0.0316 0.0177 -1.7888 0.0736 . Var 144 0.2720 0.0144 18.9379 0.0000 Var 145 0.0618 0.0118 5.2620 0.0000 Var 146 0.0758 0.0092 8.2108 0.0000 Var 147 0.0217 0.0100 2.1643 0.0304 * Var 148 0.0646 0.0104 6.2131 0.0000 Var 149 0.0454 0.0104 4.3473 0.0000 Var 150 -0.0998 0.0109 -9.1751 0.0000 Var 151 -1.0689 0.0534 -20.0273 0.0000 Var 152 -0.3062 0.0191 -16.0317 0.0000 Var 153 -0.2395 0.0163 -14.7179 0.0000 Var 154 -0.3579 0.0203 -17.6098 0.0000 Var 155 -0.3377 0.0204 -16.5471 0.0000 Var 156 -0.2483 0.0282 -8.8199 0.0000 Var 157 -0.3350 0.0346 -9.6873 0.0000 Var 158 -0.3729 0.0326 -11.4338 0.0000 Var 159 -0.6318 0.0375 -16.8518 0.0000 Var 160 -0.6944 0.0481 -14.4240 0.0000 Var 161 -0.8661 0.0575 -15.0715 0.0000 Var 162 -0.8338 0.0762 -10.9388 0.0000 Var 163 -0.5674 0.1115 -5.0886 0.0000 Var 164 -0.7079 0.1957 -3.6169 0.0003 Var 165 -1.0854 0.2257 -4.8101 0.0000 Var 166 1.3247 2.2573 0.5868 0.5573 Var 167 -1.3093 3.6342 -0.3603 0.7186 Var 168 -0.8052 0.2481 -3.2460 0.0012 Var 169 -0.5214 0.3825 -1.3633 0.1728 Var 170 0.5448 0.6787 0.8027 0.4221 Var 171 -0.3114 0.1789 -1.7409 0.0817 . Var 172 -0.5382 0.5066 -1.0624 0.2881 Var 173 -0.2431 1.1207 -0.2169 0.8283 Var 174 1.5874 1.5090 1.0520 0.2928 Var 175 2.0706 0.6458 3.2061 0.0013 Var 176 -55.3814 34.2143 -1.6187 0.1055 Var 177 0.3192 0.2837 1.1251 0.2605 Var 178 0.2467 1.3413 0.1839 0.8541 Var 179 -0.4304 0.1881 -2.2877 0.0222 Var 180 -0.0326 0.5341 -0.0611 0.9513 Var 181 -0.8445 5.5658 -0.1517 0.8794 Var 182 -0.1187 0.5394 -0.2200 0.8259 Var 183 0.1852 0.2832 0.6540 0.5131 Var 184 -1159.0040 711.3195 -1.6294 0.1032 Var 185 -0.2962 0.7520 -0.3939 0.6936 Var 186 -0.5719 0.2249 -2.5427 0.0110 Var 187 -0.3098 0.1846 -1.6779 0.0934 . 
Var 188 0.0382 0.1086 0.3521 0.7248 Var 189 -0.2124 0.1156 -1.8367 0.0663 . Var 190 -0.4035 0.0978 -4.1248 0.0000 Var 191 -0.0707 0.1302 -0.5428 0.5872 Var 192 -0.1736 0.1471 -1.1799 0.2380 Var 193 -0.3062 0.1828 -1.6748 0.0940 . Var 194 0.9217 0.2142 4.3029 0.0000 Var 195 0.0610 0.1757 0.3474 0.7283 Var 196 -0.3460 0.1918 -1.8045 0.0712 . Var 197 0.1925 0.1679 1.1466 0.2516 Var 198 -0.0972 0.0989 -0.9831 0.3256 Var 199 0.1409 0.1180 1.1937 0.2326 Var 200 -0.0310 0.1227 -0.2530 0.8003 Var 201 -0.1010 0.1536 -0.6574 0.5109 Var 202 -0.0257 0.1462 -0.1756 0.8606 Var 203 0.0234 0.1887 0.1242 0.9012 Var 204 -0.1218 0.1353 -0.8998 0.3683 Var 205 -0.0364 0.2081 -0.1749 0.8612 Var 206 0.1307 0.0925 1.4128 0.1577 Var 207 -0.4160 0.1000 -4.1594 0.0000 Var 208 0.1155 0.0846 1.3664 0.1718 Var 209 0.1285 0.1026 1.2524 0.2104 Var 210 0.0831 0.1033 0.8049 0.4209 Var 211 -0.0335 0.1044 -0.3205 0.7486 Var 212 0.1520 0.1014 1.4996 0.1337 Var 213 0.1752 0.1267 1.3823 0.1669 Var 214 0.2044 0.1138 1.7970 0.0723 . Var 215 0.5428 0.1272 4.2668 0.0000 Var 216 0.3291 0.1247 2.6383 0.0083 Var 217 0.1690 0.1077 1.5693 0.1166 Var 218 0.7193 0.1146 6.2788 0.0000 ** Var 219 0.4295 0.1779 2.4145 0.0158 Var 220 0.4494 0.2797 1.6065 0.1082 Var 221 -6.9510 8.3400 -0.8335 0.4046 Var 222 1.6754 0.5784 2.8965 0.0038 Var 223 0.0543 0.1476 0.3678 0.7130 Var 224 -1.5624 0.3013 -5.1861 0.0000 Var 225 -31.9835 17.7456 -1.8023 0.0715 . Var 226 2.5167 0.3881 6.4839 0.0000 Var 227 -0.0323 0.1221 -0.2644 0.7915 Var 228 0.2823 0.2375 1.1883 0.2347 Var 229 0.2146 0.3054 0.7025 0.4823 Var 230 -0.1881 0.3898 -0.4826 0.6294 Var 231 -0.0879 0.1535 -0.5724 0.5671 Var 232 -0.0834 0.1734 -0.4809 0.6306 Var 233 0.0958 0.0929 1.0314 0.3023 Var 234 -0.4702 0.2552 -1.8422 0.0655 . Var 235 0.7133 0.1965 3.6300 0.0003 * Var 236 -0.2968 0.2353 -1.2612 0.2072 Var 237 0.0221 0.1503 0.1474 0.8828 Var 238 0.2132 0.1862 1.1447 0.2523 Var 239 -0.0628 0.1356 -0.4628 0.6435 Var 240 0.4104 0.2727 1.5048 0.1324 Var 241 -0.2650 0.1352 -1.9600 0.0500 . 
Var 242 -0.3257 0.2213 -1.4722 0.1410 Var 243 -0.0563 0.1396 -0.4033 0.6868 Var 244 -1.1349 0.3723 -3.0481 0.0023 * Var 245 -0.4815 0.1319 -3.6508 0.0003 Var 246 10.8584 3.1042 3.4979 0.0005 Var 247 -6.0321 4.1645 -1.4484 0.1475 Var 248 -0.5189 0.2378 -2.1821 0.0291 Var 249 0.3834 0.4551 0.8425 0.3995 Var 250 -0.9750 6.4835 -0.1504 0.8805 Var 251 -0.6839 0.3298 -2.0740 0.0381 Var 252 0.4169 0.1048 3.9784 0.0001 Var 253 0.2494 0.2587 0.9639 0.3351 Var 254 0.0720 0.2986 0.2412 0.8094 Var 255 0.6008 0.4818 1.2470 0.2124 Var 256 -0.5119 0.2466 -2.0755 0.0379 * Var 257 -0.0314 0.3465 -0.0907 0.9277 Var 258 -0.3122 0.1144 -2.7298 0.0063 Var 259 0.1446 0.3077 0.4699 0.6384 Var 260 -0.5283 0.2098 -2.5179 0.0118 Var 261 -0.1913 0.3501 -0.5466 0.5847 Var 262 -0.4490 0.2237 -2.0070 0.0448 Var 263 0.2557 0.2555 1.0010 0.3168 Var 264 -0.0154 0.2094 -0.0737 0.9412 Var 265 0.0024 0.2906 0.0082 0.9934 Var 266 0.0753 0.3338 0.2255 0.8216 Var 267 0.4456 0.4410 1.0102 0.3124 Var 268 -1.5315 0.1921 -7.9724 0.0000 ** Var 269 -0.2350 0.3179 -0.7392 0.4598 Var 270 -0.0970 0.1377 -0.7041 0.4814 Var 271 -0.1391 0.2757 -0.5044 0.6140 Var 272 -0.1408 0.1119 -1.2579 0.2084 Var 273 0.1948 0.2798 0.6960 0.4864 Var 274 -0.3549 0.3213 -1.1046 0.2694 Var 275 -0.1932 0.2331 -0.8286 0.4073 Var 276 -0.1007 0.1655 -0.6084 0.5429 Var 277 -0.0337 0.3520 -0.0957 0.9238 Var 278 -0.1240 0.1788 -0.6932 0.4882 Var 279 0.0295 0.2764 0.1069 0.9149 Var 280 0.2398 0.2116 1.1335 0.2570 Var 281 -1.4096 7.4958 -0.1880 0.8508 Var 282 -5.5626 0.8175 -6.8044 0.0000 Var 283 0.1344 0.0217 6.1974 0.0000 Var 284 0.5342 0.0217 24.5877 0.0000 Var 285 -0.1078 0.0124 -8.6590 0.0000 Var 286 0.0160 0.0134 1.1940 0.2325 Var 287 0.0126 0.0095 1.3290 0.1839 Var 288 0.0032 0.0242 0.1337 0.8936 Var 289 0.1678 0.0101 16.5911 0.0000 Var 290 0.2130 0.0085 24.9566 0.0000 Var 291 0.2400 0.0122 19.6594 0.0000 Var 292 0.1675 0.0093 18.0783 0.0000 Var 293 0.0631 0.0086 7.3184 0.0000 Var 294 0.0748 0.0081 9.2340 0.0000 Var 295 0.0455 0.0079 5.7226 0.0000 Var 296 0.0611 0.0081 7.5534 0.0000 Var 297 -0.0386 0.0099 -3.8817 0.0001 Var 298 0.0437 0.0090 4.8729 0.0000 Var 299 0.1022 0.0091 11.1746 0.0000 Var 300 0.0101 0.0089 1.1318 0.2577 Var 301 -0.0569 0.0112 -5.0813 0.0000 Var 302 0.0046 0.0118 0.3864 0.6992 Var 303 -0.0641 0.0101 -6.3203 0.0000 Var 304 -0.1655 0.0100 -16.5072 0.0000 Var 305 -0.1578 0.0133 -11.8268 0.0000 Var 306 -0.3374 0.0125 -26.9624 0.0000 Var 307 -0.4252 0.0127 -33.3584 0.0000 Var 308 -0.4710 0.0132 -35.6199 0.0000 Var 309 -0.5940 0.0187 -31.8095 0.0000 Var 310 -0.5811 0.0305 -19.0644 0.0000 Var 311 -0.6326 0.0505 -12.5367 0.0000 Var 312 -0.4535 0.1202 -3.7729 0.0002 Var 313 0.0838 0.3111 0.2695 0.7876 Var 314 8.4083 2.3818 3.5302 0.0004 Var 315 -3.4641 0.9119 -3.7989 0.0001 Var 316 0.1738 0.0133 13.0266 0.0000 Var 317 13.4024 0.3958 33.8603 0.0000 Var 318 -30.3177 6.3114 -4.8036 0.0000 Var 319 -2.9137 0.4620 -6.3070 0.0000 Var 320 0.5653 0.3122 1.8105 0.0702 . Var 321 0.7009 0.1918 3.6533 0.0003 Var 322 -0.3840 0.0474 -8.0992 0.0000 Var 323 -0.1336 0.0564 -2.3686 0.0179 Var 324 -0.0056 0.0114 -0.4933 0.6218 Var 325 0.1042 0.0119 8.7655 0.0000 ** Var 326 0.0068 0.0120 0.5636 0.5730 Var 327 0.0168 0.0143 1.1751 0.2400 Var 328 -0.1223 0.0142 -8.5883 0.0000 Var 329 -0.1172 0.0227 -5.1699 0.0000 Var 330 -0.2218 0.0212 -10.4490 0.0000 Var 331 -0.1912 0.0198 -9.6448 0.0000 Var 332 -0.1660 0.0149 -11.1360 0.0000 Var 333 -0.0948 0.0121 -7.8114 0.0000 * Var 334 -0.0002 0.0108 -0.0140 0.9888 Var 335 0.0197 0.0108 1.8261 0.0678 . 
Var 336 -0.0290 0.0111 -2.6091 0.0091 * Var 337 -0.0253 0.0133 -1.8945 0.0582 . Var 338 -0.0620 0.0108 -5.7192 0.0000 Var 339 0.0550 0.0119 4.6291 0.0000 * Var 340 -0.0446 0.0154 -2.9043 0.0037 * Var 341 0.1323 0.0123 10.7968 0.0000 Var 342 0.0748 0.0106 7.0233 0.0000 * Var 343 0.0352 0.0110 3.1843 0.0015 * Var 344 0.1858 0.0108 17.2711 0.0000 Var 345 0.1466 0.0110 13.2841 0.0000 Var 346 0.1405 0.0115 12.1886 0.0000 Var 347 0.0370 0.0118 3.1429 0.0017 Var 348 0.0036 0.0109 0.3342 0.7382 Var 349 0.0574 0.0117 4.9017 0.0000 ** Var 350 0.1727 0.0114 15.1089 0.0000 Var 351 0.1770 0.0125 14.1615 0.0000 Var 352 0.2658 0.0133 20.0379 0.0000 Var 353 0.2439 0.0144 16.9922 0.0000 Var 354 0.5084 0.0201 25.3457 0.0000 Var 355 0.4896 0.0270 18.1030 0.0000 Var 356 0.9615 0.0472 20.3551 0.0000 Var 357 1.0738 0.0980 10.9564 0.0000 Var 358 1.3237 0.1371 9.6567 0.0000 Var 359 0.9284 0.1824 5.0901 0.0000 Var 360 1.4475 0.4339 3.3358 0.0009 Var 361 -1.1372 4.6823 -0.2429 0.8081 Var 362 -0.9246 0.3262 -2.8349 0.0046 Var 363 -0.0125 0.0588 -0.2132 0.8312 Var 364 0.4502 0.0566 7.9586 0.0000 ** Var 365 0.1125 0.0280 4.0167 0.0001 Var 366 1.6832 0.1065 15.7997 0.0000 Var 367 0.9661 0.2034 4.7499 0.0000 Var 368 0.5704 0.0565 10.1036 0.0000 Var 369 0.6114 0.0260 23.4913 0.0000 Var 370 -0.0331 0.0181 -1.8305 0.0672 . Var 371 0.2017 0.0429 4.7063 0.0000 * Var 372 0.0741 0.0238 3.1097 0.0019 * Var 373 -0.2296 0.0254 -9.0563 0.0000 Var 374 -0.1441 0.0151 -9.5502 0.0000 Var 375 -0.0053 0.0310 -0.1710 0.8642 Var 376 0.4055 0.0250 16.2113 0.0000 Var 377 0.0734 0.0259 2.8368 0.0046 Var 378 -0.1381 0.0229 -6.0212 0.0000 ** Var 379 -0.3788 0.0509 -7.4470 0.0000 Var 380 -0.0563 0.0386 -1.4600 0.1443 Var 381 -0.1117 0.0531 -2.1024 0.0355 * Var 382 -0.5015 0.0510 -9.8254 0.0000 * Var 383 -0.2141 0.0810 -2.6420 0.0082 * Var 384 -0.3949 0.0713 -5.5393 0.0000 Var 385 -0.6335 0.1730 -3.6626 0.0002 Var 386 -0.9620 0.1368 -7.0318 0.0000 Var 387 4.9957 1.7360 2.8777 0.0040 Var 388 -85.3715 12.1993 -6.9981 0.0000 ** Var 389 1.8593 0.1117 16.6436 0.0000 Var 390 0.0778 0.1300 0.5983 0.5496 Var 391 0.0025 0.0303 0.0817 0.9349 Var 392 -0.4441 0.0861 -5.1553 0.0000 Var 393 -0.9635 0.1690 -5.7009 0.0000 Var 394 -0.6194 0.5089 -1.2169 0.2236 Var 395 -8.7251 8.3045 -1.0507 0.2934 Var 396 -0.5815 0.0593 -9.7980 0.0000 Var 397 -0.0655 0.0144 -4.5522 0.0000 Var 398 0.0255 0.0260 0.9783 0.3279 Var 399 -0.0811 0.0134 -6.0440 0.0000 Var 400 -0.3023 0.0141 -21.4457 0.0000 Var 401 -0.3581 0.0106 -33.8395 0.0000 Var 402 -0.0780 0.0201 -3.8807 0.0001 Var 403 -0.0613 0.0158 -3.8773 0.0001 Var 404 -0.4015 0.0157 -25.6335 0.0000 Var 405 -0.4220 0.0156 -26.9746 0.0000 Var 406 -0.4725 0.0252 -18.7483 0.0000 Var 407 -0.2034 0.0237 -8.5890 0.0000 Var 408 -0.5261 0.0313 -16.8221 0.0000 Var 409 -0.3602 0.0274 -13.1529 0.0000 Var 410 -0.2947 0.0401 -7.3513 0.0000 Var 411 -0.8911 0.0469 -18.9813 0.0000 Var 412 -0.7269 0.0793 -9.1610 0.0000 Var 413 -0.0939 0.1005 -0.9345 0.3500 Var 414 0.2099 0.2139 0.9809 0.3266 Var 415 -0.8145 0.2826 -2.8826 0.0039 Var 416 25.6728 10.5823 2.4260 0.0153 Var 417 13.1196 116.2027 0.1129 0.9101 Var 418 -0.1497 0.0195 -7.6845 0.0000 Var 419 -0.7910 0.3179 -2.4880 0.0128 * Var 420 -10.8770 0.9840 -11.0544 0.0000 Var 421 0.6308 0.2926 2.1556 0.0311 Var 422 0.5786 0.0730 7.9252 0.0000 ** Var 423 0.1884 0.0376 5.0133 0.0000 Var 424 -0.2013 0.0099 -20.3671 0.0000 Var 425 -0.0178 0.0120 -1.4814 0.1385 Var 426 -0.0623 0.0109 -5.7122 0.0000 Var 427 -0.0248 0.0115 -2.1611 0.0307 * Var 428 -0.0797 0.0139 -5.7186 0.0000 Var 429 -0.0845 0.0190 
-4.4405 0.0000 Var 430 -0.1532 0.0310 -4.9463 0.0000 Var 431 -0.2576 0.0124 -20.7293 0.0000 Var 432 -0.2043 0.0113 -18.1560 0.0000 Var 433 -0.0073 0.0091 -0.7986 0.4245 Var 434 -0.0482 0.0087 -5.5126 0.0000 Var 435 0.0243 0.0094 2.5912 0.0096 Var 436 0.0935 0.0093 10.0308 0.0000 ** Var 437 -0.0366 0.0094 -3.8990 0.0001 Var 438 0.1714 0.0099 17.2532 0.0000 Var 439 0.1936 0.0088 21.8863 0.0000 Var 440 0.1085 0.0087 12.4691 0.0000 Var 441 0.1193 0.0090 13.2681 0.0000 Var 442 0.3107 0.0091 34.2000 0.0000 Var 443 0.2844 0.0096 29.6555 0.0000 Var 444 0.3692 0.0098 37.5630 0.0000 Var 445 0.2608 0.0129 20.2577 0.0000 Var 446 0.2734 0.0142 19.2596 0.0000 Var 447 0.4791 0.0127 37.7910 0.0000 Var 448 0.5991 0.0158 37.8357 0.0000 Var 449 0.6478 0.0230 28.1913 0.0000 Var 450 0.9558 0.0329 29.0725 0.0000 Var 451 1.4188 0.0627 22.6439 0.0000 Var 452 1.2997 0.1126 11.5443 0.0000 Var 453 1.0688 0.1939 5.5113 0.0000 Var 454 1.9206 0.2862 6.7116 0.0000 Var 455 -1.4669 2.4001 -0.6112 0.5411 Var 456 0.4735 0.1349 3.5106 0.0004 Var 457 0.5588 0.0172 32.5795 0.0000 Var 458 0.0609 0.0329 1.8522 0.0640 . Var 459 0.3946 0.1189 3.3179 0.0009 Var 460 -0.2073 0.0406 -5.1105 0.0000 Var 461 0.4840 0.0134 36.0418 0.0000 Var 462 0.2124 0.0151 14.0866 0.0000 Var 463 -0.3112 0.0123 -25.2636 0.0000 Var 464 0.2155 0.0204 10.5378 0.0000 Var 465 -0.1596 0.0143 -11.1842 0.0000 Var 466 -0.2368 0.0186 -12.6959 0.0000 Var 467 -0.1618 0.0195 -8.3161 0.0000 Var 468 -0.0550 0.0241 -2.2857 0.0223 * Var 469 -0.1570 0.0237 -6.6252 0.0000 Var 470 -0.4621 0.0360 -12.8335 0.0000 Var 471 -0.3842 0.0385 -9.9672 0.0000 Var 472 -0.4520 0.0552 -8.1942 0.0000 Var 473 -0.1655 0.0517 -3.2006 0.0014 Var 474 -0.3016 0.0460 -6.5618 0.0000 ** Var 475 -0.6393 0.0455 -14.0446 0.0000 Var 476 -0.2174 0.0861 -2.5255 0.0116 * Var 477 -0.4818 0.0867 -5.5571 0.0000 ***

Var 478 -0.5280 0.8545 -0.6180 0.5366

Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9883 on 130350 degrees of freedom
Multiple R-squared: 0.5695, Adjusted R-squared: 0.5679
F-statistic: 360.7469 on 478 and 130350 DF, p-value: 0.000

haifengl commented 8 years ago

For the first data you shared with me, the model can be fit in 30 seconds on my laptop. For the new data you just shared, it took about 10 minutes.

lukehutch commented 8 years ago

Your linear regression results look correct, and the summary stats at the bottom are exactly the same as the output printed by R without removing the zero columns. (R correctly detects 478 degrees of freedom across 485 variables, 7 of which are zero for all rows.)

On my machine, this regression takes the following times (the SMILE times below use Oracle JDK 8u91, not OpenJDK):

(PS: I don't think it has been true for quite some time that OpenJDK performs substantially differently from Oracle JDK; in fact, I think the codebases have largely converged. To check this, though, I ran the same SMILE SingularValueDecomposition on both OpenJDK and Oracle JDK, and the time taken was 44 minutes vs. 42 minutes respectively -- roughly within the expected margin of variation, depending on whatever else the machine was doing at the time.)

So, there are two problems here: (1) SMILE is roughly 78x slower at linear regression than sklearn, on my machine, on Oracle JDK 8. (2) SMILE cannot handle zero columns during either linear regression or lasso, but both sklearn and R can handle zero columns.

Problem (1) could potentially be alleviated by at least multithreading the QR decomposition, or using some other method. Linear regression is such an important staple, as a baseline for machine learning, that this method really needs to be faster!

Problem (2) is particularly bad for rare features: it might be the case that in a given feature vector, for a specific 5-fold cross validation division of the data, all the non-zero entries in the rare feature component end up in the test set, and the training set has all zeroes for that column. It could be the case that SMILE crashes only sometimes, on some random cross-validation divisions, and not on others. This is a serious problem, because rare features are quite commonly present in a model.

For OLS, the exception triggered in problem (2) is caused by the linear regression method (involving QR decomposition) requiring full rank. How can this be fixed to work with rank-deficient matrices?

For LASSO, the NaN values triggered in problem (2) are caused by division-by-zero when the variance of a column is zero -- see the following line:

https://github.com/haifengl/smile/blob/129889d482379526d2ff0ecbb344b1dac9d2c11e/core/src/main/java/smile/regression/LASSO.java#L280

Line 275 should be changed to:

            double s = Math.sqrt(scale[j] / n);
            scale[j] = s == 0.0 ? 0.0 : 1.0 / s;    // (or compare Math.abs(s) to some EPSILON)

then line 280 should be changed to:

                X[i][j] *= scale[j];

I didn't check the rest of LASSO to see if this is the only change needed, but it's never good practice to do floating point division without checking for zero first. (I spotted another case of possible unchecked division-by-zero in QRDecomposition.) (It's also faster in general to multiply by the reciprocal than to divide.)
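As a self-contained illustration of the guarded scaling suggested above (the method name and the exact standardization convention are assumptions, not smile's code):

    // Scale each column of X by the reciprocal of its standard deviation,
    // zeroing out zero-variance columns instead of dividing by zero.
    static void standardizeColumns(double[][] x) {
        int n = x.length, p = x[0].length;
        for (int j = 0; j < p; j++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) sum += x[i][j];
            double mean = sum / n;
            double ss = 0.0;
            for (int i = 0; i < n; i++) {
                double d = x[i][j] - mean;
                ss += d * d;
            }
            double s = Math.sqrt(ss / n);
            double scale = (s == 0.0) ? 0.0 : 1.0 / s;   // guard against zero variance
            for (int i = 0; i < n; i++) x[i][j] *= scale;
        }
    }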

haifengl commented 8 years ago

Both R and sklearn use LAPACK for linear regression. It is interesting that they show a huge performance difference. Maybe they call different LAPACK routines? BTW, LAPACK doesn't use multithreading, but it does use SSE. If you could disable SSE in R and sklearn, the performance gap would be much smaller.

For this particular case, we are actually not comparing the performance of R, sklearn, and smile. In fact, we are comparing the JVM against LAPACK/Fortran. There is no doubt who the winner is.

Because Java and the JVM are not an optimal platform for numeric computing such as linear regression, I don't expect that we can beat LAPACK on matrix computation. But I will check whether we can find faster algorithms to improve smile, or we could call LAPACK through JNI, which I have tried to avoid for the sake of Java's cross-platform portability.

For the second problem, I checked the sklearn source code. I didn't find that they prune zero columns; I guess LAPACK takes care of that.

Thanks for checking the LASSO code. I will go through it and make improvements as you suggested.

P.S. If your goal is to compare the performance of sklearn and smile, I suggest you try some other algorithms such as SVM, logistic regression, decision trees, random forests, etc. These methods are not about matrix computation, so you would not effectively be comparing against LAPACK.

douglasArantes commented 8 years ago

Could you use a library like ND4J in Smile Core for matrix computation?

ND4J's Benchmarking

lukehutch commented 8 years ago

Yes, I'm aware of the likely difference in runtime between Fortran code and Java code. However, a factor of 78 difference cannot be explained by a combination of SSE speedup (generally up to 4x), inefficiency of JIT-generated code compared to AOT-compiled code (up to 2x), and Java-specific overheads (array bounds checks, reference handling, etc. -- maybe a 20-50% overhead at most, assuming you're not allocating new memory, which you're not). There's still a factor of ~10x difference in performance between sklearn and SMILE after taking all of this into account (and sklearn also has overheads at the Python<->C++ and C++<->Fortran interface boundaries). You're right that the LAPACK code is not multithreaded either (although most numerical methods can be parallelized to some degree). So the difference most likely does not come down to just language/compilation/runtime efficiencies, but to a difference in numerical methods.

This is easy to test though -- there are several projects to bring BLAS/LAPACK to Java using f2j and other tools, e.g.:

You may find some other fast pure Java solutions here that would work:

It would actually be helpful to have some benchmarks published on the SMILE site for the core numerical methods (SVD, QR decomposition, etc.) of SMILE vs. other more common Java alternatives.

If you do eventually want to try vectorizing the most crucial numerical methods, some options are:

However, I think a large speedup (at least 5-10x) could be obtained directly in Java by improving the numerical methods, or choosing different algorithms altogether.

I will create a separate bug for the rank deficiency issue (#88), since this bug has turned more into a discussion of efficiency.

Edit: It does look like ND4J is the best solution. You get portability across CPUs / GPUs (via BLAS) and clusters (via Spark) if you base all your computationally intensive operations on ND4J.

haifengl commented 8 years ago

@douglasArantes Thanks! ND4J looks very good. It seems to bake in all kinds of linear algebra backends. I am not sure how portable it is for deployment, though. With it, would I have to provide separate builds for Windows, Linux, macOS, CUDA, etc.?

The math and matrix computation parts of smile were created back in 2009 (maybe 2008, I cannot remember). It is time to use the latest technologies to speed them up. Shall we create a 2.0 branch that uses ND4J and keep the old 1.x branch as the pure Java implementation? Any thoughts? Thanks!

haifengl commented 8 years ago

@lukehutch I agree with you that the algorithms have a much bigger impact on speed. That is why I said this is a comparison against Fortran/LAPACK (Fortran for the language, LAPACK for the algorithms). QR in smile is a standard algorithm based on the Householder transformation. LAPACK provides multiple algorithms (complete orthogonal factorization, singular value decomposition, or divide-and-conquer SVD). sklearn uses NumPy for least squares, which in turn calls LAPACK's divide-and-conquer SVD, which is the best available algorithm today. An SVD-based solve doesn't care about the matrix rank, which is probably why R and sklearn don't need to remove zero columns.
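To make that concrete, here is a sketch of how a least-squares solve from a thin SVD can simply skip near-zero singular values, which is why rank deficiency is harmless for this route (plain arrays, not smile's or LAPACK's actual routine; the tolerance rule is an assumption):

    // Solve min ||X*beta - y|| given a thin SVD X = U * diag(s) * V^T,
    // where U is n x k, s has length k (descending), V is p x k, y has length n.
    static double[] solveLeastSquaresFromSvd(double[][] U, double[] s, double[][] V, double[] y) {
        int n = U.length, k = s.length, p = V.length;
        double tol = Math.max(n, p) * Math.ulp(1.0) * s[0]; // drop tiny singular values
        double[] beta = new double[p];
        for (int j = 0; j < k; j++) {
            if (s[j] <= tol) continue;                      // rank-deficient direction: skip
            double uty = 0.0;                               // (U^T y)_j
            for (int i = 0; i < n; i++) uty += U[i][j] * y[i];
            double c = uty / s[j];
            for (int a = 0; a < p; a++) beta[a] += c * V[a][j];
        }
        return beta;
    }

Directions with singular values below the tolerance simply contribute nothing, which yields the minimum-norm solution instead of an exception.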

With ND4J, I see the potential to leverage highly optimized libraries such as MKL, OpenBLAS, and even CUDA. I will check how we can incorporate ND4J into smile.

BTW, what is the speed comparison for LASSO between sklearn and smile? I do use advanced algorithms for many methods such as LASSO, SVM, etc. Thanks!

douglasArantes commented 8 years ago

I believe that creating a 2.0 branch that uses ND4J and keeping the 1.x branch pure Java would be a great choice.

Regarding distribution, I do not know what the best approach would be. ND4J is used in DeepLearning4J (DL4J GitHub); you may want to contact the Skymind staff through Gitter.

Useful information can be found on the dependencies page and the native GPUs page.

haifengl commented 8 years ago

Thanks @DiegoCatalano ! I will check them out.

haifengl commented 8 years ago

@lukehutch, I tried least squares with SVD (smile's pure Java implementation), which handles rank deficiency nicely. However, it is actually slower than QR for your data. I will add a control parameter to let people choose which method to use.

haifengl commented 8 years ago

We can handle rank deficiency now with SVD. In Java, use

OLS(x, y, true)

to select SVD. In Scala:

ols(x, y, "svd")

To use QR, simply call OLS(x, y) or OLS(x, y, false) in Java, and ols(x, y) or ols(x, y, "qr") in Scala.
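A minimal Java usage sketch, assuming the constructor shown above and the usual predict method (toy data only; exact signatures may differ between releases):

    import smile.regression.OLS;

    public class OlsSvdExample {
        public static void main(String[] args) {
            // Note the all-zero second column: rank deficient on purpose.
            double[][] x = {{1, 0}, {2, 0}, {3, 0}, {4, 0}, {5, 0}, {6, 0}};
            double[] y = {1.1, 1.9, 3.2, 3.9, 5.1, 5.8};
            OLS model = new OLS(x, y, true);                  // true selects the SVD-based solver
            System.out.println(model.predict(new double[]{2.5, 0}));
        }
    }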

lukehutch commented 8 years ago

Great, thanks for your work on this. When are you thinking of doing the next release?

lukehutch commented 8 years ago

SMILE LASSO: 9 minutes
sklearn Lasso: 38 seconds

However, I don't know if I set the parameters the same way -- SMILE takes lambda (I set it to 1.0), whereas sklearn takes alpha (defaults to 1.0, not sure if it's even the same thing).
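For reference, sklearn's Lasso minimizes (1/(2n)) * ||y - X*beta||^2 + alpha * ||beta||_1. If smile's lambda enters an unnormalized objective of the form (1/2) * ||y - X*beta||^2 + lambda * ||beta||_1 (an assumption about smile's formulation), then the penalties only match when lambda is roughly n * alpha, so lambda = 1.0 and alpha = 1.0 would not be equivalent settings.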

haifengl commented 8 years ago

I am working on reducing the memory footprint of decision trees on large datasets. After that, I will release 1.2. It is 1.2 instead of 1.1 because of some interface changes in the Scala API. I hope to release in one or two weeks.

Then I will work on 2.0 with ND4J.

haifengl commented 8 years ago

@luke, for LASSO, do they produce the same results? Do they converge in a similar number of iterations? Thanks!

haifengl commented 8 years ago

BTW, your data is extremely sparse. Do you use a sparse matrix in sklearn?

lukehutch commented 8 years ago

@haifengl re. shipping different versions for different platforms: What you would do is just depend on ND4J; then the user can depend on the appropriate backend for their architecture, and/or add GPU or cluster support (ND4J will automatically pick the fastest backend using ServiceLoader and a priority system). See:

http://nd4j.org/backend.html http://nd4j.org/dependencies.html

You can see the difference in backend speeds here: http://nd4j.org/benchmarking

lukehutch commented 8 years ago

I'm not using sparse matrices, unless sklearn detected that the matrix is sparse and sparsified it itself.

I can't see how to make sklearn print verbose output with the Lasso method, so I don't know how many iterations it is using, but it stopped with an RMSE of 1.5. SMILE stopped after 150 iterations with an RMSE of 0.72. However, SMILE's debug output (on stderr) shows the error swinging a lot during convergence, so presumably the parameters are not set right.

haifengl commented 8 years ago

On my laptop (unplugged, so presumably slower), it took 50 seconds to fit your first dataset and 6 minutes to fit the second, larger one. Are you still using OpenJDK?

The error swinging you saw is not really a swing. Output such as

[main] INFO smile.math.Math - BCG: the error after 149 iterations: 0.008335

is from BCG, which is an inner iteration. This may make it look like the error is swinging.

The real LASSO iteration output looks like

[main] INFO smile.regression.LASSO - LASSO: primal and dual objective function value after 3 iterations: 1.2731e+05 3076.6

Note that the 150 iterations reported on the console are actually BCG iterations. The real LASSO takes only 3 iterations to fit your bigger data.

haifengl commented 8 years ago

To match your sklearn setting, I use lambda = 1.0 now. On my laptop, our LASSO finishes in 414 seconds with a regular matrix. But if we use a sparse matrix, it takes only 50 seconds, which is much closer to sklearn's time on your machine. Moreover, our LASSO finishes with an RMSE of 0.98, which is much better than sklearn's 1.5.

haifengl commented 8 years ago

@lukehutch I think the problem is more with the double[][] data structure for matrices. With a one-dimensional column-major matrix, I can fit your data with OLS via QR in 55 seconds on my laptop, which is close to R.
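As a rough illustration of why that layout helps (a sketch only, not smile's internal class): a column-major matrix stores element (i, j) at index i + j*nrows, so the column operations at the heart of Householder QR walk contiguous memory:

    // Column-major storage: column j occupies a[j*nrows .. j*nrows + nrows - 1].
    class ColumnMajorMatrix {
        final int nrows, ncols;
        final double[] a;

        ColumnMajorMatrix(int nrows, int ncols) {
            this.nrows = nrows;
            this.ncols = ncols;
            this.a = new double[nrows * ncols];
        }

        double get(int i, int j)           { return a[i + j * nrows]; }
        void   set(int i, int j, double v) { a[i + j * nrows] = v; }

        // Dot product of two columns -- the inner kernel of Householder QR --
        // touches contiguous memory, unlike a double[][] row-of-rows layout.
        double columnDot(int j, int k) {
            double s = 0.0;
            int oj = j * nrows, ok = k * nrows;
            for (int i = 0; i < nrows; i++) s += a[oj + i] * a[ok + i];
            return s;
        }
    }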