pysal / spreg

Spatial econometric regression in Python
https://pysal.org/spreg/

method='LU' memory use in spreg.ML_Error()? #82

Open rsbivand opened 3 years ago

rsbivand commented 3 years ago

I was trying to fit an ML error model with roughly 71,000 observations, but memory use grew very quickly. This may be because some kind of parallelization comes into play (it shouldn't), but it feels more like the weights matrix going dense. The messages before I killed the process were:

> py_mlerror <- spr$ML_Error(y, X, w=nb_q0, method="LU")
/usr/lib64/python3.9/site-packages/scipy/optimize/_minimize.py:779: RuntimeWarning: Method 'bounded' does not support relative tolerance in x; defaulting to absolute tolerance.
  warn("Method 'bounded' does not support relative tolerance in x; "
/usr/lib64/python3.9/site-packages/scipy/sparse/_index.py:125: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_arrayXarray(i, j, x)
/usr/lib64/python3.9/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:318: SparseEfficiencyWarning: splu requires CSC matrix format
  warn('splu requires CSC matrix format', SparseEfficiencyWarning)
/usr/lib64/python3.9/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:215: SparseEfficiencyWarning: spsolve is more efficient when sparse b is in the CSC matrix format
  warn('spsolve is more efficient when sparse b '

I think the sparse weights matrix is CSR, not CSC. Is the problem the densifying of the variance-covariance matrix, around https://github.com/pysal/spreg/blob/c6d97c1b31bb4997a7c65b3f6b35a1ce3e11282a/spreg/ml_error.py#L244 ? Could a finite-difference Hessian help? Does spinv() go dense on return?
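
For reference, a standalone snippet (toy data, not spreg's own code) reproduces the format warnings and shows that they go away once the matrix is converted to CSC before factorization:

    # Toy check: splu/spsolve want CSC input, so a CSR weights matrix
    # triggers the SparseEfficiencyWarnings shown above.
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu

    n = 1000
    W = sp.random(n, n, density=5.0 / n, format="csr")  # stand-in for a CSR weights matrix
    A = sp.identity(n, format="csr") - 0.5 * W          # (I - \lambda W), still CSR

    lu = splu(A.tocsc())  # converting to CSC first avoids the format warning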

ljwolf commented 3 years ago

Does this happen with other backends? If not, it may be specific to the scipy.sparse.linalg SuperLU solver.

Could you post the GAL file? I'm happy to generate fake data to correspond to it.

ljwolf commented 3 years ago

> Does spinv() go dense on return?

It should remain sparse, since the input remains sparse at line 192, but I can instrument this.

pedrovma commented 3 years ago

> Does spinv() go dense on return?
>
> It should remain sparse, since the input remains sparse at line 192, but I can instrument this.

If the input is a sparse object, spinv() will return a sparse object, as can be seen in sputils.py, lines 233-237.

    if SP.issparse(a):
        ai = SPla.inv(a)
    else:
        ai = la.inv(a)
    return ai

However, depending on the matrix, its inverse can be dense in content, even though it is still stored as a sparse object.
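
A toy illustration of that last point: the result keeps its sparse type, but nearly every entry ends up stored, so it is effectively dense.

    import scipy.sparse as sp
    import scipy.sparse.linalg as SPla

    n = 500
    W = sp.random(n, n, density=5.0 / n, format="csc")
    A = sp.identity(n, format="csc") - 0.5 * W   # (I - \lambda W)

    Ai = SPla.inv(A)              # still a scipy.sparse matrix object...
    print(Ai.nnz / float(n * n))  # ...but the fill ratio is close to 1.0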

rsbivand commented 3 years ago

Yes, (I - \lambda W)^{-1} is dense by definition, so spinv() returns a sparse matrix with all elements non-zero. @ljwolf - here is the compressed GAL file: large_nb.zip

I think that for method='LU', the variance-covariance matrix for the betas is all that can be returned when N is large. An LR test of OLS against ML then tests for the inclusion of \lambda.
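
Something along these lines for the LR test, as a sketch; I am assuming the error-model log-likelihood is available from the fitted object (spreg reports it as logll, if I remember correctly), and that X here already includes the constant column:

    import numpy as np
    from scipy import stats

    def ols_loglik(y, X):
        """Gaussian log-likelihood of OLS at the ML estimate of sigma^2."""
        n = y.shape[0]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ beta
        sig2 = float(e.T @ e) / n
        return -0.5 * n * (np.log(2.0 * np.pi) + np.log(sig2) + 1.0)

    # lr = 2.0 * (ml_error_logll - ols_loglik(y, X))  # ml_error_logll from the fitted error model
    # p_value = stats.chi2.sf(lr, df=1)               # 1 d.f.: the inclusion of \lambda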

ljwolf commented 3 years ago

> Does spinv() go dense on return?
>
> It should remain sparse, since the input remains sparse at line 192, but I can instrument this.
>
> If the input is a sparse object, spinv() will return a sparse object, as can be seen in sputils.py, lines 233-237.
>
>     if SP.issparse(a):
>         ai = SPla.inv(a)
>     else:
>         ai = la.inv(a)
>     return ai
>
> However, depending on the matrix, its inverse can be dense in content, even though it is still stored as a sparse object.

Ahh, I see... If that's the case, then @rsbivand's suggestion is very reasonable, I think. We should avoid the rest of the baseclass (e.g., avoid taking spinv(ai)) for folks who request method="LU".

My hope in implementing LU was that it would avoid instantiating a dense inverse, both during the optimization step and elsewhere. The ord and full methods immediately instantiate w.full().

We'd need to do this as well for ml_lag.py, where the weights are also densified in all methods except LU.
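
For reference, the attraction of LU is that the log-Jacobian term ln|I - \lambda W| in the concentrated log-likelihood can be read off the diagonal of the U factor of a sparse LU decomposition, so no dense inverse is needed during optimization. A sketch of the idea (not the exact spreg code):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu

    def sparse_log_jacobian(lam, W_csc):
        """ln|det(I - lam*W)| from the U factor; L has a unit diagonal."""
        n = W_csc.shape[0]
        A = sp.identity(n, format="csc") - lam * W_csc
        lu = splu(A)
        return np.sum(np.log(np.abs(lu.U.diagonal())))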

ljwolf commented 3 years ago

> Yes, (I - \lambda W)^{-1} is dense by definition, so spinv() returns a sparse matrix with all elements non-zero. @ljwolf - here is the compressed GAL file: large_nb.zip
>
> I think that for method='LU', the variance-covariance matrix for the betas is all that can be returned when N is large. An LR test of OLS against ML then tests for the inclusion of \lambda.

Thank you for the file! I'll use it as the test case for this.

rsbivand commented 3 years ago

The way around this for large N may be a finite-difference (FD) Hessian, if one (that works) is available in a Python package you already use, maybe from a numerical optimizer. Unlike the line search, it starts from the \lambda and \beta we already have and searches close to them to estimate the shape of the joint distribution. Here we only need \lambda (and maybe \sigma^2), because there are no interactions with the variance-covariance of the \betas.
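
Something like this minimal sketch, where concentrated_loglik is a placeholder for whatever function the LU optimizer already maximizes over \lambda:

    import numpy as np

    def fd_second_deriv(loglik, lam_hat, h=1e-4):
        """Central-difference d2 logL / d lambda2 at the estimate."""
        return (loglik(lam_hat + h) - 2.0 * loglik(lam_hat) + loglik(lam_hat - h)) / h**2

    # var_lambda ~ -1.0 / fd_second_deriv(concentrated_loglik, lam_hat)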

While you are looking at that bit of ML_Error, maybe consider adding the Pace-LeSage Hausman test, at least for 'full' and 'ord'. For 'LU', it is possible to approximate it by powering \lambda W x (provided the powers give a dampened series, no explosions please!) for an x vector or matrix, as the powered W never gets stored, only accumulated into the target vector (up to an arbitrary cutoff); see the sketch below. Unfortunately, since it tests for differences between the OLS and *_error \betas, it often finds mis-specification.
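
The power-series idea in outline (a sketch, not tied to any existing spreg function):

    import numpy as np

    def power_series_solve(W, lam, x, max_terms=50, tol=1e-8):
        """Accumulate (I - lam*W)^{-1} x as x + lam*W x + lam^2 W^2 x + ...
        using repeated sparse matrix-vector products; only the running
        vector is stored, never a power of W. Requires a dampened series,
        i.e. |lam| times the spectral radius of W below 1."""
        out = x.copy()
        term = x.copy()
        for _ in range(max_terms):
            term = lam * (W @ term)           # next term of the series
            out += term
            if np.max(np.abs(term)) < tol:    # cut off once the terms have died out
                break
        return out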