haasad / PyPardiso

Python interface to the Intel MKL Pardiso library to solve large sparse linear systems of equations
BSD 3-Clause "New" or "Revised" License

Check equality of sparse matrices, to see if updating the factorization is necessary #1

Closed: haasad closed this issue 7 years ago

haasad commented 8 years ago

At the moment a simple identity comparison is applied to see if matrix A is already factorized:

if self.factorized_A is A:
    self.set_phase(33)
else:
    self.set_phase(13)

While this is convenient and really fast, it breaks when matrix A is modified in place (e.g. A[0,0] += 1), because self.factorized_A is A will still return True and the factorization won't be updated.
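
A minimal sketch of the failure mode, using plain variables in place of the wrapper state:

import scipy.sparse as sp

A = sp.csr_matrix(sp.eye(3))
factorized_A = A          # the check stores a reference, not a copy
A[0, 0] += 1              # in-place modification, still the same object
print(factorized_A is A)  # True -> a stale factorization would be reused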

I can think of two ways to overcome this:

I prefer the second one, but I have to check how much it affects the speed of the different methods.

cmutel commented 8 years ago

The same issue applies when factorizing a matrix using any other solver. The value returned from the factorize call assumes that the A matrix doesn't change.

On some level, this is an inherent problem with always doing factorization. If this were optional, it would be the user's fault if they used a "stale" factorization. But factorization is so fast and convenient with Pardiso that it would be nice to keep it.

By the way, more or less any modification can trigger this behaviour, so this could be a real issue:

In [1]: from scipy.sparse import *

In [2]: m = lil_matrix((2, 2))

In [3]: id(m)
Out[3]: 4375863928

In [4]: m[:, 0] = 1

In [5]: id(m)
Out[5]: 4375863928

In my opinion, the ideal solution would be to not store the entire matrix, but some piece of information that captures all the matrix data. I haven't found a fast way to hash sparse matrices, but maybe there is something clever around.

For CSR matrices, you could just store the .data array, and 1) compare the shape; 2) sum the absolute value of the difference from old to new. This should handle the case where values are switched but the matrix sum stays the same.
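
A sketch of that idea (assuming the old and new matrices share the same sparsity structure so the two .data arrays line up; data_changed is an illustrative name):

import numpy as np

def data_changed(old_shape, old_data, new_matrix):
    # shape check first, then the total absolute change in the stored values
    if old_shape != new_matrix.shape:
        return True
    return np.abs(new_matrix.data - old_data).sum() > 0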

cmutel commented 8 years ago

See also bw2calc issue.

haasad commented 8 years ago

In my opinion this behaviour makes sense for the factorized method, and there the user should be responsible. The real problem is that this behaviour also applies to spsolve, where the user should be able to expect that no "stale" factorization is used after they modify matrix A.
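
For example (a hypothetical user session; spsolve is PyPardiso's user-facing call):

import numpy as np
import scipy.sparse as sp
from pypardiso import spsolve

A = sp.csr_matrix(sp.eye(3))
b = np.ones(3)
x = spsolve(A, b)    # first call factorizes A
A[0, 0] += 1         # in-place change, same object
x2 = spsolve(A, b)   # the user expects a fresh factorization here, not a stale one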

haasad commented 8 years ago

> For CSR matrices, you could just store the .data array, and 1) compare the shape; 2) sum the absolute value of the difference from old to new. This should handle the case where values are switched but the matrix sum stays the same.

I thought about just comparing A.data, but this doesn't catch things like A.transpose(), and therefore A.indices and A.indptr would also need to be checked. I'll try to find the fastest method to do this.

Edit: It would catch A.transpose() because the order of the data changes, you're right. But it wouldn't catch the case where one value is moved from one row to the next, because A.data stays the same.
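
A quick demonstration of that corner case: the same value in row 0 vs. row 1 of a 2x2 matrix leaves .data and even .indices identical, and only .indptr differs:

import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[0., 1.], [0., 0.]]))
b = csr_matrix(np.array([[0., 0.], [0., 1.]]))
print(np.array_equal(a.data, b.data))        # True
print(np.array_equal(a.indices, b.indices))  # True (same column)
print(np.array_equal(a.indptr, b.indptr))    # False -> only indptr catches the move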

cmutel commented 8 years ago

Inspired by smart people, how about this:

import hashlib

def hash_csr_matrix(matrix):
    # shape plus SHA-1 digests of the three underlying CSR arrays
    return (
        str(matrix.shape) +
        hashlib.sha1(matrix.indices).hexdigest() +
        hashlib.sha1(matrix.indptr).hexdigest() +
        hashlib.sha1(matrix.data).hexdigest()
    )

Takes ~2 milliseconds for ecoinvent 3.2 (cutoff), which seems reasonable.

haasad commented 8 years ago

I came up with this:

import numpy as np

def csr_matrix_equal(a1, a2):
    # element-wise comparison of the three underlying CSR arrays
    return (np.array_equal(a1.indptr, a2.indptr) and
            np.array_equal(a1.indices, a2.indices) and
            np.array_equal(a1.data, a2.data))

In my tests this is faster than hashing the matrices, even when the initial copying of one matrix is included. If a1 != a2, the comparison is often even faster, because the and-chain short-circuits as soon as one check fails (and indptr is the shortest of the three arrays).

cmutel commented 8 years ago

Sure, comparing the values is ~10 times faster than hashing them. But you have to keep a copy of the matrix, so you trade off memory usage for speed.

For example, ecoinvent 3.2 (cutoff) occupies 1.4 MB of RAM.

So probably this tradeoff is worth it; solving a linear system with Pardiso when it is already factorized is on the order of a few milliseconds, so hashing would double the time. Plus you get guaranteed Py2/3 compatibility. Sigh.

haasad commented 8 years ago

True, I hadn't really considered the memory usage. I have yet to test whether very large matrices run into RAM limitations or take prohibitively long to solve. The maximum I have tested so far is ~300 MB, which takes around 3 min.

cmutel commented 8 years ago

By the way, this is slightly faster:

import numpy as np

def csr_matrix_equal2(a1, a2):
    return all((np.array_equal(a1.indptr, a2.indptr),
                np.array_equal(a1.indices, a2.indices),
                np.array_equal(a1.data, a2.data)))

cmutel commented 8 years ago

As I was trying to go to sleep last night, I was worried about the Monte Carlo classes, which do rebuild the matrices after doing an initial calculation using the normal solver. Luckily, these classes all rebuild the technosphere matrix, so the existing identity check should be enough to avoid the use of a stale cache in future calculations:

In [6]: mc = MonteCarloLCA({db.random(): 1})

In [7]: next(mc)
Out[7]:
array([ -3.84992617e-05,  -4.49016549e-09,  -4.14491504e-06, ...,
        -2.54873438e-02,  -2.40185083e-04,  -1.32010176e+00])

In [8]: id(mc.technosphere_matrix)
Out[8]: 4502266096

In [9]: next(mc)
Out[9]:
array([ -3.64070020e-05,  -3.45972601e-09,  -3.77893232e-06, ...,
        -2.23343439e-02,  -6.50755958e-04,  -1.19602921e+00])

In [10]: id(mc.technosphere_matrix)
Out[10]: 4503150776

Both the MonteCarloLCA and ParameterVectorLCA use rebuild_technosphere_matrix, which creates a new CSR matrix object each time.
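
A minimal illustration of why the identity check suffices here (sp.random stands in for rebuild_technosphere_matrix; any freshly built matrix is a new object):

import scipy.sparse as sp

A = sp.random(5, 5, density=0.5, format='csr')
A_new = sp.random(5, 5, density=0.5, format='csr')  # stands in for the rebuild
print(A_new is A)  # False -> a full factorization (phase 13) is triggered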

cmutel commented 8 years ago

Maybe a good balance between compute speed and memory usage would be to store a copy of the matrix up until a certain size, but then switch to checking the hash values, with an advanced option to skip this check entirely.

haasad commented 7 years ago

The rework of the wrapper in cfaa4c58ba276b8f5290ba2999d43537b9f7dde6 now includes the hashing and comparison functions discussed above. The generic threshold for keeping a copy in memory is currently 5e7 non-zero entries, which is roughly 600 MB. This threshold can be set to zero to always use hashing, or very high to always store a copy. I'm thinking about adding a method to the wrapper that skips all checks in order to get maximum efficiency, but I'm not sure yet. Manually setting the phase to 33 and using the private _call_pardiso method is already possible, but leads to segfaults when not called properly.
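
A minimal sketch of such a size-based dispatch, reusing the two checks from earlier in this thread (the class and method names are illustrative, not the actual wrapper API):

import hashlib
import numpy as np

def _hash_csr(m):
    # shape plus SHA-1 digests of the three CSR arrays, as in hash_csr_matrix above
    return (str(m.shape)
            + hashlib.sha1(m.indices).hexdigest()
            + hashlib.sha1(m.indptr).hexdigest()
            + hashlib.sha1(m.data).hexdigest())

class SizeDispatchSketch:
    size_limit = 5e7  # non-zero entries; roughly 600 MB of stored copy

    def __init__(self):
        self._stored_A = None
        self._stored_hash = None

    def matrix_changed(self, A):
        # True if A differs from the matrix seen on the previous call
        if A.nnz <= self.size_limit:
            old = self._stored_A
            same = (old is not None
                    and np.array_equal(A.indptr, old.indptr)
                    and np.array_equal(A.indices, old.indices)
                    and np.array_equal(A.data, old.data))
            if not same:
                self._stored_A = A.copy()
            return not same
        # above the threshold: keep only a hash, not a copy of the matrix
        h = _hash_csr(A)
        same = h == self._stored_hash
        self._stored_hash = h
        return not same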

I'll close this issue as soon as I have written tests for this new functionality.

haasad commented 7 years ago

The hashing and the comparison functions are now part of v0.2.0. I'm closing this for now.