haasad / PyPardiso

Python interface to the Intel MKL Pardiso library to solve large sparse linear systems of equations
BSD 3-Clause "New" or "Revised" License
136 stars 20 forks source link

Factorization clarification question #75

Closed marc-vdm closed 5 months ago

marc-vdm commented 5 months ago

I'm writing some custom BW code (can share over email, but is still to remain private for now) where I observe a ~60% speed increase (2600 200 solve/sec vs 1600 123 solve/sec) when I factorize beforehand.

Are you certain the below is correct? https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L73-L74

You may be correct that the computational/memory cost is not worth it if you only perform a few -whatever number few is- Ax=b solves, on the same A, but when I do many solves (>20k), this does seem to matter quite a lot.


edit: Also note that def factorize() itself has conflicting information with the above. https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/pardiso_wrapper.py#L145-L152

cmutel commented 5 months ago

I am 99% sure that the difference your are seeing is due to https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L44. If the matrix wasn't factorized the difference would be at least an order of magnitude slower. What this line does is make sure the A matrix, which is factorized, is the same now as when it was factorized in the first place. factorize doesn't make this check and is therefore somewhat faster.

cmutel commented 5 months ago

The ideal option, in my opinion, would be to be able to call spsolve without the A matrix check, probably through an additional input argument. This would still preserve compatibility with scipy input arguments, and avoid paying the memory cost of keeping another copy of the A matrix around: https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L71-L77

haasad commented 5 months ago

You should be able to do the following for maximum speed:

import pypardiso

solver = pypardiso.ps
solver.factorize(A)
solver.set_phase(33)
x = solver._call_pardiso(A, b)

This skips all the checks etc.

marc-vdm commented 5 months ago

Thanks for your replies both! I'll experiment with Adrians solution by rewriting def solve_linear_system() and def decompose_technosphere() and see how that (doesn't?) change results

I'll report back with some findings

haasad commented 5 months ago

Please be aware that factorized is in a way just a convenience function, so pypardiso can be used as drop-in replacement for scipy.sparse.linalg.factorized. It still uses pypardiso.spsolve under the hood and pypardiso.spsolve always does a factorization for re-use by subsequent calls to spsolve.

haasad commented 5 months ago

I think I found the reason for the the performance difference that you see. I assume the brightway technosphere matrix is in csc format?

factorized converts A to csr format and keeps a copy of it: https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L77

spsolve converts to csr on every call: https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L41-L42

Unfortunately my commit message from 2016 is pretty useless: https://github.com/haasad/PyPardiso/commit/7a60c843a1caa2f67a61629ffeeb9b6de47dedb3

pardiso can deal with both csc and csr formats: https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/pardiso_wrapper.py#L219-L227

haasad commented 5 months ago

Looks like I actually wrote down the reasoning for this in #7, even with a jupyter notebook and everything :blush:

marc-vdm commented 5 months ago

Alright, here's some findings:

@cmutel I am 99% sure that the difference your are seeing is due to

https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L44

I'm not entirely sure this is correct. I think it may be the conversion to csr that is costing so much time. https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L41-L42 When I convert my BW matrices to csr beforehand, I get the same speed as calling lca.decompose_technosphere() (which does this for us). solver._check_A does, however, re-do the checks for csc/csr, which doesn't seem needed when calling spsolve as we already convert to csr anyway.

I think if Brightway would set it's default format to csr instead of csc, we can speed up these calculations by default. The current BW25 implementation (which blocks the factorization in lca.decompose_technosphere()) seems to be incurring a speed penalty for no reason unless user thinks to convert lca.technosphere_matrix to csr themselves. This confirms what Adrian is saying in this comment. I understand the original csc format for UMFPACK compatibility, but most users are on x86 and thus use pypardiso, so perhaps this default is not the best.

@haasad You should be able to do the following for maximum speed:

import pypardiso

solver = pypardiso.ps
solver.factorize(A)
solver.set_phase(33)
x = solver._call_pardiso(A, b)

This adds about ~10% speed for me, while nice if you know what you're doing, perhaps it's safe that this is indeed not done for BW.

Now to look ahead: What makes sense for BW? I would like to see lca.decompose_technosphere() not be blocked when using pypardiso -especially with a wrong warning type- but perhaps I'm misunderstanding something here? Furthermore, does it make sense to set the sparse format based on the solver we're checking the solver anyway, so setting the 'correct' sparse format could help speed things up.

Let me know if you'd like to see a PR for BW25 Chris, I'll try and so it soon-ish then.

haasad commented 5 months ago

I would like to see lca.decompose_technosphere() not be blocked when using pypardiso -especially with a wrong warning type- but perhaps I'm misunderstanding something here?

Furthermore, does it make sense to set the sparse format based on the solver we're checking the solver anyway, so setting the 'correct' sparse format could help speed things up.

I also think that the best way would be if brightway uses csr format as default in combination with pypardiso.

This adds about ~10% speed for me, while nice if you know what you're doing, perhaps it's safe that this is indeed not done for BW.

I don't think the trade-off of 10% speed versus skipping all safety checks is worth it. If you do an LCA calculation, modify some values in the technosphere matrix and then do another LCA calculation, you might get completely wrong results without the checks in pypardiso.

marc-vdm commented 5 months ago

pypardiso always does a factorization + solve, the warning in bw2calc is correct

Well, partially IMO. By giving this warning and putting the factorization in an else:, BW users with pypardiso (which is still most of them) will keep using a csc format technosphere, as previously lca.decompose_technosphere() would convert this to csr. In BW25, a user would now need to manually convert their technosphere, or face a ~60% speed loss as the technosphere needs to be converted to csr for every new FU.

We can either automatically keep technosphere (and for consistency also biosphere?) in the correct format for the solver (csc for scipy, csr for pypardiso) -which is the best solution IMO-, or, we allow lca.decompose_technosphere() to call factorized(), which would convert the technosphere to the correct csr format.

I'll close this discussion now, I think we can continue in https://github.com/brightway-lca/brightway2-calc/issues/98 as this is a BW issue, not a pypardiso issue.