Closed marc-vdm closed 5 months ago
I am 99% sure that the difference your are seeing is due to https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L44. If the matrix wasn't factorized the difference would be at least an order of magnitude slower. What this line does is make sure the A
matrix, which is factorized, is the same now as when it was factorized in the first place. factorize
doesn't make this check and is therefore somewhat faster.
The ideal option, in my opinion, would be to be able to call spsolve
without the A
matrix check, probably through an additional input argument. This would still preserve compatibility with scipy input arguments, and avoid paying the memory cost of keeping another copy of the A
matrix around:
https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L71-L77
You should be able to do the following for maximum speed:
import pypardiso
solver = pypardiso.ps
solver.factorize(A)
solver.set_phase(33)
x = solver._call_pardiso(A, b)
This skips all the checks etc.
Thanks for your replies both!
I'll experiment with Adrians solution by rewriting def solve_linear_system()
and def decompose_technosphere()
and see how that (doesn't?) change results
I'll report back with some findings
Please be aware that factorized
is in a way just a convenience function, so pypardiso can be used as drop-in replacement for scipy.sparse.linalg.factorized
. It still uses pypardiso.spsolve
under the hood and pypardiso.spsolve
always does a factorization for re-use by subsequent calls to spsolve.
I think I found the reason for the the performance difference that you see. I assume the brightway technosphere matrix is in csc format?
factorized
converts A to csr format and keeps a copy of it:
https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L77
spsolve
converts to csr on every call:
https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L41-L42
Unfortunately my commit message from 2016 is pretty useless: https://github.com/haasad/PyPardiso/commit/7a60c843a1caa2f67a61629ffeeb9b6de47dedb3
pardiso can deal with both csc
and csr
formats:
https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/pardiso_wrapper.py#L219-L227
Looks like I actually wrote down the reasoning for this in #7, even with a jupyter notebook and everything :blush:
Alright, here's some findings:
@cmutel I am 99% sure that the difference your are seeing is due to
I'm not entirely sure this is correct. I think it may be the conversion to csr
that is costing so much time.
https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L41-L42
When I convert my BW matrices to csr
beforehand, I get the same speed as calling lca.decompose_technosphere()
(which does this for us). solver._check_A
does, however, re-do the checks for csc
/csr
, which doesn't seem needed when calling spsolve
as we already convert to csr
anyway.
I think if Brightway would set it's default format to csr
instead of csc
, we can speed up these calculations by default. The current BW25 implementation (which blocks the factorization in lca.decompose_technosphere()
) seems to be incurring a speed penalty for no reason unless user thinks to convert lca.technosphere_matrix
to csr
themselves. This confirms what Adrian is saying in this comment. I understand the original csc
format for UMFPACK
compatibility, but most users are on x86 and thus use pypardiso
, so perhaps this default is not the best.
@haasad You should be able to do the following for maximum speed:
import pypardiso solver = pypardiso.ps solver.factorize(A) solver.set_phase(33) x = solver._call_pardiso(A, b)
This adds about ~10% speed for me, while nice if you know what you're doing, perhaps it's safe that this is indeed not done for BW.
Now to look ahead:
What makes sense for BW?
I would like to see lca.decompose_technosphere()
not be blocked when using pypardiso
-especially with a wrong warning type- but perhaps I'm misunderstanding something here?
Furthermore, does it make sense to set the sparse format based on the solver we're checking the solver anyway, so setting the 'correct' sparse format could help speed things up.
Let me know if you'd like to see a PR for BW25 Chris, I'll try and so it soon-ish then.
I would like to see lca.decompose_technosphere() not be blocked when using pypardiso -especially with a wrong warning type- but perhaps I'm misunderstanding something here?
A = A.tocsr()
Furthermore, does it make sense to set the sparse format based on the solver we're checking the solver anyway, so setting the 'correct' sparse format could help speed things up.
I also think that the best way would be if brightway uses csr format as default in combination with pypardiso.
This adds about ~10% speed for me, while nice if you know what you're doing, perhaps it's safe that this is indeed not done for BW.
I don't think the trade-off of 10% speed versus skipping all safety checks is worth it. If you do an LCA calculation, modify some values in the technosphere matrix and then do another LCA calculation, you might get completely wrong results without the checks in pypardiso.
pypardiso always does a factorization + solve, the warning in bw2calc is correct
Well, partially IMO. By giving this warning and putting the factorization in an else:
, BW users with pypardiso (which is still most of them) will keep using a csc
format technosphere, as previously lca.decompose_technosphere()
would convert this to csr
. In BW25, a user would now need to manually convert their technosphere, or face a ~60% speed loss as the technosphere needs to be converted to csr
for every new FU.
We can either automatically keep technosphere (and for consistency also biosphere?) in the correct format for the solver (csc
for scipy, csr
for pypardiso) -which is the best solution IMO-, or, we allow lca.decompose_technosphere()
to call factorized()
, which would convert the technosphere to the correct csr
format.
I'll close this discussion now, I think we can continue in https://github.com/brightway-lca/brightway2-calc/issues/98 as this is a BW issue, not a pypardiso issue.
I'm writing some custom BW code (can share over email, but is still to remain private for now) where I observe a ~60% speed increase (
2600200 solve/sec vs1600123 solve/sec) when I factorize beforehand.Are you certain the below is correct? https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/scipy_aliases.py#L73-L74
You may be correct that the computational/memory cost is not worth it if you only perform a few -whatever number few is-
Ax=b
solves, on the sameA
, but when I do many solves (>20k), this does seem to matter quite a lot.edit: Also note that
def factorize()
itself has conflicting information with the above. https://github.com/haasad/PyPardiso/blob/0f7afd061e68f8ef3e7f77c4aeb3f3b5070bb006/pypardiso/pardiso_wrapper.py#L145-L152