scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License

Scipy sparse performance #10812

Closed SanPen closed 1 year ago

SanPen commented 5 years ago

Hi,

I am the developer of GridCal, a program that makes extensive use of the sparse library of scipy.

There is a feature in the program called time-series power flow, which performs a series of calls to a Newton-Raphson based solver to simulate electrical circuits.

Profiling this feature showed that the scipy function get_index_dtype was using 9% of the simulation run time. That is well ahead of the time taken by the linear system solver, or by all of the sparse matrix multiplications combined, of which there are many.

This came up while testing different linear algebra libraries (Pardiso, KLU, SuperLU, etc.). The differences were negligible in my program, even though the libraries do show a noticeable performance difference in standalone benchmarks. So my conclusion is that the bottleneck must be somewhere else.

I think there is room for improvement here.
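
One way to see the kind of overhead described here is to time the sparse constructor against an actual numerical operation on a small matrix. The following is only a rough sketch: the 39x39 tridiagonal matrix is an arbitrary stand-in (the size is chosen to echo the IEEE 39-bus case, not taken from GridCal), and the numbers will vary by machine and scipy version.

```python
import timeit

import numpy as np
import scipy.sparse as sp

# Arbitrary small test matrix (assumption: 39x39 tridiagonal, not GridCal data).
n = 39
A = sp.diags(
    [np.ones(n - 1), 2.0 * np.ones(n), np.ones(n - 1)],
    [-1, 0, 1],
    format="csr",
)
x = np.ones(n)


def rebuild():
    # Goes through the csr_matrix constructor, including its validation steps.
    return sp.csr_matrix((A.data, A.indices, A.indptr), shape=A.shape)


def matvec():
    # The actual numerical work: one sparse matrix-vector product.
    return A @ x


reps = 2000
t_build = timeit.timeit(rebuild, number=reps) / reps
t_matvec = timeit.timeit(matvec, number=reps) / reps
print(f"constructor: {t_build * 1e6:.2f} us per call")
print(f"matvec:      {t_matvec * 1e6:.2f} us per call")
```

For matrices this small, the per-call bookkeeping can be a visible fraction of the total, which is consistent with the profile reported in this issue.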

perimosocordiae commented 5 years ago

Can you share a self-contained script that exhibits this behavior?

SanPen commented 5 years ago

Well, the best I can do is to publish the test script:

import os
import time

from GridCal.Engine import *

def test():
    fname = os.path.join('..', '..', 'Grids_and_profiles', 'grids', 'IEEE39.gridcal')
    print('Reading...')
    main_circuit = FileOpen(fname).open()
    options = PowerFlowOptions(SolverType.NR, verbose=False,
                               initialize_with_existing_solution=False,
                               multi_core=False, dispatch_storage=True,
                               control_q=ReactivePowerControlMode.NoControl,
                               control_p=True)

    ############################################################
    # Time Series
    ############################################################
    print('Running TS...', '')
    start = time.time()

    ts = TimeSeries(grid=main_circuit, options=options)
    ts.run()

    end = time.time()
    dt = end - start
    print('  total', dt, 's')

if __name__ == '__main__':
    import cProfile
    cProfile.runctx('test()', None, locals())

You'll need to install GridCal (pip install GridCal), and then you'll need the test file.
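
To get a listing sorted by own time directly from cProfile, the standard-library pstats module can be used. This is a minimal sketch: work() is a hypothetical placeholder for the test() function above, so that the snippet runs without GridCal installed.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in for test(); any callable can be profiled this way.
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by "tottime" (time spent in the function itself, excluding callees)
# and keep only the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(10)
report = buffer.getvalue()
print(report)
```

The "tottime" column corresponds to what profiler GUIs usually call "own time", and "cumtime" includes time spent in callees.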

SanPen commented 5 years ago

It seems to me that it could be beneficial to have a safety override.

Once I know that the dimensions of my matrices are correct, I can safely disable the dimensionality checks and the like.
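
Short of an override in scipy itself, one possible workaround using only public attributes is to build a matrix once and then overwrite its stored values in place, so the constructor and its checks are not invoked again. This is a sketch under a strong assumption: the sparsity pattern stays fixed between iterations and the new values arrive in the same order as the stored ones. Whether GridCal's matrices satisfy this is not established here, and update_values is a hypothetical helper, not a scipy function.

```python
import numpy as np
import scipy.sparse as sp

# Fixed sparsity pattern, built (and validated) once.
n = 5
pattern = sp.diags(
    [np.ones(n - 1), np.ones(n), np.ones(n - 1)],
    [-1, 0, 1],
    format="csr",
)
J = pattern.copy()


def update_values(matrix, new_data):
    # Hypothetical helper: overwrite the stored non-zeros in place.
    # indices and indptr are left untouched, so no constructor is called.
    # new_data must have the same length and ordering as matrix.data.
    matrix.data[:] = new_data
    return matrix


# Stand-in for solver iterations: the values change, the structure does not.
for k in range(1, 4):
    update_values(J, k * pattern.data)

print(J.toarray())
```

The trade-off is that nothing verifies the new values any more, which is exactly the "I know my matrices are correct" situation described above.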

SanPen commented 5 years ago

Hi,

I don't know if this is being reviewed, but I'll leave the performance test results from scipy here: Scipy_performance_test.csv.txt

These are the top routines called by the process, sorted by "own time":

| Name | Call count | Time (ms) | Own time (ms) |
|---|---|---|---|
| `check_format` | 1034808 | 26615 | 9758 |
| `get_index_dtype` | 3312174 | 20398 | 8845 |
| `method 'reduce' of 'numpy.ufunc' objects` | 2618590 | 7158 | 7158 |
| `built-in method numpy.array` | 16754393 | 6999 | 6963 |
| `_check` | 519565 | 18529 | 6461 |
| `__init__` | 6624370 | 6256 | 6256 |
| `__init__` | 1034808 | 60367 | 4293 |
| `prune` | 1080000 | 7514 | 3425 |
| `check_shape` | 1667363 | 5202 | 3346 |
| `bmat` | 67767 | 23876 | 2840 |
| `tocoo` | 112963 | 10123 | 2489 |
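
Dividing own time by call count for a few rows of the table above gives a rough per-call cost. This is a small sketch that only does arithmetic on the figures as posted; it makes no claim about where inside each function the time goes.

```python
# (name, call count, own time in ms), copied from the table above.
rows = [
    ("check_format", 1034808, 9758),
    ("get_index_dtype", 3312174, 8845),
    ("_check", 519565, 6461),
    ("prune", 1080000, 3425),
    ("check_shape", 1667363, 3346),
]

# Own time per call in microseconds (1 ms = 1000 us).
per_call_us = {name: own_ms * 1000.0 / calls for name, calls, own_ms in rows}

for name, cost in sorted(per_call_us.items(), key=lambda item: -item[1]):
    print(f"{name:16s} {cost:6.2f} us per call")
```

Each call costs only microseconds, so the total comes from the call counts, which run into the millions.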