Scipy sparse performance

SanPen commented 5 years ago

Hi,

I am the developer of GridCal, a program that makes extensive use of the sparse library of scipy.

There is a feature in the program called time-series power flow, which performs a series of calls to a Newton-Raphson based solver to simulate electrical circuits.

Profiling this feature showed that the scipy function get_index_dtype was using 9% of the simulation time. That is well ahead of the time taken by the linear system solver, or by all of the sparse matrix multiplications combined, and there are a lot of those.

This came up while testing different linear algebra libraries (Pardiso, KLU, SuperLU, etc.). The differences were negligible in my program, even though in standalone benchmarks the libraries do show a noticeable performance difference. So my conclusion is that the bottleneck must be somewhere else.

I think there is room for improvement here.

perimosocordiae commented 5 years ago

Can you share a self-contained script that exhibits this behavior?

SanPen commented 5 years ago

Well, the best I can do is to publish the test script:

import os
import time

from GridCal.Engine import *

def test():
    fname = os.path.join('..', '..', 'Grids_and_profiles', 'grids', 'IEEE39.gridcal')
    print('Reading...')
    main_circuit = FileOpen(fname).open()
    options = PowerFlowOptions(SolverType.NR, verbose=False,
                               initialize_with_existing_solution=False,
                               multi_core=False, dispatch_storage=True,
                               control_q=ReactivePowerControlMode.NoControl,
                               control_p=True)

    ############################################################
    # Time Series
    ############################################################
    print('Running TS...', '')
    start = time.time()

    ts = TimeSeries(grid=main_circuit, options=options)
    ts.run()

    end = time.time()
    dt = end - start
    print('  total', dt, 's')

if __name__ == '__main__':
    import cProfile
    cProfile.runctx('test()', None, locals())

You'll need to install GridCal (pip install GridCal), and then you'll need the test file.

SanPen commented 5 years ago

It seems to me that it could be beneficial to have a safety override.

Once I know that the dimensions of my matrices are correct, I can safely disable the dimensionality checks and the like.
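
Short of such an override, one possible workaround is to avoid constructing new matrices inside the loop at all. The sketch below is only an illustration, under the assumption that the sparsity pattern stays fixed between time steps: the matrix is built once and only its data array is overwritten afterwards, so the constructor checks run a single time. The names make_pattern, run_steps and scales are hypothetical and not part of GridCal or scipy.

```python
import numpy as np
import scipy.sparse as sp


def make_pattern(n):
    """Build a tridiagonal CSR matrix once (hypothetical stand-in for a real system matrix)."""
    main = np.full(n, 2.0)
    off = np.full(n - 1, -1.0)
    return sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")


def run_steps(A, scales, x):
    """Reuse A's indices/indptr and overwrite only the stored values at each step.

    Assumes the sparsity pattern does not change between steps.
    Note that A is modified in place and keeps the values of the last step.
    """
    base = A.data.copy()
    results = []
    for s in scales:
        # In-place write: no new sparse matrix is constructed here.
        A.data[:] = base * s
        results.append(A @ x)
    return results


if __name__ == "__main__":
    A = make_pattern(5)
    x = np.arange(5, dtype=float)
    for y in run_steps(A, [1.0, 3.0], x):
        print(y)
```

This only helps where the code controls matrix construction directly; operations such as bmat or tocoo still build new matrices internally.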

SanPen commented 5 years ago

Hi,

I don't know if this is being reviewed, but I'll leave the scipy profiling results here: Scipy_performance_test.csv.txt

These are the top routines called by the process, sorted by "own time":

Name	Call count	Time(ms)	Own time(ms)
check_format	1034808	26615	9758
get_index_dtype	3312174	20398	8845
method 'reduce' of 'numpy.ufunc' objects	2618590	7158	7158
built-in method numpy.array	16754393	6999	6963
_check	519565	18529	6461
__init__	6624370	6256	6256
__init__	1034808	60367	4293
prune	1080000	7514	3425
check_shape	1667363	5202	3346
bmat	67767	23876	2840
tocoo	112963	10123	2489
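
For anyone who wants to produce a comparable table, here is a minimal sketch using only the standard library's cProfile and pstats modules. It is not the tool that generated the numbers above; top_by_own_time, workload and helper are hypothetical names, and the toy workload stands in for the real simulation. Times are reported in seconds rather than milliseconds.

```python
import cProfile
import pstats


def top_by_own_time(func, limit=10):
    """Profile func() and return rows (name, call count, cumulative time, own time),
    sorted by own time in descending order."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    stats = pstats.Stats(profiler)
    rows = []
    # stats.stats maps (file, line, name) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    for (_file, _line, name), (_cc, ncalls, own, cumulative, _callers) in stats.stats.items():
        rows.append((name, ncalls, cumulative, own))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:limit]


def helper(i):
    """Toy function called many times, standing in for a hot routine."""
    return sum(range(i % 50))


def workload():
    """Toy workload, standing in for the time-series simulation."""
    total = 0
    for i in range(1000):
        total += helper(i)
    return total


if __name__ == "__main__":
    print("Name\tCall count\tTime(s)\tOwn time(s)")
    for name, ncalls, cumulative, own in top_by_own_time(workload):
        print(f"{name}\t{ncalls}\t{cumulative:.6f}\t{own:.6f}")
```

Sorting by own time instead of cumulative time is what brings small, frequently called routines to the top of the list.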

scipy / scipy

Scipy sparse performance #10812