haasad / PyPardiso

Python interface to the Intel MKL Pardiso library to solve large sparse linear systems of equations
BSD 3-Clause "New" or "Revised" License

No option for multithread? #11

Closed adtzlr closed 3 years ago

adtzlr commented 5 years ago

Hello,

is it possible to set the maximum number of threads through an argument to pypardiso.spsolve? I couldn't find any documentation. I already tried editing the file scipy_aliases.py to

...

# pypardiso_solver is used for the 'spsolve' and 'factorized' functions. Python crashes on Windows if multiple
# instances of PyPardisoSolver make calls to the Pardiso library
pypardiso_solver = PyPardisoSolver()

# set max. number of threads
pypardiso_solver.set_num_threads(pypardiso_solver.get_max_threads())

...

but in the Windows Task Manager I only see 25% CPU load, caused by a single process. Any suggestions? Thank you!

haasad commented 3 years ago

Just tested it on a Windows machine and got full CPU utilization using all cores. This has always been the default behaviour. If this is still relevant to you almost two years later, please let me know and we can have a look at your specific setup.

adtzlr commented 3 years ago

Hi, thanks for your reply. I have been using pypardiso quite often recently in my own FE code (see my repo felupe if you are interested). Pypardiso is considerably faster than the SciPy sparse linalg solver, so I did not really care whether it used all cores or not. Speed, easy installation and an easy interface are the reasons why I like your project.

haasad commented 3 years ago

Hi Andreas,

glad to hear that pypardiso is useful to you. Regarding the easy installation: pypardiso is now also available on PyPI. That could be useful, as I saw that the recommended way to install felupe uses pip.

And as stated above, I'd be happy to investigate why pypardiso doesn't use all cores in your case. In theory it should do so automatically.

Cheers, Adrian

adtzlr commented 3 years ago

Regarding the easy installation: pypardiso is now also available on PyPi.

Nice! Will try the PyPI version soon.

And as stated above, I'd be happy to investigate why pypardiso doesn't use all cores in your case. In theory it should do so automatically.

Thanks, I'll do a few tests in the next few days and check the CPU load again during the linear solve.

adtzlr commented 3 years ago

Okay, I did a test: I installed pypardiso from PyPI (v0.3.2), modified the example from here (Line 11, changed n=9 to n=51) and checked the CPU load via Windows Task Manager during the linear solve (warning: most of the time is spent on finite element assembly, not on the linear solve). I get a total of about 50% CPU load during the linear solve on my AMD Ryzen 5 2400G, so I think everything is fine. My CPU has 4 cores / 8 threads, so pypardiso is probably using 4 threads by default?

haasad commented 3 years ago

In [1]: import mkl

In [2]: mkl.get_max_threads()
Out[2]: 4

Yes, iirc mkl uses the number of physical cores to determine the number of threads.
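
The 50% reading reported above fits this: 4 busy threads on 8 logical CPUs is half the machine. A minimal sketch of that arithmetic follows. The helper name `expected_cpu_load` is made up for illustration, and the `mkl` import assumes the mkl-service package is installed (it is skipped otherwise).

```python
import os


def expected_cpu_load(threads_used, logical_cpus):
    """Rough upper bound on the total CPU percentage shown by a task manager
    when `threads_used` busy threads run on `logical_cpus` logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return 100.0 * min(threads_used, logical_cpus) / logical_cpus


# Ryzen 5 2400G from this thread: 4 cores / 8 threads, MKL defaulting to 4 threads.
print(expected_cpu_load(4, 8))  # 50.0

# On the local machine: compare MKL's default with the logical CPU count.
try:
    import mkl  # assumption: provided by the mkl-service package
    mkl_threads = mkl.get_max_threads()
except ImportError:
    mkl_threads = None

logical = os.cpu_count()  # logical CPUs (hardware threads); may be None
if mkl_threads is not None and logical:
    load = expected_cpu_load(mkl_threads, logical)
    print(f"MKL threads: {mkl_threads}, logical CPUs: {logical}, "
          f"expected load: {load:.0f}%")
```

The same arithmetic would explain a 25% reading as a single busy thread on a machine with 4 logical CPUs.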

adtzlr commented 3 years ago

Got it! You have to call mkl.set_dynamic(0) before setting a user-defined number of threads above the number of physical cores (see here). Otherwise MKL resets the user value to the number of physical cores. I modified the appropriate function in pardiso_wrapper, Line 288:

def set_num_threads(self, num_threads):
    """Set the number of threads the solver should use (only a hint, not guaranteed that
    the solver uses this amount)"""
    mkl.set_dynamic(0)
    mkl.set_num_threads(num_threads)

Haven't tested yet whether it is really faster or not, but I get 100% CPU load now. 💯
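
For anyone who would rather not patch pardiso_wrapper, the same two calls can be made from user code. This is only a sketch: the function name `set_fixed_num_threads` and its `mkl_module` parameter are hypothetical (the parameter exists so the logic can be exercised without MKL), and it assumes the `mkl` module from the mkl-service package.

```python
def set_fixed_num_threads(num_threads, mkl_module=None):
    """Disable MKL's dynamic thread adjustment, then request `num_threads`.

    `mkl_module` is a hypothetical injection point for testing; by default
    the real `mkl` module (mkl-service package) is imported.
    Returns the thread count MKL reports afterwards.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if mkl_module is None:
        import mkl as mkl_module  # assumption: mkl-service is installed
    mkl_module.set_dynamic(0)  # keep MKL from lowering the requested count
    mkl_module.set_num_threads(num_threads)
    return mkl_module.get_max_threads()


# Hypothetical usage before calling pypardiso.spsolve:
#   set_fixed_num_threads(8)  # e.g. all 8 hardware threads of the Ryzen 5 2400G
```

The order matters here: the dynamic setting is switched off first, so the requested value is not reset afterwards.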

adtzlr commented 3 years ago

Unfortunately - at least for me - there is no gain in speed. So it seems the 100% CPU load is 50% extra overhead 😊
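
Since CPU load says little about speed, comparing wall-clock times per thread count is the more direct check. A minimal sketch of such a comparison follows. The helper `best_time` is made up for illustration, and the commented usage assumes that a sparse matrix `A` and a right-hand side `b` already exist and that pypardiso and mkl-service are installed.

```python
import time


def best_time(func, repeats=3):
    """Return the shortest wall-clock time in seconds over `repeats` calls to `func`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical usage (A and b are assumed to exist already):
#
#   import mkl
#   import pypardiso
#
#   mkl.set_dynamic(0)
#   for n in (4, 8):  # physical cores vs. hardware threads
#       mkl.set_num_threads(n)
#       print(n, best_time(lambda: pypardiso.spsolve(A, b)))
```

Taking the minimum over a few repeats reduces noise from other processes. Each spsolve call may still include factorization time, depending on how the solver caches.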

haasad commented 3 years ago

Yes, it looks like pardiso doesn't benefit from hyperthreading at all, so using the number of physical cores seems like a sound choice. I found some interesting discussions on this topic here: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-no-of-threads-vs-no-of-processor/td-p/1072982

And thanks for testing it, even if it didn't turn out to be successful!