I am running into an issue where calling linearSolve or any other linear solve function causes more than one core to be used. The program is not compiled with -threaded. I have been able to duplicate this issue on multiple machines.
The following is a minimal program that demonstrates the issue.
module Main where
import Numeric.LinearAlgebra
vLength :: Double
vLength = 4096
m1 :: Matrix Double
m1 = fromList [1..vLength] `outer` fromList [1..vLength]
main :: IO ()
main = print $ m1 <\> fromList [1..vLength]
The cabal file, used with the LTS 10.4 build plan from Stackage:
name: main
version: 0.1.0.0
author: jcmartin
build-type: Simple
cabal-version: >=1.10
executable main
  main-is: Main.hs
  build-depends: base >=4.9 && <5, hmatrix
  ghc-options: -Wall -O2 -rtsopts
  default-language: Haskell2010
The effect is most noticeable with a large matrix, but smaller matrices show the unexpected behavior as well. The following is a run on my local machine with vLength set to 2048. It is important to note that the elapsed time is shorter than the total time, indicating that more than one thread ran simultaneously.
> stack exec -- main +RTS -s -RTS > /dev/null
98,134,536 bytes allocated in the heap
1,061,792 bytes copied during GC
33,599,568 bytes maximum residency (3 sample(s))
1,056,688 bytes maximum slop
68 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 28 colls, 0 par 0.002s 0.011s 0.0004s 0.0068s
Gen 1 3 colls, 0 par 0.000s 0.009s 0.0031s 0.0092s
INIT time 0.000s ( 0.015s elapsed)
MUT time 35.078s ( 31.411s elapsed)
GC time 0.002s ( 0.020s elapsed)
EXIT time 0.000s ( 0.003s elapsed)
Total time 35.080s ( 31.449s elapsed)
%GC time 0.0% (0.1% elapsed)
Alloc rate 2,797,600 bytes per MUT second
Productivity 100.0% of total user, 99.9% of total elapsed
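For reference, 35.080 s of total CPU time against 31.449 s of elapsed wall-clock time works out to roughly 1.1 cores busy on average over this run.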
This behavior is undesirable: when the program runs on a machine without enough free cores, the performance of the overall program suffers severely. The desired behavior is for the number of cores used to be configurable (at runtime or compile time), or at a minimum fixed to one core.
I have been unable to duplicate this behavior with other libraries or code, so I am led to believe that this is an issue specific to hmatrix.
Perhaps your external BLAS/LAPACK libraries automatically use multiple cores. On my machine, with a non-optimized BLAS/LAPACK, your program runs much slower, but the elapsed time is almost equal to the total time.
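If the underlying BLAS is OpenBLAS, its worker thread count can usually be capped from the calling program. The following is a minimal sketch, not a tested fix: it assumes the linked library actually exports openblas_set_num_threads (the reference BLAS has no such symbol, and MKL exposes mkl_set_num_threads instead).

{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CInt)
import Numeric.LinearAlgebra

-- Assumption: the program is linked against OpenBLAS, which exports
-- openblas_set_num_threads; other BLAS builds expose different symbols.
foreign import ccall unsafe "openblas_set_num_threads"
  openblasSetNumThreads :: CInt -> IO ()

vLength :: Double
vLength = 4096

m1 :: Matrix Double
m1 = fromList [1..vLength] `outer` fromList [1..vLength]

main :: IO ()
main = do
  openblasSetNumThreads 1  -- cap the BLAS thread pool at one worker
  print $ m1 <\> fromList [1..vLength]

Setting OPENBLAS_NUM_THREADS=1 (or OMP_NUM_THREADS=1 for an OpenMP-backed build) in the environment before launching the binary usually achieves the same effect without any code changes.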