haskell-numerics / hmatrix

Linear algebra and numerical computation

(Profiled) Code spends most of its time in `Internal.Devel.check` #317

Open fhaust opened 4 years ago

fhaust commented 4 years ago

I just profiled some hmatrix-heavy code and found that it spends 70% of its time in `Internal.Devel.check`, which sounds strange to me. From what I gathered, that function is used to check the return code of the FPU?

To be fair, this might be a non-issue when the code runs without profiling, because optimizations kick in?

idontgetoutmuch commented 4 years ago

I have no idea - if you could post an example, I could investigate a bit further.

StefanHubner commented 4 years ago

I just found the same. I attached the profile file. Indeed, this could just be an FFI call being attributed to that function by the profiler.

profile.zip

EDIT: I forked the repository and tried it without the error check function, which sped up that particular part of my code about tenfold (from 44 s to 4 s).

profile.nocheck.zip

EDIT 2: Turns out, as suspected, that because of laziness the FFI call was never evaluated once the check was removed.

I suppose this implies that the issue can be closed.

idontgetoutmuch commented 4 years ago

@StefanHubner Thanks for investigating. Just to be clear: does this speed-up only happen when profiling? So the problem seems to be one with profiling, not with hmatrix itself?

StefanHubner commented 4 years ago

@idontgetoutmuch No, the speed-up came from the FFI call never being evaluated. The error code checking is not costly, but the computational expense of the FFI call is attributed to the check function. I hope this makes sense!
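
To illustrate the point about the call never being evaluated, here is a minimal sketch. It does not use hmatrix: `fakeKernel` is a hypothetical stand-in for a foreign routine, and `check` and `noCheck` are simplified assumptions about what a status-code checker and a "check removed" variant might look like, not copies of the library's code. The action only runs where something executes it, so dropping the checker can silently drop the work too.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import Foreign.C.Types (CInt)

-- Hypothetical stand-in for a foreign routine: bumps a counter so we can
-- see whether it ran, and returns a C-style status code (0 = success).
fakeKernel :: IORef Int -> IO CInt
fakeKernel calls = do
  modifyIORef' calls (+ 1)
  return 0

-- Simplified checker (assumed shape, not copied from hmatrix): it runs the
-- action it is handed and fails on a non-zero status code.
check :: String -> IO CInt -> IO ()
check msg action = do
  err <- action -- the wrapped call executes here, inside the checker
  when (err /= 0) $ error (msg ++ ": error code " ++ show err)

-- A "check removed" variant that ignores the action entirely, so the
-- wrapped call is never executed.
noCheck :: String -> IO CInt -> IO ()
noCheck _ _ = return ()

main :: IO ()
main = do
  calls <- newIORef 0
  noCheck "kernel" (fakeKernel calls)
  readIORef calls >>= print -- counter after the ignored call
  check "kernel" (fakeKernel calls)
  readIORef calls >>= print -- counter after the checked call
```

Because the wrapped call executes inside the checker, a profiler that charges time to the enclosing cost centre would report the work under `check`, which matches what the profiles in this thread show.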

dschrempf commented 4 years ago

I am having the same issue. It seems that the FFI call is evaluated when the error value is checked, and so the computation happens in `Internal.Devel.check`. Is it possible to change this behavior to get more informative profiling results?

EDIT: The actual computation is a matrix-vector multiplication and a dot product, à la `(vTranspose <# matrix) <.> v`.

HuwCampbell commented 4 years ago

I think we could very well inline calls to `check` and `(#|)`; their definitions are quite small, and it would probably push the cost centre to the C calls. There's also the option of explicit cost centre annotations.
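
A rough sketch of both options, under stated assumptions: the definitions of `check` and `(#|)` below are simplified guesses at their shape rather than the library's actual source, and `fakeKernel` and `multiply` are hypothetical names standing in for a foreign routine and its wrapper. The `INLINE` pragmas keep `-fprof-auto` from giving the helpers their own cost centres, and the `SCC` pragma names the call site explicitly.

```haskell
import Control.Monad (when)
import Foreign.C.Types (CInt)

-- Simplified checker (assumed shape): run the action, fail on non-zero status.
check :: String -> IO CInt -> IO ()
check msg action = do
  err <- action
  when (err /= 0) $ error (msg ++ ": error code " ++ show err)
{-# INLINE check #-}

-- Operator form of the checker (assumed to be a flipped 'check').
infixl 0 #|
(#|) :: IO CInt -> String -> IO ()
(#|) = flip check
{-# INLINE (#|) #-}

-- Hypothetical stand-in for an expensive foreign routine.
fakeKernel :: Int -> IO CInt
fakeKernel n = sum [1 .. n] `seq` return 0

-- Hypothetical wrapper: the explicit cost centre labels the wrapped call,
-- so a profile can show it under its own name.
multiply :: Int -> IO ()
multiply n = ({-# SCC "fakeKernel_call" #-} (fakeKernel n)) #| "multiply"

main :: IO ()
main = do
  multiply 1000000
  putStrLn "done"
```

The `SCC` pragma only has an effect when compiling with `-prof`; in a normal build it is ignored, so the annotation costs nothing outside profiling runs.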