Optimizing hdrmerge speed

jcelaya / hdrmerge

HDR exposure merging

http://jcelaya.github.io/hdrmerge/

Optimizing hdrmerge speed #67

Closed heckflosse closed 8 years ago

heckflosse commented 9 years ago

Hi Javier, I use hdrmerge sometimes, and although it is quite fast, I think it could be even faster with some optimizations to the current code. May I suggest some optimizations over the next few weeks? (You know, I optimize anything and everything in RawTherapee, I guess.) I would post my suggestions as pull requests. Would that be OK with you?

Ingo

jcelaya commented 9 years ago

Hello Ingo, I haven't been involved with hdrmerge for quite some time, mainly because I need to dedicate my time to other things. But you are more than welcome to send any pull request, and I will review it ASAP.

Thanks for your interest.

Javi


heckflosse commented 9 years ago

Hi Javier,

I suggest a first optimization without making a pull request.

https://github.com/jcelaya/hdrmerge/blob/master/BoxBlur.cpp#L74

Replace schedule(dynamic) with schedule(dynamic,16), or remove the schedule clause to get (default) static scheduling for this loop.

Explanation: schedule(dynamic) is the same as schedule(dynamic,1). This is almost the worst case for parallelizing an outer loop that runs over the width of the image (with the inner loop running over the height): you get a lot of cache conflicts. Using a different schedule (as mentioned above) avoids these conflicts. I tested with a merge of two Nikon D700 files. Total box blur processing time went down from about 300 ms to about 150 ms on my Linux box using 4 cores.
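For readers who want to see the loop shape being discussed, here is a minimal sketch. It is not the code from BoxBlur.cpp: the function name `boxBlurColumns`, the row-major `std::vector<float>` layout, and the naive window sum are all assumptions made for illustration. The point is only the `schedule(dynamic, 16)` clause on an outer loop over columns. With a chunk size of 1, neighbouring columns can go to different threads, which then write to adjacent floats that likely share a cache line; a chunk of 16 floats is 64 bytes, a common cache line size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not hdrmerge's implementation): vertical box blur of a
// row-major float image. The outer loop runs over columns, the inner loop
// over rows, which is the loop shape described in the comment above.
std::vector<float> boxBlurColumns(const std::vector<float>& src,
                                  int width, int height, int radius) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> dst(src.size());

    // Hand out 16 neighbouring columns per chunk instead of 1, so that one
    // thread owns a contiguous run of floats in every row it writes.
    #pragma omp parallel for schedule(dynamic, 16)
    for (int x = 0; x < width; ++x) {
        for (int y = 0; y < height; ++y) {
            // Window is clipped at the top and bottom image borders.
            const int lo = std::max(0, y - radius);
            const int hi = std::min(height - 1, y + radius);
            float sum = 0.0f;
            for (int k = lo; k <= hi; ++k) {
                sum += src[static_cast<std::size_t>(k) * width + x];
            }
            dst[static_cast<std::size_t>(y) * width + x] =
                sum / static_cast<float>(hi - lo + 1);
        }
    }
    return dst;
}
```

With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without that flag it is ignored and the loop runs serially with the same result. The best chunk size depends on the machine and the image, so 16 should be treated as a starting point to measure against, not a fixed rule.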

Ingo

heckflosse commented 9 years ago

Created https://github.com/jcelaya/hdrmerge/pull/68

heckflosse commented 9 years ago

In addition to the speedups in https://github.com/jcelaya/hdrmerge/pull/68, further speedups should be possible:

1.) IIRC the Gaussian blur used in RT is faster than the emulation (three-step box blur) in hdrmerge, by about a factor of 2. I could try to use the RT code instead.

2.) The generation of the preview (even at half size) could be sped up by using RT's fast demosaic algorithm, also by about a factor of 2 for half size and by a lot more for full size.

But I haven't tried this yet (these are only thoughts to be verified).
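As background on the "three-step box blur" mentioned in item 1, here is a minimal 1D sketch of the general technique: applying a box blur three times in a row gives a bell-shaped kernel that approximates a Gaussian. The names `boxBlur1D` and `approxGaussian1D` are hypothetical, and this is neither hdrmerge's nor RawTherapee's code; real implementations typically use running sums and work on 2D images.

```cpp
#include <cassert>
#include <vector>

// One box-blur pass: each output sample is the mean of the input samples
// within `radius` of it. The window is clipped at both ends of the array.
std::vector<float> boxBlur1D(const std::vector<float>& in, int radius) {
    assert(radius >= 0);
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int i = 0; i < n; ++i) {
        const int lo = (i - radius < 0) ? 0 : i - radius;
        const int hi = (i + radius > n - 1) ? n - 1 : i + radius;
        float sum = 0.0f;
        for (int k = lo; k <= hi; ++k) {
            sum += in[k];
        }
        out[i] = sum / static_cast<float>(hi - lo + 1);
    }
    return out;
}

// Three successive box-blur passes with the same radius. The combined
// kernel is the box kernel convolved with itself twice.
std::vector<float> approxGaussian1D(std::vector<float> signal, int radius) {
    for (int pass = 0; pass < 3; ++pass) {
        signal = boxBlur1D(signal, radius);
    }
    return signal;
}
```

Feeding a unit impulse through `approxGaussian1D` with radius 1 yields the weights 1, 3, 6, 7, 6, 3, 1 (each divided by 27), which already has the rounded shape of a Gaussian. Whether a direct Gaussian implementation beats this in practice is exactly the open question raised above.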

heckflosse commented 9 years ago

There's also an OpenMP speedup possible for Image::computeResponseFunction, though only a small one.

heckflosse commented 9 years ago

Just for information:

Here's an overview of my speedups: http://pastebin.com/ag3EhJvv

Ingo

heckflosse commented 8 years ago

Done with 6b79263