heckflosse closed this issue 8 years ago
Hello Ingo, I haven't been involved with hdrmerge for quite some time, mainly because I need to dedicate my time to other things. But you are more than welcome to send any pull request, and I will review it ASAP.
Thanks for your interest.
Javi
2015-09-02 1:57 GMT+02:00 heckflosse notifications@github.com:
Hi Javier, I use hdrmerge sometimes, and though it's quite fast I think it could be even faster with some optimizations to the current code. Am I welcome to suggest some optimizations over the next few weeks (you know, I optimize all and everything in RawTherapee, I guess)? I would post my suggestions as pull requests. Would that be OK with you?
Ingo
Hi Javier,
I suggest a first optimization without making a pull request.
https://github.com/jcelaya/hdrmerge/blob/master/BoxBlur.cpp#L74
Replace schedule(dynamic) with schedule(dynamic,16), or remove the schedule clause to get the (default) static scheduling for this loop.
Explanation: schedule(dynamic) is the same as schedule(dynamic,1). This is almost the worst case for parallelizing an outer loop that runs over the width of the image (with the inner loop running over the height): you get a lot of cache conflicts. Using a different scheduling (as mentioned above) avoids these conflicts. I tested with a merge of two Nikon D700 files. Total BoxBlur processing time went down from about 300 ms to about 150 ms on my Linux box using 4 cores.
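To illustrate the scheduling change, here is a minimal sketch of a column-wise loop with the chunked schedule. It is not the code from BoxBlur.cpp: the function name blurColumns, the row-major float layout, and the naive averaging are assumptions made only for this example. With a chunk size of 16, each thread works on 16 adjacent columns, which for 4-byte floats would match a typical 64-byte cache line, so neighbouring threads are less likely to write into the same line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not hdrmerge's implementation): vertical box average
// over a row-major image, parallelized over columns like the loop discussed above.
std::vector<float> blurColumns(const std::vector<float>& src,
                               int width, int height, int radius) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> dst(src.size());

    // schedule(dynamic, 16): hand out 16 adjacent columns at a time
    // instead of single columns, which is what schedule(dynamic) does.
    #pragma omp parallel for schedule(dynamic, 16)
    for (int x = 0; x < width; ++x) {          // outer loop: image width
        for (int y = 0; y < height; ++y) {     // inner loop: image height
            float sum = 0.0f;
            int count = 0;
            for (int k = y - radius; k <= y + radius; ++k) {
                if (k >= 0 && k < height) {
                    sum += src[static_cast<std::size_t>(k) * width + x];
                    ++count;
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum / count;
        }
    }
    return dst;
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs serially, so the result is the same either way; only the timing changes.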
Ingo
In addition to the speedups in https://github.com/jcelaya/hdrmerge/pull/68, further speedups should be possible:
1.) IIRC the Gaussian blur used in RT is faster than the emulation (three-step box blur) in hdrmerge, by about a factor of 2. I could try to use the RT code instead.
2.) The generation of the preview (even at half size) could be outperformed by RT's fast demosaic algorithm, also by about a factor of 2 for half size and by a lot more for full size.
But I didn't try this yet (only thoughts to be verified).
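For context on point 1: repeating a box blur several times approaches a Gaussian blur, which is the idea behind a three-step box blur emulation. The sketch below shows that idea in one dimension only. The names boxBlur1D and approxGaussian1D and the clamped edge handling are assumptions for illustration; this is neither hdrmerge's nor RawTherapee's implementation.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: one box-blur pass with edge clamping.
std::vector<float> boxBlur1D(const std::vector<float>& in, int radius) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int k = i - radius; k <= i + radius; ++k) {
            // Clamp the index so samples beyond the border repeat the edge value.
            const int idx = std::min(std::max(k, 0), n - 1);
            sum += in[idx];
        }
        out[i] = sum / (2 * radius + 1);
    }
    return out;
}

// Three successive box passes give a bell-shaped response
// that approximates a Gaussian kernel.
std::vector<float> approxGaussian1D(std::vector<float> signal, int radius) {
    for (int pass = 0; pass < 3; ++pass) {
        signal = boxBlur1D(signal, radius);
    }
    return signal;
}
```

Feeding a single impulse through approxGaussian1D yields a symmetric, bell-shaped response centred on the impulse. A production version would use running sums rather than re-adding every window.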
There's also an OpenMP speedup possible for Image::computeResponseFunction, though only a small one...
Done with 6b79263