Closed heckflosse closed 2 years ago
Hi, thanks for such ideas... that's very welcome. I did a quick add https://github.com/ilia3101/MLV-App/commit/9c31863850f305988c70af61281bd867b43f3307
But unfortunately it isn't so easy to use your lib as is... maybe I should create a C wrapper for it, because our processing code is mainly in C. But I really like the interface!
What about the result of ca_correct - does it look different than earlier versions? In our project I don't use it very much, because the result looks mostly worse than without using it.
Yes, RCD seems to be very close to AMaZE quality wise. But here RCD is a little slower. But my computer isn't the fastest anyway and probably not made for benchmarking such algorithms ;)
Anyway. Thanks again for the idea!
What about the result of ca_correct - does it look different than earlier versions?
For 99% of the cases two iterations are better than one. There are very rare cases where up to 5 iterations are needed and also very rare cases where two iterations are not better than one.
But here RCD is a little slower
That's strange. Here RCD is about twice as fast than amaze
Edit: btw: I also detected, that the librtprocess rcd code is slower than the RawTherapee rcd code (though I have no idea why this is the case). Maybe try the rcd code from RT...
I can also provide comparisons between RT amaze code and rcd code concerning performance (if needed)
Edit; the rcd code needs to be compiled using -O3 to vectorize...
maybe I should create a C wrapper for it, because our processing code is mainly in C. But I really like the interface!
Feel free to create a PR for librtpocess with a C wrapper :+1:
Feel free to create a PR for librtpocess with a C wrapper 👍
I tried to start that, but I don't get librtprocess compiled on my system. Maybe macOS Mavericks is too old. But it is the currently minimum supported by MLVApp. The compiler don't knows < algorithm > and other includes...
I tried to start that, but I don't get librtprocess compiled on my system. Maybe macOS Mavericks is too old. But it is the currently minimum supported by MLVApp. The compiler don't knows < algorithm > and other includes...
You could also try the rt version
The compiler don't knows < algorithm > and other includes...
Update your requirements ;-)
And, don't give up. Especially for video, demosaic speed is important in my opinion, as there a lot of frames have to be demosaiced. I will help on this :+1:
Yes, that's absolutely right. Speed is so important! Feel so stupid to ask... but what is required to make the compiler happy in terms of those C++ libs? Does it need C++20? Then I am lost on this old platform 😄
Found it... "sleefsseavx.c" must have a cpp ending... then the compiler is happy. 😄 ... progress...
Oh yeah baby I am feeling this one....
Sent from my iPhone Xr
On Mar 29, 2021, at 11:51 AM, masc4ii @.***> wrote:
Found it... "sleefsseavx.c" must have a cpp ending... then the compiler is happy. 😄 ... progress...
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Hmm, I wonder why you need sleefsseavx.c for rcd demosaic ...
btw: the darktable implementation is plain C
https://github.com/darktable-org/darktable/blob/master/src/iop/rcd_demosaic.c
I yesterday did my own changings on top of your librtprocess rcd version. This works nicely in MLVApp now.
Now - next thing I would like to try: using librtprocess instead of all our own code. For compiling the entire MLVApp in one step automatically, I would like to integrate librtprocess as is plus wrapper into MLVApp. And the librtprocess code doesn't compile on macOS with llvm-clang compiler, until you rename your included sleefsseavx.c to .cpp. With that strategy I succeeded last night to use librtprocess lmmse, amaze, rcd and igv. Amaze looks a little strange: there are many vertical (mostly pink) artifacts in the picture - but not in all clips I tested. No such problem with our Amaze version. The other debayers work as expected so far.
I pushed a change to librtprocess which should fix your compilation issue
https://github.com/CarVac/librtprocess/commit/838f25c4a16afd0ead69e496615c960f7c115c22
Amaze looks a little strange: there are many vertical (mostly pink) artifacts in the picture - but not in all clips I tested
can you show a screenshot?
About the speed difference between Amaze and RCD
I measured a 45.6 MP file on my old AMD FX8350
Amaze: ~550 ms RCD: ~260 ms
That was using gcc 10.2 on Windows/msys2
I know that using clang RCD is slower. Maybe we can find the reason with your help ;-)
I can do a screenshot later.
On Win10, mingw64, i7-4770, 1856x1044 I get in MLVApp (most features disabled):
On macOS 10.14, llvm clang, MBPR 2013 13", i5, same clip, same settings:
On macOS 10.15, llvm clang, MBPR 2013 15", i7, same clip, same settings:
Suer, I don't just measure the time for debayering. But the entire processing is identical. I just switched the debayer.
Hmm, strange... was rcd compiled using -O3 in your Win10/mingw test?
Edit: for good speed of rcd demosaic (at least when using gcc) you need to use -O3 or -ftree-vectorize
As an alternative you can also use (that's what darktable does)
#ifdef __GNUC__
#pragma GCC push_options
#pragma GCC optimize ("fast-math", "fp-contract=fast")
#endif
at the beginning of the rcd source and
// revert rcd specific aggressive optimizing
#ifdef __GNUC__
#pragma GCC pop_options
#endif
at the end
Here comes the screenshot of the faulty AMaZE (zoom in on and around the white boat, there are vertical artifacts, which are not there normally):
And this is the difference to our AMaZE algorithm (I added some contrast to make it more visible):
Btw: maybe some parameters are wrong and cause this. What should I do with initGain, inputScale and outputScale? I set this all to 1.0 for testing.
I tried all what you wrote (-O3/-ftree-vectorize/the pragmas) on your/my changed RCD code only for now. It makes no difference at all. Maybe I did something wrong when adapting it to our C and MLVApp concept and it is broken for being accelerated. CPU is at 35% only when using RCD. For AMaZE it is up to 85%. Both % values for the Windows version. On macOS it is always at 100%. --> Maybe I can test the entire librtprocess in the next days. We'll see if it works there.
CPU is at 35% only when using RCD
That sounds really wrong. What is the chunkSize value you used?
I set the chunkSize to the number of threads I can drive. But also here: if I set it to 1 or 2 or 8 doesn't change anything.
I set the chunkSize to the number of threads I can drive.
Usually the best value for chunkSize is 2. For really large raw files a value up to 6 can be better. The chunkSize is just a cache thing. I will explain it later.
Anyway, as you tried out several values, it must be something else...
which gcc version are you using on Windows?
which gcc version are you using on Windows?
I think it is Mingw64 7.3.0.
ping @jenshannoschwalm
Maybe he has an idea. We collaborated to improve rdc speed...
I think it is Mingw64 7.3.0.
what does gcc --version
spit out?
It is: gcc (x86_64-posix-seh-rev0, Built by MinGW-W64-project) 7.3.0
Oh nice... new benchmark: On macOS 10.14, llvm clang, MBPR 2013 13", i5, same clip, same settings as above:
Old MLVApp from yesterday, our old AMaZE and the adapted RCD:
MLVApp with librtprocess and C wrapper, so both librtprocess' AMaZE and RCD:
It is: gcc (x86_64-posix-seh-rev0, Built by MinGW-W64-project) 7.3.0
Why such an old gcc version? msys2 is on 10.2 already
It is the standard which comes with Qt. We compile the entire App with the help of Qt.
For Windows builds we (RawTherapee) always rely on what pacman -Syuu
provides us. Currently it's gcc 10.2
mmm,
for(int row = 4; row < height - 4; row++)
{
#if defined(__clang__)
#pragma clang loop vectorize(assume_safety)
#elif defined(__GNUC__)
#pragma GCC ivdep
#endif
for(int col = 4; col < width - 4; col++)
{
const int i = row * width + col;
.......
The AMaZE problem is influenced by the tilesize: If I set ts=992, all artifacts are gone (and it is slow). ts=96 produces even more artifacts 😄
In RawTherapee we use ts = 160 for years without problems
Correct. But I just can tell you what I see. Our old AMaZE algorithm work just fine; same for other algorithms from your library. Who knows why...
In AHD I also get some very small artifacts which I don't get with our old AHD. It mostly looks identical, but look the blue and yellow dots on the water (in our old AHD implementation this is just smooth, same for AMaZE, RCD and others):
On more question: what must I do to use the auto feature in CA_correct? And what are those fitParams? All I can get to work (at least I see any difference) is the cared and cablue setting - but it is very hard to make CAs invisible with that. A cyan edge will always look a little cyan.
Uncorrected:
Corrected:
A different CA postprocessing algorithm:
The file renaming fix you did: it works fine now. Many thanks!
what must I do to use the auto feature in CA_correct?
set autoCA
to true and autoIterations
to a value > 0, preferrably 2
And what are those fitParams?
You don't need to set them. They will be passed back from the auto ca correction in case you want to use the same values for several frames (for example all frames of a pixel shift raw). Just set fitParamsIn
to false for non pixelshift files.
set
autoCA
to true andautoIterations
to a value > 0, preferrably 2
Okay... that also is what I already thought and tried. Unfortunately all edges get extremely green.
Measure gives this output: CA correcting 1488x1900 image with 2 tiles per thread numblox = 0 CA correction took 136 ms
In the end, this should be the call for testing (anamorphic picture, unsquezzed in post):
CA_correct( 0, 0, 1488, 1900, true, 2, 0.0, 0.0, false, rawData, rawData, cfa, sPC, fitParams, false, 65535.0f, 65535.0f, 2, true );
cfa = {{0,1},{1,2}}
Is it a problem, that in=out? I just saw the comment: "for CA_correct rawDataIn and rawDataOut may point to the same buffer."
back to the demosaicers for a moment. May be it's not the point but the artefacts you observe are related to the tiling borders.
While we did the rcd pull request it became obvious that other code had such problems that i missed first. The problem was fast-math working fine with rcd but not with amaze.
I dont think that the tilesize shuld matter at all. It's just about performance and cache fitting. BUT fast-math makes assumptions which are at least not met for amaze.
If someone else like to play with the current state... here a commit for testing: https://github.com/ilia3101/MLV-App/commit/2604ccae55205ca3d816249edc4154eab538748f
Description of what works so far can be found in the commit message.
Hi, I stumbled upon MLV-App by a PR @SimonSegerblomRex made for RawTherapee.
Now I wonder, why you don't use RCD demosaic. It's on par with Amaze quality wise while being much faster.
It's available in RawTherapee, darktable (dev), siril, Filmulator and ART.
The easiest way to use it would be to use https://github.com/CarVac/librtprocess which also includes the ca_correct code which allows more than one iteration.
Just an idea ;-)
Ingo