mupen64plus / mupen64plus-video-rice

Video plugin for the Mupen64Plus v2.0 project, using OpenGL. This plugin is based on the RiceVideoLinux plugin from earlier versions of Mupen64Plus.
31 stars 40 forks source link

Reorganize pixel format logic in OGLTexture #29

Open Narann opened 9 years ago

Narann commented 9 years ago

OGLTexture need some love to fix some consistency problems in the way pixel formats are choosen.

The current OGLTexture rely on an option quality to choose to store texture pixels as 16bits (GL_RGBA4) vs 32bits (GL_RGBA).

I wonder if it's still relevant from a performance perspective to have such option but it maybe make sense on embedded systems with (very) limited memory and bandwidth.

Also, setting 16bits color quality in Rice option also try to set color buffer size to 16 which always fail on OGL (and fall back to 24).

In the future, We could rely on runtime definition depending on available extensions. It seems to be a good choice.

I just found the GLES extension to check fro BGRA support: OpenGL ES Extension #51 GL_EXT_texture_format_BGRA8888

Note this extensions is supported by 98% of hardware according to OpenGL ES Hardware Database.

After some investigation, I will leave the color quality option 16 vs 32 and choose the pixel format constant at runtime also depending on the chosen quality.

Narann commented 9 years ago

Should be fixed by https://github.com/Narann/mupen64plus-video-rice/commit/e212e714f5ff3a34d9dfb96e26978e050ab6ac62 and https://github.com/Narann/mupen64plus-video-rice/commit/aeb36c411178dca64bc43dd71ca6343ed6ece86c in my repo. Waiting for feedbacks.

littleguy77 commented 9 years ago

I am a bit wary about relying on extensions in critical code, when standard gles flags can be used. I have not looked at your refactoring branch in awhile, but if it doesn't add too much bloat to keep the RGBA format in the gles codepath, then I may still prefer that. As it stands now, mupen64plus-ae is using a completely vanilla rice implementation from upstream, and it is working fine.

I was mostly curious about going the other direction. How intrusive it would be to change the GL codepath to use RGBA instead of BGRA? Would there be any noticeable performance impact?

Narann commented 9 years ago

I am a bit wary about relying on extensions in critical code, when standard gles flags can be used.

No they can't without change the fragment shader (I will explain later) :(

I'm also surprised OGLES 2 specs doesn't offer BGRA... Maybe it's not relevant anymore...

And: I'm also in favor of "standard specs" instead extensions of to keep the code clean and portable (and are more likely to run on crappy drivers). I spend soooo much time to remove deprecated extension with branch everywhere I would try to avoid them.

As it stands now, mupen64plus-ae is using a completely vanilla rice implementation from upstream, and it's working fine.

I guess by vanilla you are not talking about my refactoring right? If so, look at this.

The reason why Rice vanilla can send textures in GL_RGBA in OGLES it's because it revert it in the fragment shader. So it just pushed the problem elsewhere (and from my perspective, it's worst). The code state I send it to RGBA (which is not true) and I revert it in the shader.

Rice vanilla (aka not refactored) doesn't use the same GLSL code generator for OGLES and desktop OpenGL, something fixed in my refactoring: Both use the same GLSL code generator.

I was mostly curious about going the other direction. How intrusive it would be to change the GL codepath to use RGBA instead of BGRA? Would there be any noticeable performance impact?

I'd already investigate this. It should not be too difficult but it's (old) low level code so I'm not sure of what could potentially break. N64 was unable to deal with BGRA internally so I guess Original author manually put textures in BGRA because it had important performance impact. So, technically put texture conversion code back to RGBA would be intrusive but would also remove some code and make the emulation smoother. But Rice is an emulator and there is a lot of situation where the texture is uploaded for each frame! This would bring to big performance issues but I have no mean to bench this...

I will ask on both OpenGL and Dolphin forum how they deal with this, what they choose and why.

If we have no clean answer, I would choose the solution that "decrease branching and code size over performance", aka: RGBA everywhere! :)

Narann commented 9 years ago

I had a very interesting answer on the OpenGL forum.

Basically, RGBA seems to be "the way to go" with ARM as ARM chips are big endian while x86 archs are little endian. So, if we push for RGBA textures the impacted arch would become x86 but I would say x86 arch is, in general, faster than ARM ones, compensating the trade of.

What do you think?

littleguy77 commented 9 years ago

I am a bit wary about relying on extensions in critical code, when standard gles flags can be used.

No they can't without change the fragment shader (I will explain later) :(

I should point out that I am not at all an expert on the video plugin implementation. I know just enough OpenGL (from the 1.x days) to make me dangerous. So I trust your judgement.

I guess by vanilla you are not talking about my refactoring right? If so, look at this.

Correct, and thanks for the concrete example.

I will ask on both OpenGL and Dolphin forum how they deal with this, what they choose and why.

Great idea.

Basically, RGBA seems to be "the way to go" with ARM as ARM chips are big endian while x86 archs are little endian. So, if we push for RGBA textures the impacted arch would become x86 but I would say x86 arch is, in general, faster than ARM ones, compensating the trade of.

That was my hunch, that x86 (mostly desktop platforms) could absorb the performance penalty, to give us leaner code. But I don't have the expertise to say for certain, that's why I posed the question to you and other devs.

Narann commented 9 years ago

That was my hunch, that x86 (mostly desktop platforms) could absorb the performance penalty, to give us leaner code. But I don't have the expertise to say for certain, that's why I posed the question to you and other devs.

For emulators, I tent to prefer code focusing on why "things are emulated like this" over performance than code doing both at the same time and are basically unreadable anymore.

So, the plan is:

Narann commented 9 years ago

I think I get it:

In the code, each retrieved color is converted to a uint32 A8R8G8B8 (So no BGRA here). But as we are on x86 (little-endian), datas are send from right to left: In B8R8G8A8. This explain why Desktop OpenGL setted in GL_BRGA had always had good colors.

As OGLES doesn't have GL_BGRA by default, I first setted it to what OGLES spec provide: GL_RGBA. So technically, running the OGLES code path on little-endian (x86) arch would revert red and blue and give bad color.

Gillou68310 confirm this as he was seeing reverted red and blue on the OGLES code path. As he was using PowerVR SDK on a x86 (little endian) system, this make sense.

What was surprising is that it seems to have the same problem on it's Note3. But after some check, I guess he compile my code using default GCC ARM option which are.... Little endian!

More I did into the question, more I realize ARM tent to use little endian by default (this would contradict what Alfonse Reinheart said).

This mean I would need to actually do a conversion on the Rice code. After some investigation, I think it's bad: N64's R4300 CPU was a big endian cpu. This mean he was dealing internally with Big endian bytes and you would keep the emulation part easier if you don't put little endian cruft in the middle on the code.

After some digging, I realize mupen64plus-core support big endian arch using M64P_BIG_ENDIAN and actually have functions to deal with this but I can't use them as they are part of the core! T_T (Arf! I hate this plugin system...)

Basically, I will have to write the exact same function in the Rice code...

littleguy77 commented 9 years ago

Before going too far down this path, perhaps a second confirmation is wise. Which specific commit(s) should I test in mupen64plus-ae? Between this and bsmiles32's work I'm starting to lose track ;)