purplemarshmallow opened 8 years ago
Great news :smile: ! I tried looking at z64 yesterday and realized it's going to take a lot of work to fix problems though. You're definitely right that other parts of the plugin rely on incorrect results.
For example, the tcdiv_persp code is wrong. Whoever wrote it came up with an interesting algorithm, but it's too imprecise. When I tried putting in a more accurate algorithm, it made things look worse. So that's why I'm hesitant to try and fix z64/z64gl. It requires too much work. For the triangle code, the scaling doesn't even make sense imo. It's not accurate enough.
But since you're interested in doing this, I am willing to help.
Thanks, any help would be great.
The reason I'm interested is that I got very good results with little effort. Using angrylion's TMEM loading functions fixed many bugs. Vigilante 8 is now faster in-game than with angrylion's plugin. I managed to see something from Superman 64, a game that always completely refused to work with z64gl. SD Hiryuu runs at full speed on my system, with no more slowdown.
Where is the tcdiv_persp code in z64gl? Is it named differently? I can't find it.
If you add an accuracy improvement and it makes some things look worse, you have to check whether there are also improvements elsewhere. If there are more improvements than regressions, it should be added.
Sorry, I'm using the terminology from angrylion's code. https://github.com/purplemarshmallow/z64/blob/master/src/rdp-mess.cpp#L1851
On one hand, improving accuracy is bound to slow down the plugin, but there are plenty of potential optimizations to add in. I guess I can continue trying to study both z64 and angrylion's code more.
You can't say that in general; improving accuracy can also speed things up. It depends on what you are doing. Having correct TMEM emulation and correct texture loading won't slow things down, for example.
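As a concrete illustration of what "correct TMEM emulation" has to model, here is a hypothetical sketch (all names and the layout are illustrative, not taken from any plugin) of one addressing quirk: on odd texture lines the RDP stores each 64-bit TMEM word with its two 32-bit halves swapped, which is the kind of detail a fast-but-loose loader tends to skip.

```c
#include <stdint.h>

/* Illustrative sketch of the odd-line word swap during TMEM loads.
   TMEM is 4 KiB of texture memory; on odd lines, the two 32-bit
   halves of each 64-bit word are stored swapped. */

#define TMEM_SIZE 0x1000 /* 4 KiB */

static uint8_t tmem[TMEM_SIZE];

/* Copy one line of texel data into TMEM, swapping the 32-bit halves
   of each 64-bit group when the line index is odd. */
static void tmem_load_line(uint32_t tmem_addr, const uint8_t *src,
                           uint32_t bytes, int line)
{
    for (uint32_t i = 0; i < bytes; i++) {
        uint32_t dst = tmem_addr + i;
        if (line & 1)
            dst ^= 4;  /* swap the 32-bit halves of the 64-bit word */
        tmem[dst & (TMEM_SIZE - 1)] = src[i];
    }
}
```

Getting quirks like this wrong does not slow anything down; it just produces garbled textures, which is why fixing the loader is a "free" accuracy win.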
I was talking about for this specific case. z64 is faster than angrylion's, mostly because it does not have the same precision as angrylion's. It's basically cutting a lot of corners. But yes, some accuracy improvements will not slow down emulation.
I'm not planning to improve the z64 software plugin. I only use it as a reference to see where code in z64gl comes from and how it's supposed to work in software. Then I check whether it can be replaced with current code from angrylion's plugin or from MAME.
I think you misunderstood. I don't want common code between z64 and z64gl; I want common code between angrylion's plugin and z64gl. I don't think it's worth working on the z64 software plugin. If you improve it, you will end up with angrylion's plugin.
I brought up z64's source because isn't that what z64gl is based on? Most of what I said should still apply to z64gl. Getting more accurate results will, in some cases, require many more calculations instead of quick but imprecise algorithms.
I believe the scaling code is inaccurate in z64gl.
I just felt that it's easier to track down bugs by experimenting with z64, since I do not know OpenGL too well. For example, I want to find out why shields look wrong in Super Smash Bros, and I have a better chance of figuring out the problem by comparing z64 to angrylion's.
I don't think it's worth working on the z64 software plugin. If you improve it, you will end up with angrylion's plugin.
You're right.
Getting more accurate results will, in some cases, require many more calculations instead of quick but imprecise algorithms.
I'd prefer the accurate algorithm unless it's extraordinarily slow
I published my branch but it's still very WIP https://github.com/purplemarshmallow/z64/tree/angrylion-integration
F-Zero X fully working now.
I'd prefer the accurate algorithm unless it's extraordinarily slow
Agreed.
F-Zero X fully working now.
Legit! Nice work. Can you check out Super Smash Bros? I'd really like to fix that game :smile: . I'm seeing weird changes with it, though that could be my drivers. The original z64gl was bugged for me too, because the textures are messed up.
It's a regression. MarkTMEMarea still needs to be corrected in my branch to work with the new code. I hope this bug will disappear once that's done.
But in-game it looks good for me. I hope this bug won't be too hard to fix...
In the intro, does it look like this to you?
It is really weird how using Mesa 3D fixes the shield issue, yet z64 seems to have the same problem. I hope I can track it down. Here's what shields look like on my end.
I tested z64gl on different GPUs. On AMD and Nvidia the shield is correct. On Intel HD 3000 I'm getting the same shield issue as seen in your screenshot. On Intel HD 5500 it looks correct.
Good to know you can at least reproduce the issue on one machine. Ever since I confirmed that it happens in z64 as well, I've become extra curious about it. It's probably the same bug inherited from z64, but I wonder why different GPUs would make a difference in this case.
Interestingly, that tc_div function I mentioned earlier is actually slower than angrylion's, I believe. Out of curiosity, I tried making a more optimized version of the hacky division algorithm, and it was still slower than angrylion's more accurate one. I guess using division is really slow.
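To make the trade-off concrete, here is a hypothetical sketch (not code from either plugin) of the two approaches: a plain divide versus approximating 1/w with a cheap linear first guess refined by one Newton-Raphson step. Function names are illustrative.

```c
#include <math.h>

/* Approximate 1/w without a divide: normalize w into [0.5, 1),
   take a minimax linear first guess for the reciprocal, then
   refine it with one Newton-Raphson step. */
static float recip_approx(float w)
{
    int e;
    float m = frexpf(w, &e);                        /* w = m * 2^e, m in [0.5, 1) */
    float r = 48.0f / 17.0f - (32.0f / 17.0f) * m;  /* linear guess for 1/m */
    r = r * (2.0f - m * r);                         /* NR step: roughly doubles correct bits */
    return ldexpf(r, -e);                           /* 1/w = (1/m) * 2^-e */
}

/* Perspective texture-coordinate divide using the approximation. */
static void tcdiv_sketch(float s, float t, float w, float *sw, float *tw)
{
    float r = recip_approx(w);  /* the exact path would be r = 1.0f / w */
    *sw = s * r;
    *tw = t * r;
}
```

With one refinement step the relative error stays under roughly 0.4%. Hardware-style dividers use a lookup table for the first guess instead of a linear formula, and the point made in this thread is that on modern CPUs the plain `1.0f / w` divide is often just as fast as the "clever" approximation.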
I looked into the regression in Super Smash Bros. It seems not so easy to fix; all 32-bit tiles are broken. I think a good way to fix it is to find out whether angrylion's plugin does some special data manipulations with 32-bit tiles. The z64 software plugin does not.
This line seems to be wrong. Removing it makes things look a bit better, but it's still broken: https://github.com/purplemarshmallow/z64/blob/angrylion-integration/src/rgl_tiles.cpp#L135
Interestingly your angrylion integration branch seems to fix the random bugs I'd get in Last Legion UX.
What random bugs do you get in Last Legion UX?
This bug doesn't seem to happen with your angrylion integration branch.
If I had better performance with z64gl and it supported gamma, I'd totally use this plugin for Last Legion. I may try looking into these Intel-specific issues. I'm still puzzled by the fact that some of these Intel-only bugs are also present in the z64 software plugin. I really should examine z64's code more, because fixing it there should also fix the issue in z64gl.
Do the texture bugs in Last Legion go away if you switch window/fullscreen? https://github.com/purplemarshmallow/z64/commit/320b50d752a06d0a0bac117594bae50f109c4c13 fixed this kind of bug for me. Or is it a different problem?
Do the texture bugs in Last Legion go away if you switch window/fullscreen?
Well, the color seems to have changed back to normal, but the characters were invisible after switching to fullscreen. Then switching back to window gave me a black screen (this is probably due to my Intel IGP though). Maybe I'll try testing on different hardware next week.
320b50d fixed this kind of bug for me. Or is it a different problem?
I just tried compiling with linker optimizations turned on and the bug still never happened. Which games were affected by these compiler settings?
The problem is link-time code generation and whole-program optimization. If I enable them I get weird bugs in every game. Other linker optimizations can cause crashes. Not sure why.
If I enable link-time code generation and whole-program optimization, Last Legion UX looks like this for me, just like in your screenshot. Disabling these options solves the problem for me.
I really should examine z64's code more, because fixing it there should also fix the issue in z64gl.
MAME's software renderer started from the same codebase. Here's the changelog: http://git.redump.net/mame/log/src/mame/video/n64.c Maybe it's worth porting that to the zilmar spec. It might run faster than angrylion's plugin.
Great idea! I keep forgetting about MAME's codebase. Analyzing old commits should really help.
I'm convinced that these Intel-only issues are due to missing certain extensions. It has to be using a different code path than what other machines are using. Is there a good way to check which extensions I have?
Also, does your fork compile on linux? I've been trying to set up emulation on linux so that I can test z64gl with a better driver. I'm curious to see if the performance will be better too.
I'm convinced that these Intel-only issues are due to missing certain extensions. It has to be using a different code path than what other machines are using.
What are the Intel-only issues? I know there are texture-mirroring problems in SSB and some problems with switching window/fullscreen. Is there anything else? Does Last Legion UX work now with master? You can test my build https://github.com/purplemarshmallow/z64/releases/tag/2
I can't see any GPU-specific code paths in z64gl. If there are GPU-specific problems, it's very likely because there is a bug in that specific driver, or because the plugin has undefined behavior. If there is undefined behavior, every driver can react differently, and it can be fixed on z64gl's part. If the problem is a bug in the driver, only workarounds can help.
Is there a good way to check which extensions I have?
I don't think you are missing OpenGL extensions. But there are OpenGL extension viewers you can use to check. I'm not sure which program is good, though.
Also, does your fork compile on linux? I've been trying to set up emulation on linux so that I can test z64gl with a better driver. I'm curious to see if the performance will be better too.
It should build on Linux, and there is a Linux makefile in the repo, but I never tried it. Most likely the Linux makefile needs to be changed a bit. You can also try testing the fork of the Mupen64plus team.
What are the Intel-only issues? I know there are texture-mirroring problems in SSB and some problems with switching window/fullscreen. Is there anything else?
Some menus in SD Hiryuu are incredibly slow, to the point where I get better performance when I use Mesa, even though Mesa is generally much slower (on Windows). I'm also thinking that some of the other performance problems I have are related. Certain people write it off as me "running a toaster", but I've done some testing. I had a friend test various plugins, and the performance gap between my computer and his was much bigger with z64gl than with Rice Video D3D9. I can see why some say z64gl is fast, because when it works right, it generally is fast. Something about the way Rice coded his OpenGL and D3D plugins seems to just work properly on my computer. When I start working on a hardware rendering plugin, I will definitely examine Rice's code and try to figure out why it works so well.
Does Last Legion UX work now with master?
I'm assuming it does, because I beat the first three levels without seeing those weird bugs. When I switched back to the original z64gl, I saw the bug halfway through the first level.
I can't see any GPU-specific code paths in z64gl. If there are GPU-specific problems, it's very likely because there is a bug in that specific driver, or because the plugin has undefined behavior. If there is undefined behavior, every driver can react differently, and it can be fixed on z64gl's part. If the problem is a bug in the driver, only workarounds can help.
I'm convinced that the plugin must have undefined behavior then, because some of these issues I encountered are identical to what I saw with the z64 software plugin.
It should build on Linux, and there is a Linux makefile in the repo, but I never tried it. Most likely the Linux makefile needs to be changed a bit. You can also try testing the fork of the Mupen64plus team.
I guess I'll try looking for the binary then. I'm pretty sure z64gl didn't come with the m64p package, because I don't seem to have it after installing m64p.
So I decided to see if I'm missing support for any possible functions used in z64gl, and was intrigued by the results:
GLEW Extension Info
---------------------------
GLEW version 1.9.0
Reporting capabilities of pixelformat 3
Running on a Intel(R) HD Graphics from Intel
OpenGL version 2.1.0 - Build 8.15.10.2993 is supported
GL_VERSION_3_0: OK
---------------
glBeginConditionalRender: OK
glBeginTransformFeedback: OK
glBindFragDataLocation: OK
glClampColor: OK
glClearBufferfi: OK
glClearBufferfv: OK
glClearBufferiv: OK
glClearBufferuiv: OK
glColorMaski: OK
glDisablei: OK
glEnablei: OK
glEndConditionalRender: OK
glEndTransformFeedback: OK
glGetBooleani_v: OK
glGetFragDataLocation: OK
glGetStringi: OK
glGetTexParameterIiv: OK
glGetTexParameterIuiv: OK
glGetTransformFeedbackVarying: OK
glGetUniformuiv: OK
glGetVertexAttribIiv: OK
glGetVertexAttribIuiv: OK
glIsEnabledi: OK
glTexParameterIiv: OK
glTexParameterIuiv: OK
glTransformFeedbackVaryings: OK
glUniform1ui: OK
glUniform1uiv: OK
glUniform2ui: OK
glUniform2uiv: OK
glUniform3ui: OK
glUniform3uiv: OK
glUniform4ui: OK
glUniform4uiv: OK
glVertexAttribI1i: OK
glVertexAttribI1iv: OK
glVertexAttribI1ui: OK
glVertexAttribI1uiv: OK
glVertexAttribI2i: OK
glVertexAttribI2iv: OK
glVertexAttribI2ui: OK
glVertexAttribI2uiv: OK
glVertexAttribI3i: OK
glVertexAttribI3iv: OK
glVertexAttribI3ui: OK
glVertexAttribI3uiv: OK
glVertexAttribI4bv: OK
glVertexAttribI4i: OK
glVertexAttribI4iv: OK
glVertexAttribI4sv: OK
glVertexAttribI4ubv: OK
glVertexAttribI4ui: OK
glVertexAttribI4uiv: OK
glVertexAttribI4usv: OK
glVertexAttribIPointer: OK
GL_VERSION_3_1: OK
---------------
glDrawArraysInstanced: OK
glDrawElementsInstanced: OK
glPrimitiveRestartIndex: OK
glTexBuffer: OK
GL_VERSION_3_2: MISSING
---------------
glFramebufferTexture: OK
glGetBufferParameteri64v: MISSING
glGetInteger64i_v: MISSING
GL_VERSION_3_3: OK
---------------
glVertexAttribDivisor: OK
I wonder why it says OpenGL version 2.1.0. Now I'm even more confused. I really need to start testing on Linux. Hopefully I can figure out how to fix some of these issues I'm having without having to use Mesa.
I wonder why it says OpenGL version 2.1.0.
Because that's your version. You're looking at an extensions test.
So I decided to see if I'm missing support for any possible functions used in z64gl,
First, almost none of the commands in your paste are even used by z64gl.
Second, every GL command z64gl uses that isn't internal to Windows' native software driver is already queried on RomOpen by the new extension-loading code I replaced GLEW with. You should check the console window for output to see the number of used functions that failed to load; if it were greater than 0, you'd probably just get a guaranteed crash anyway.
You should check the console window for output to see the number of used functions that failed to load; if it were greater than 0, you'd probably just get a guaranteed crash anyway.
I see. I checked and it's 0. I guess I'll have to find out some way to debug these weird issues.
Well, I profiled SD Hiryuu's 1st menu and it's saying that a good portion of the time is spent in glBegin in RglRenderChunks. Is that normal?
Not really, but on this subject I'll make a note-to-self that glBegin() is deprecated and should be replaced with vertex arrays, preferably server-side vertex arrays, though client-side ones will still work in OpenGL 1.1 if the installable client driver accelerator on top of Microsoft's opengl32.dll is disabled for debugging.
I think you don't even need OpenGL 2.0 to run z64gl. The conf file says it works on a GeForce 5900. That's over 10 years old, and those PCs can't run LLE at full speed anyway. It would be better to use more modern OpenGL if possible
Not really, but on this subject I'll make a note-to-self that glBegin() is deprecated and should be replaced with vertex arrays
I see. I'll definitely have to look into this, if I want to get z64gl running full speed on more games.
It would be better to use more modern OpenGL if possible
Yes, even glN64 uses glDrawArrays. Interestingly, Rice's OGL code uses glBegin, yet it's fast in many games. I'll have to examine Rice's code more.
I wouldn't really call glDrawArrays() "modern" OpenGL though.
It's been part of the spec since GL 1.1 in 1997, so it's pretty old. The reason it hasn't been deprecated is that it's the finalized command for processing the queued vertex arrays, which are themselves still deprecated if done client-side.
@purplemarshmallow I can bump the calls up to modern OpenGL, but in doing so I will prefer to have a compatibility fallback for people either with pre-2.0 video cards or running z64gl on Windows in GDI software mode for debugging video driver issues and glitches. (The core Microsoft version of OpenGL is 1.1, so if you do things like enable "Use 256 colors" in exe compatibility settings, that's what you'll get, and z64gl cannot work unless it's designed to fall back to GL 1.1 functions. Granted, that driver is painfully slow, and video driver debugging and nostalgia aside, there's little reason anyone would be content to test on it.)
Interestingly, Rice's OGL code uses glBegin, yet it's fast in many games. I'll have to examine Rice's code more.
glBegin() is not really slow, in spite of being deprecated. In fact, for small command lists it is possible that it's even faster than some vertex arrays. Just depends how you do your GL.
glBegin() is not really slow, in spite of being deprecated. In fact, for small command lists it is possible that it's even faster than some vertex arrays.
Interesting. Good to know.
I can bump the calls up to modern OpenGL
I'm curious to see how much this change would impact performance :smile: .
glBegin() can be as fast as, or even faster than, vertex arrays on some implementations/video card drivers, if the vertex array is very small, like a single (x, y, z) for drawing a single point/pixel and that's it.
Client-side vertex arrays use up CPU time arranging pointers to system memory in the form of C arrays, which need to be streamed to video memory every time glDrawArrays() is called.
Server-side vertex arrays use up the CPU time by uploading system memory only once to video memory for long-term storage, before glDrawArrays() is ever called. This is still overkill for a "vertex array" that really holds only a handful of vertices or fewer.
Immediate-mode rendering, using the fixed-function pipeline glBegin() and glEnd(), caches nothing at all and in the long term (but not always) is slower. It's deprecated mostly out of philosophy.
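Since the GL calls themselves need a live context, here is just the data side of the vertex-array idea: a hypothetical interleaved client-side layout and the stride/offset arithmetic the gl*Pointer calls would be handed. The struct and constant names are illustrative, not from z64gl.

```c
#include <stddef.h>

/* Interleaved client-side vertex layout a glDrawArrays path might use:
   position and texture coordinates packed back to back per vertex. */
typedef struct {
    float x, y, z;  /* position */
    float s, t;     /* texture coordinates */
} Vertex;

enum {
    VERTEX_STRIDE = sizeof(Vertex),       /* bytes between consecutive vertices */
    POS_OFFSET    = offsetof(Vertex, x),  /* byte offset of the position attribute */
    TEX_OFFSET    = offsetof(Vertex, s)   /* byte offset of the texcoord attribute */
};

/* With a context, these constants would feed the fixed-function setup:
   glVertexPointer(3, GL_FLOAT, VERTEX_STRIDE, (char *)verts + POS_OFFSET);
   glTexCoordPointer(2, GL_FLOAT, VERTEX_STRIDE, (char *)verts + TEX_OFFSET);
   glDrawArrays(GL_TRIANGLES, 0, vertex_count);                            */
```

The pay-off over glBegin()/glEnd() is one draw call per batch instead of one function call per attribute per vertex, which matters once batches grow beyond a few vertices.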
I can bump the calls up to modern OpenGL
Would be very nice to get rid of deprecated OpenGL. I have AMD, Nvidia, and Intel GPUs to test; it would be interesting to see if it affects performance.
I will prefer to have a compatibility fallback for people either with pre-2.0 video cards or running z64gl on Windows in GDI software mode for debugging video driver issues and glitches.
I see no need to support anything before OpenGL 2.0. PCs with those GPUs can't run it at full speed anyway, and I don't think it can ever work in GDI software mode, because z64gl uses shaders
So that's why I'm hesitant to try and fix z64/z64gl. It requires too much work. For the triangle code, the scaling doesn't even make sense imo. It's not accurate enough.
Would be better to rewrite, while using some of z64's concepts, such as chunk rendering. Replacing all edgewalker rendering, the combiners, and the texture/tile/framebuffer cache would be nice and would help with memory usage, especially if you use a hashtable library such as uthash. No doubt you could also pack tiles into one massive texture too.
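To sketch the hashtable idea (uthash would reduce the bookkeeping below to a few macros), here is a minimal dependency-free tile cache keyed by a checksum of the tile's texel data. Every name here is illustrative, not from any plugin.

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal chained hash table mapping a tile-data checksum to a
   previously uploaded texture, so identical tiles are uploaded once. */

#define CACHE_BUCKETS 256

typedef struct TileEntry {
    uint32_t key;            /* checksum of the tile's texel data */
    uint32_t gl_texture;     /* handle the renderer would bind on a hit */
    struct TileEntry *next;  /* collision chain within the bucket */
} TileEntry;

static TileEntry *cache[CACHE_BUCKETS];

/* Return the cache entry for `key`, creating an empty one on a miss.
   On a miss the caller uploads the texture and fills in gl_texture. */
static TileEntry *cache_find_or_add(uint32_t key)
{
    unsigned b = key % CACHE_BUCKETS;
    for (TileEntry *e = cache[b]; e; e = e->next)
        if (e->key == key)
            return e;  /* hit: reuse the already-uploaded texture */
    TileEntry *e = calloc(1, sizeof *e);
    if (!e)
        return NULL;
    e->key = key;
    e->next = cache[b];
    cache[b] = e;  /* miss: new head of the bucket's chain */
    return e;
}
```

With uthash the struct would just carry a `UT_hash_handle hh;` member and the lookup/insert would be `HASH_FIND_INT`/`HASH_ADD_INT`, but the structure of the cache is the same.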
Would be better to rewrite, while using some of z64's concepts, such as chunk rendering.
If I were to work on a video plugin, it would definitely be from scratch. However, before I can do that, I will have to experiment with different plugins. I'm very interested in figuring out why Rice's Video plugin is so fast. I also need to figure out a good way to profile API code, so that I can learn how to optimize it.
I actually like z64gl's framebuffer code. I get better performance in Mario Kart with z64gl than Glide64 while using fb emulation for the monitor. I'll have to see how he did that.
Wasn't framebuffer notification used for that?
Wasn't framebuffer notification used for that?
Yes. The difference between z64gl and Glide64 is that z64gl uses glReadPixels to copy the video memory over to RAM. That's what makes it faster than Glide64's method.
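glReadPixels itself needs a GL context, but the conversion step such a readback path would perform afterwards, packing the returned RGBA8 pixels into the N64's 16-bit 5/5/5/1 framebuffer format, can be sketched on its own. This assumes a 16-bit color image; the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one 8-bit-per-channel pixel into N64 RGBA 5/5/5/1 format. */
static uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) |  /* 5 bits red   */
                      ((g >> 3) << 6)  |  /* 5 bits green */
                      ((b >> 3) << 1)  |  /* 5 bits blue  */
                      (a >> 7));          /* 1 bit alpha  */
}

/* Convert one scanline read back as RGBA8 (e.g. from glReadPixels
   with GL_RGBA / GL_UNSIGNED_BYTE) into the 16-bit RDRAM format. */
static void readback_line(const uint8_t *rgba8, uint16_t *dst, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = pack_rgba5551(rgba8[4 * i],     rgba8[4 * i + 1],
                               rgba8[4 * i + 2], rgba8[4 * i + 3]);
}
```

A real implementation would also have to handle byte-swapped RDRAM addressing and 32-bit color images, but the bulk of the work is this per-pixel pack after the single glReadPixels call.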
This is currently WIP; I want to upload it when I've finished cleaning up the code. I'm planning to integrate angrylion's plugin and support both hardware and software rendering. There should be as much common code as possible. The plugin already shows improvements: the crash in Glover is gone and textures in Chopper Attack are fixed. The problem is that correct code can cause bugs, because other parts of the plugin rely on incorrect results. Some textures are buggy in this branch.
@LegendOfDragoon you said you are interested in development. I think you should know in which direction I want to move this.