purplemarshmallow / z64

Low level RDP plugin for zilmar spec N64 emulators

angrylion integration branch #2

Open purplemarshmallow opened 8 years ago

purplemarshmallow commented 8 years ago

This is currently WIP; I want to upload it once I've finished cleaning up the code. I'm planning to integrate angrylion's plugin and support both hardware and software rendering. There should be as much common code as possible. The plugin already shows improvements: the crash in Glover is gone and the textures in Chopper Attack are fixed. The problem is that correct code can cause bugs because another part of the plugin relies on incorrect results. Some textures are buggy in this branch.

@LegendOfDragoon you said you are interested in development. I think you should know in which direction I want to move this.

LegendOfDragoon commented 8 years ago

Great news :smile: ! I tried looking at z64 yesterday and realized it's going to take a lot of work to fix problems though. You're definitely right that other parts of the plugin rely on incorrect results.

For example, the tcdiv_persp code is wrong. Whoever wrote it came up with an interesting algorithm, but it's too imprecise. When I tried putting in a more accurate algorithm, it made things look worse. So that's why I'm hesitant to try and fix z64/z64gl. It requires too much work. For the triangle code, the scaling doesn't even make sense imo. It's not accurate enough.

But since you're interested in doing this, I am willing to help.
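
A minimal sketch of the precision trade-off described above. This is not the z64 or angrylion code: `persp_exact` and `persp_coarse` are hypothetical names, the 2^15 output scale is an assumption, and the coarse routine only stands in for "a quick but imprecise algorithm" by keeping the top 6 bits of `w` before taking a reciprocal.

```c
#include <assert.h>
#include <stdint.h>

/* Exact perspective divide in fixed point: (s / w) scaled by 2^15.
 * Inputs are assumed small enough that the result fits in 32 bits. */
int32_t persp_exact(int32_t s, int32_t w)
{
    assert(s >= 0 && w > 0);
    return (int32_t)(((int64_t)s << 15) / w);
}

/* Hypothetical shortcut: keep only the top 6 bits of w, take the
 * reciprocal of that small value (a real implementation could use a
 * 64-entry lookup table here), then multiply and shift back. */
int32_t persp_coarse(int32_t s, int32_t w)
{
    int shift = 0;
    int32_t recip;

    assert(s >= 0 && w > 0);
    while ((w >> shift) >= 64)
        shift++;                       /* w >> shift now fits in 6 bits */
    recip = (1 << 15) / (w >> shift);  /* truncated reciprocal */
    return (int32_t)(((int64_t)s * recip) >> shift);
}
```

With `s = w = 1000` the exact routine returns 32768 (1.0 at this scale) while the coarse one returns 33000, an error of roughly 0.7%. Code elsewhere that was tuned against the coarse value can look worse once the exact value is swapped in.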

purplemarshmallow commented 8 years ago

Thanks, any help would be great.

The reason I'm interested is that I got very good results with little effort. Using angrylion's TMEM loading functions fixed many bugs. Vigilante8 is now faster in-game than with angrylion's plugin. I managed to see something from Superman64, a game that always completely refused to work with z64gl. SD Hiryuu runs at full speed on my system, with no more slowdown.

Where is the tcdiv_persp code in z64gl? Is it named differently? I can't find it.

If you add an accuracy improvement and it makes things look worse, you have to check whether there are also improvements. If there are more improvements than regressions, it should be added.

LegendOfDragoon commented 8 years ago

Sorry, I'm using the terminology from angrylion's code. https://github.com/purplemarshmallow/z64/blob/master/src/rdp-mess.cpp#L1851

On one hand, improving accuracy is bound to slow down the plugin, but there are plenty of potential optimizations to add in. I guess I can continue trying to study both z64 and angrylion's code more.

purplemarshmallow commented 8 years ago

You can't say this in general; improving accuracy can also speed things up. It depends on what you are doing. Having correct TMEM emulation and correct texture loading won't slow things down, for example.

LegendOfDragoon commented 8 years ago

I was talking about this specific case. z64 is faster than angrylion's, mostly because it does not have the same precision as angrylion's. It's basically cutting a lot of corners. But yes, some accuracy improvements will not slow down emulation.

purplemarshmallow commented 8 years ago

I'm not planning to improve the z64 software plugin. I only use it as reference to see where code in z64gl comes from and how it's supposed to work in software. Then I look if it can be replaced with current code from angrylion's plugin or from MAME.

I think you misunderstood. I don't want to have common code between z64 and z64gl. I want to have common code between angrylion's plugin and z64gl. I don't think it's worth working on the z64 software plugin. If you improve it you will end up with angrylion's plugin.

LegendOfDragoon commented 8 years ago

I brought up z64's source because isn't that what z64gl is based on? Most of what I said should still apply to z64gl. Getting more accurate results will in some cases require many more calculations, instead of quick but imprecise algorithms.

I believe the scaling code is inaccurate in z64gl.

I just felt that it's easier to track down bugs by experimenting with z64, since I do not know OpenGL too well. For example, I want to find out why shields look wrong in Super Smash Bros, and I have a better chance of figuring out the problem by comparing z64 to angrylion's.

> I don't think it's worth working on the z64 software plugin. If you improve it you will end up with angrylion's plugin.

You're right.

purplemarshmallow commented 8 years ago

> Getting more accurate results will in some cases require many more calculations, instead of quick but imprecise algorithms.

I'd prefer the accurate algorithm unless it's extraordinarily slow.

I published my branch but it's still very WIP https://github.com/purplemarshmallow/z64/tree/angrylion-integration

F-Zero X fully working now.

LegendOfDragoon commented 8 years ago

> I'd prefer the accurate algorithm unless it's extraordinarily slow.

Agreed.

> F-Zero X fully working now.

Legit! Nice work. Can you check out Super Smash Bros? I'd really like to fix that game :smile: . I'm seeing weird changes in Super Smash Bros with your branch, but that could be my drivers. The original z64gl was bugged for me too, though, because the textures are messed up.

purplemarshmallow commented 8 years ago

It's a regression. MarkTMEMarea still needs to be corrected in my branch to work with the new code. I hope this bug will disappear once it's done.

purplemarshmallow commented 8 years ago

But ingame it looks good for me. I hope this bug won't be too hard to fix...

LegendOfDragoon commented 8 years ago

In the intro, does it look like this to you? [image: ssb intro]

It is really weird how using Mesa 3D fixes the shield issue. Yet it seems that z64 also has the same problem. I hope I can track it down. Here's what shields look like on my end. [image: buggy shield]

purplemarshmallow commented 8 years ago

I tested z64gl on different GPUs. On AMD and Nvidia the shield is correct. On intel HD 3000 I'm getting the same shield issue as seen in your screenshot. On intel HD 5500 it looks correct.

LegendOfDragoon commented 8 years ago

Good to know you can at least reproduce the issue on one machine. Ever since I confirmed that it happens in z64 as well, I've become extra curious about the issue. It's probably the same bug inherited from z64, but I wonder why different GPUs would make a difference in this case.

Interestingly, that tc_div function I mentioned earlier is actually slower than Angrylion's, I believe. I tried making a more optimized version of the hacky division algorithm, just out of curiosity, and it was actually slower than Angrylion's more accurate algorithm. I guess using division is really slow.

purplemarshmallow commented 8 years ago

I looked into the regression in Super Smash Bros. It doesn't seem easy to fix. All 32 bit tiles are broken. I think a good way to fix it is to find out if angrylion's plugin does some special data manipulations with 32 bit tiles. The z64 software plugin does not.

This line seems to be wrong. Removing it makes things look a bit better but it's still broken https://github.com/purplemarshmallow/z64/blob/angrylion-integration/src/rgl_tiles.cpp#L135

LegendOfDragoon commented 8 years ago

Interestingly, your angrylion integration branch seems to fix the random bugs I'd get in Last Legion UX.

purplemarshmallow commented 8 years ago

What random bugs do you get in Last Legion UX?

LegendOfDragoon commented 8 years ago

This bug doesn't seem to happen with your angrylion integration branch.

[image: last legion ux z64gl]

If I had better performance with z64gl and it supported gamma, I'd totally use this plugin for Last Legion. I may try looking into these intel specific issues. I'm still puzzled by the fact that some of these intel-only bugs are also present in the z64 software plugin. I really should examine z64's code more, because fixing it on that should lead to also fixing the issue in z64gl.

purplemarshmallow commented 8 years ago

Do the texture bugs in Last Legion go away if you switch window/fullscreen? https://github.com/purplemarshmallow/z64/commit/320b50d752a06d0a0bac117594bae50f109c4c13 fixed this kind of bug for me. Or is it a different problem?

LegendOfDragoon commented 8 years ago

> Do the texture bugs in Last Legion go away if you switch window/fullscreen?

Well, the color seems to have changed back to normal, but the characters were invisible after switching to fullscreen. Then switching back to window gave me a black screen (this is probably due to my Intel IGP though). Maybe I'll try testing on different hardware next week.

> 320b50d fixed this kind of bug for me. Or is it a different problem?

I just tried compiling with linker optimizations turned on and the bug still never happened. Which games were affected by these compiler settings?

purplemarshmallow commented 8 years ago

The problem is link time code generation and whole program optimization. If I enable them I get weird bugs in every game. Other linker optimizations can cause crashes. Not sure why.

If I enable link time code generation and whole program optimization Last Legion UX looks like this for me. Looks like in your screenshot. Disabling these options solves the problem for me.

purplemarshmallow commented 8 years ago

> I really should examine z64's code more, because fixing it on that should lead to also fixing the issue in z64gl.

MAME's software renderer started from the same codebase. Here's the changelog: http://git.redump.net/mame/log/src/mame/video/n64.c Maybe it's worth porting that to zilmar spec. It might run faster than angrylion's plugin.

LegendOfDragoon commented 8 years ago

Great idea! I keep forgetting about MAME's code base. Analyzing old commits should really help.

LegendOfDragoon commented 8 years ago

I'm convinced that these intel-only issues are due to missing certain extensions. It has to be using a different code path than what other machines are using. Is there a good way to check which extensions I have?

Also, does your fork compile on linux? I've been trying to set up emulation on linux so that I can test z64gl with a better driver. I'm curious to see if the performance will be better too.

purplemarshmallow commented 8 years ago

> I'm convinced that these intel-only issues are due to missing certain extensions. It has to be using a different code path than what other machines are using.

What are intel-only issues? I know there are texture mirroring problems in SSB and some problems with switching window/fullscreen. Is there anything else? Does Last Legion UX work now with master? You can test my build https://github.com/purplemarshmallow/z64/releases/tag/2

I can't see any GPU specific code paths in z64gl. If there are GPU specific problems it's very likely because there is a bug in this specific driver or the plugin has undefined behavior. If there is undefined behavior every driver can react differently and it can be fixed on z64gl's part. If the problem is a bug in the driver only workarounds can help.

> Is there a good way to check which extensions I have?

I don't think you are missing OpenGL extensions. There are OpenGL extension viewers you can use to check, but I'm not sure which program is good.

> Also, does your fork compile on linux? I've been trying to set up emulation on linux so that I can test z64gl with a better driver. I'm curious to see if the performance will be better too.

It should build on Linux and there is a Linux makefile in the repo but I never tried. Most likely the Linux makefile needs to be changed a bit. You can also try testing the fork of the Mupen64plus team.

LegendOfDragoon commented 8 years ago

> What are intel-only issues? I know there are texture mirroring problems in SSB and some problems with switching window/fullscreen. Is there anything else?

Some menus in SD Hiryuu are incredibly slow, to the point where I get better performance when I use Mesa, even though Mesa is generally much slower (on Windows). I'm also thinking that some of the other performance problems I have are related. Certain people write it off as me "running a toaster", but I've done some testing. I had a friend test various plugins and I found that the gap in performance with z64gl was much bigger than the gap in performance (between my computer and his) with Rice Video D3D9. I can see why some say z64gl is fast, because when it works right, it generally is fast. Something about the way Rice coded his OpenGL and D3D plugin seems to just work properly on my computer. When I start working on a hardware rendering plugin, I will definitely examine Rice's code and try to figure out why that works so well.

> Does Last Legion UX work now with master?

I'm assuming it does, because I beat the first three levels without seeing those weird bugs. Then when I switched back to the original z64gl, I saw the bug halfway through the first level.

> I can't see any GPU specific code paths in z64gl. If there are GPU specific problems it's very likely because there is a bug in this specific driver or the plugin has undefined behavior. If there is undefined behavior every driver can react differently and it can be fixed on z64gl's part. If the problem is a bug in the driver only workarounds can help.

I'm convinced that the plugin must have undefined behavior then, because some of these issues I encountered are identical to what I saw with the z64 software plugin.

> It should build on Linux and there is a Linux makefile in the repo but I never tried. Most likely the Linux makefile needs to be changed a bit. You can also try testing the fork of the Mupen64plus team.

I guess I'll try looking for the binary then. I'm pretty sure z64gl didn't come with the m64p package, because I don't seem to have it after installing m64p.

LegendOfDragoon commented 8 years ago

So I decided to see if I'm missing support for any possible functions used in z64gl, and was intrigued by the results:

   GLEW Extension Info
---------------------------

GLEW version 1.9.0
Reporting capabilities of pixelformat 3
Running on a Intel(R) HD Graphics from Intel
OpenGL version 2.1.0 - Build 8.15.10.2993 is supported

GL_VERSION_3_0:                                                OK 
---------------
  glBeginConditionalRender:                                    OK
  glBeginTransformFeedback:                                    OK
  glBindFragDataLocation:                                      OK
  glClampColor:                                                OK
  glClearBufferfi:                                             OK
  glClearBufferfv:                                             OK
  glClearBufferiv:                                             OK
  glClearBufferuiv:                                            OK
  glColorMaski:                                                OK
  glDisablei:                                                  OK
  glEnablei:                                                   OK
  glEndConditionalRender:                                      OK
  glEndTransformFeedback:                                      OK
  glGetBooleani_v:                                             OK
  glGetFragDataLocation:                                       OK
  glGetStringi:                                                OK
  glGetTexParameterIiv:                                        OK
  glGetTexParameterIuiv:                                       OK
  glGetTransformFeedbackVarying:                               OK
  glGetUniformuiv:                                             OK
  glGetVertexAttribIiv:                                        OK
  glGetVertexAttribIuiv:                                       OK
  glIsEnabledi:                                                OK
  glTexParameterIiv:                                           OK
  glTexParameterIuiv:                                          OK
  glTransformFeedbackVaryings:                                 OK
  glUniform1ui:                                                OK
  glUniform1uiv:                                               OK
  glUniform2ui:                                                OK
  glUniform2uiv:                                               OK
  glUniform3ui:                                                OK
  glUniform3uiv:                                               OK
  glUniform4ui:                                                OK
  glUniform4uiv:                                               OK
  glVertexAttribI1i:                                           OK
  glVertexAttribI1iv:                                          OK
  glVertexAttribI1ui:                                          OK
  glVertexAttribI1uiv:                                         OK
  glVertexAttribI2i:                                           OK
  glVertexAttribI2iv:                                          OK
  glVertexAttribI2ui:                                          OK
  glVertexAttribI2uiv:                                         OK
  glVertexAttribI3i:                                           OK
  glVertexAttribI3iv:                                          OK
  glVertexAttribI3ui:                                          OK
  glVertexAttribI3uiv:                                         OK
  glVertexAttribI4bv:                                          OK
  glVertexAttribI4i:                                           OK
  glVertexAttribI4iv:                                          OK
  glVertexAttribI4sv:                                          OK
  glVertexAttribI4ubv:                                         OK
  glVertexAttribI4ui:                                          OK
  glVertexAttribI4uiv:                                         OK
  glVertexAttribI4usv:                                         OK
  glVertexAttribIPointer:                                      OK

GL_VERSION_3_1:                                                OK 
---------------
  glDrawArraysInstanced:                                       OK
  glDrawElementsInstanced:                                     OK
  glPrimitiveRestartIndex:                                     OK
  glTexBuffer:                                                 OK

GL_VERSION_3_2:                                                MISSING 
---------------
  glFramebufferTexture:                                        OK
  glGetBufferParameteri64v:                                    MISSING
  glGetInteger64i_v:                                           MISSING

GL_VERSION_3_3:                                                OK 
---------------
  glVertexAttribDivisor:                                       OK

I wonder why it says OpenGL version 2.1.0. Now I'm even more confused. I really need to start testing on linux. Hopefully I can figure out how to fix some of these issues I'm having, without having to use Mesa.

cxd4 commented 8 years ago

> I wonder why it says OpenGL version 2.1.0.

Because that's your version. You're looking at an extensions test.

> So I decided to see if I'm missing support for any possible functions used in z64gl,

First, almost none of the commands in your paste are ever used by z64gl.

Second, every GL command used by z64gl that isn't internal to the native Windows software driver is already queried on RomOpen by the new extension-loading code I replaced GLEW with. You should check the console window for output to see the number of used functions that failed to load...if it was greater than 0 you'd probably just have a guaranteed crash anyway.
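
A minimal sketch of the "query every used function at startup and count the failures" idea, assuming a table of entry points and a pluggable loader callback. This is not the actual z64gl loading code: `gl_entry`, `load_gl_procs`, `dummy_proc` and `fake_loader` are hypothetical names, and `fake_loader` only stands in for a real platform loader (on Windows that would typically be a wrapper around `wglGetProcAddress`) so the example runs without a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*gl_proc)(void);
typedef gl_proc (*proc_loader)(const char *name);

/* One row per GL entry point the plugin actually calls. */
struct gl_entry {
    const char *name;
    gl_proc proc;
};

/* Resolve every entry through get_proc and return how many failed. */
size_t load_gl_procs(struct gl_entry *table, size_t count, proc_loader get_proc)
{
    size_t failed = 0;
    size_t i;

    assert(table != NULL && get_proc != NULL);
    for (i = 0; i < count; i++) {
        table[i].proc = get_proc(table[i].name);
        if (table[i].proc == NULL)
            failed++;
    }
    return failed;
}

/* Stand-in function returned for every name the fake loader "supports". */
void dummy_proc(void) {}

/* Hypothetical loader that only knows two names, for demonstration. */
gl_proc fake_loader(const char *name)
{
    if (strcmp(name, "glCreateShader") == 0 || strcmp(name, "glUseProgram") == 0)
        return dummy_proc;
    return NULL;
}
```

A non-zero return value means some function pointer is still `NULL`, which is why calling through it later would most likely crash.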

LegendOfDragoon commented 8 years ago

> You should check the console window for output to see the number of used functions that failed to load...if it was greater than 0 you'd probably just have a guaranteed crash anyway.

I see. I checked and it's 0. I guess I'll have to find out some way to debug these weird issues.

LegendOfDragoon commented 8 years ago

Well, I profiled SD Hiryuu's 1st menu and it's saying that a good portion of the time is spent in glBegin in RglRenderChunks. Is that normal?

cxd4 commented 8 years ago

Not really, but upon this subject I'll make a note-to-self that glBegin() is deprecated and should be replaced with vertex arrays--preferably server-side vertex arrays, but client-side will still work in OpenGL 1.1 if the installable client driver accelerator on top of Microsoft's opengl32.dll is disabled for debugging.

purplemarshmallow commented 8 years ago

I think you don't even need OpenGL 2.0 to run z64gl. The conf file says it works on a GeForce 5900. That's over 10 years old, and those PCs can't run LLE at full speed anyway. Would be better to use more modern OpenGL if possible.

LegendOfDragoon commented 8 years ago

> Not really, but upon this subject I'll make a note-to-self that glBegin() is deprecated and should be replaced with vertex arrays

I see. I'll definitely have to look into this, if I want to get z64gl running full speed on more games.

> Would be better to use more modern OpenGL if possible

Yes, even glN64 uses glDrawArrays. Interestingly, Rice's OGL code uses glBegin, yet it's fast in many games. I'll have to examine Rice's code more.

cxd4 commented 8 years ago

I wouldn't really call glDrawArrays() "modern" OpenGL though.

It's been part of the specs since GL 1.1 from 1997, pretty old. The reason it hasn't been deprecated is that it's a finalized command to process the queued vertex arrays--which are still deprecated if they are done client-side.

@purplemarshmallow I can bump the calls up to modern OpenGL, but in doing so I will prefer to have a compatibility fallback to people either with pre-2.0 video cards or running z64gl on Windows in GDI software mode for debugging video driver issues and glitches. (The core Microsoft version of OpenGL is 1.1, so if you do things like enable "Use 256 colors" in exe compatibility settings, that's what you'll get, and z64gl cannot work unless it is designed to fall back to GL 1.1 functions. (Granted this driver is painfully slow, and video card driver debugging and nostalgia aside there is little reason why someone would be content to test it.))

> Interestingly, Rice's OGL code uses glBegin, yet it's fast in many games. I'll have to examine Rice's code more.

glBegin() is not really slow, in spite of being deprecated. In fact, for small command lists it is possible that it's even faster than some vertex arrays. Just depends how you do your GL.

LegendOfDragoon commented 8 years ago

> glBegin() is not really slow, in spite of being deprecated. In fact, for small command lists it is possible that it's even faster than some vertex arrays.

Interesting. Good to know.

> I can bump the calls up to modern OpenGL

I'm curious to see how much this change would impact performance :smile: .

cxd4 commented 8 years ago

glBegin() can be as fast--faster even--than vertex arrays on some implementations/video card drivers etc. if the vertex array is very small--like a single (x, y, z) for drawing a single point/pixel and that's it.

Client-side vertex arrays use up CPU time arranging pointers to system memory in the form of C arrays, which need to be streamed to video memory every time glDrawArrays() is called.

Server-side vertex arrays use up the CPU time by uploading C system memory only once to video memory for long-term storage, having that done before glDrawArrays() is ever called. This is still overkill for a "vertex array" which really is only a handful or less of vertices.

Immediate-mode rendering, using the fixed-function pipeline glBegin() and glEnd(), caches nothing at all and in the long term--but not always--is slower. It's deprecated mostly out of philosophy.
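
A rough cost model of the transfer behaviour described above, as a sketch only. It counts bytes copied from system memory to video memory and ignores per-call and driver overhead, which is exactly what decides the small-array case. The function names and the 3-float `(x, y, z)` vertex layout are assumptions for illustration, not measurements of any driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vertex layout: one (x, y, z) position of three floats. */
#define VERTEX_BYTES (3 * sizeof(float))

/* Client-side arrays: the whole array is streamed on every draw call. */
size_t client_side_bytes(size_t vertices, size_t draws)
{
    return vertices * VERTEX_BYTES * draws;
}

/* Server-side arrays: uploaded once, then reused by every draw call. */
size_t server_side_bytes(size_t vertices, size_t draws)
{
    assert(draws > 0);
    return vertices * VERTEX_BYTES;
}
```

In this model a 1000-vertex array drawn 60 times transfers 60 times more data client-side than server-side, while a single vertex drawn once transfers the same amount either way, so the setup work of a server-side buffer buys nothing there.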

purplemarshmallow commented 8 years ago

> I can bump the calls up to modern OpenGL

Would be very nice to get rid of deprecated OpenGL. I have AMD, Nvidia and intel GPUs to test with; it would be interesting to see if it affects performance.

> I will prefer to have a compatibility fallback to people either with pre-2.0 video cards or running z64gl on Windows in GDI software mode for debugging video driver issues and glitches.

I see no need to support anything before OpenGL 2.0. PCs with these GPUs can't run it at full speed anyway, and I don't think it can ever work in GDI software mode because z64gl uses shaders.

ghost commented 8 years ago

> So that's why I'm hesitant to try and fix z64/z64gl. It requires too much work. For the triangle code, the scaling doesn't even make sense imo. It's not accurate enough.

Would be better to rewrite, while using some of z64's concepts, such as chunk rendering. Replacing all the edgewalker rendering, the combiners, and the texture/tile/framebuffer cache would be nice and would help with memory usage, especially if you use a hashtable library such as uthash. No doubt you could also pack tiles into one massive texture.
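
A minimal sketch of a hash-keyed tile cache, to make the hashtable suggestion concrete. It does not use uthash and is not z64gl code: it is a small fixed-size open-addressing table keyed by a 32-bit FNV-1a hash of the tile bytes, and `tile_cache`, `tile_cache_find` and `tile_cache_insert` are hypothetical names. Keying on the hash alone means two different tiles with the same hash would be confused, so a real cache would also compare tile parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 256u  /* must be a power of two */

struct tile_slot {
    uint32_t key;         /* hash of the tile's texel data */
    uint32_t texture_id;  /* handle of the uploaded texture */
    int used;
};

struct tile_cache {
    struct tile_slot slots[CACHE_SLOTS];
};

/* 32-bit FNV-1a hash over a byte buffer. */
uint32_t fnv1a(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Linear probing lookup; returns 1 and fills *texture_id on a hit. */
int tile_cache_find(const struct tile_cache *c, uint32_t key, uint32_t *texture_id)
{
    uint32_t probe;

    for (probe = 0; probe < CACHE_SLOTS; probe++) {
        const struct tile_slot *s = &c->slots[(key + probe) & (CACHE_SLOTS - 1u)];
        if (!s->used)
            return 0;     /* empty slot ends the probe chain: miss */
        if (s->key == key) {
            *texture_id = s->texture_id;
            return 1;
        }
    }
    return 0;
}

/* Insert or overwrite; returns 0 only when the table is full. */
int tile_cache_insert(struct tile_cache *c, uint32_t key, uint32_t texture_id)
{
    uint32_t probe;

    for (probe = 0; probe < CACHE_SLOTS; probe++) {
        struct tile_slot *s = &c->slots[(key + probe) & (CACHE_SLOTS - 1u)];
        if (!s->used || s->key == key) {
            s->key = key;
            s->texture_id = texture_id;
            s->used = 1;
            return 1;
        }
    }
    return 0;
}
```

The cache must start zero-initialized so that every slot reads as unused. On a miss the caller would upload the tile, then insert the new texture handle under the tile's hash.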

LegendOfDragoon commented 8 years ago

> Would be better to rewrite, while using some of z64's concepts, such as chunk rendering.

If I were to work on a video plugin, it would definitely be from scratch. However, before I am able to do that, I will have to experiment with different plugins. I'm very interested in figuring out why Rice's Video plugin is so fast. I also need to figure out a good way to profile API code, so that I can learn how to optimize it.

I actually like z64gl's framebuffer code. I get better performance in Mario Kart with z64gl than Glide64 while using fb emulation for the monitor. I'll have to see how he did that.

ghost commented 8 years ago

Wasn't framebuffer notification used for that?

LegendOfDragoon commented 8 years ago

> Wasn't framebuffer notification used for that?

Yes. The difference between z64gl and Glide64 is that z64gl uses glReadPixels to copy the video memory over to RAM. That's what makes it faster than Glide64's method.