wave-harmonic / crest

A class-leading water system implemented in Unity
MIT License

Optimizations #79

Closed gamemachine closed 5 years ago

gamemachine commented 6 years ago

So I'm tackling performance, and while the optimization readme has some good stuff, what I'm finding is a lot of little things that add up.

I'm going to advocate for a performance by default approach. Having to modify the code to get production performance should not be the default. Ease of use things like the ability to tune everything at runtime should be secondary. Real games have to take that approach; there is no other choice (other than having both).

The thing is, the little stuff adds up. I'm closing in on 2ms of main-thread CPU time recovered from over a dozen little things. It's all small individually, but in aggregate it adds up to something substantial.

The challenge is that right now Crest is unusable for many games that would use this type of water. Games simply don't have 8ms of budget for water, which, unless you disable features to the point where the water isn't even interesting anymore, is where it currently sits. That's half of your main-thread budget for your entire game. Realistically it should be around 2ms; I think that's entirely doable, but only if the little stuff matters.

There is a lot I like about Crest, and I knew going in that performance was an issue, and that most of the performance issues looked to have fairly straightforward fixes. My fear is that a lack of attention to performance will mean the fixes stop being straightforward as more and more features get piled on.

dizzy2003 commented 6 years ago

Are you profiling in the editor or in a standalone build? Are you using the Unity profiler, or toggling features and observing the resulting change?

Also, I'm a bit confused: you say it's taking nearly 2ms, then that it shouldn't take 8ms, then that it should take more like 2ms.

huwb commented 6 years ago

Yes, please post data and methodology. I'm a games developer myself who typically ends up working on perf when finaling projects, and I understand the domain.

Performance is a key concern for me and I've built things to be efficient where possible, but I have limited resources and limited access to hardware, so I'll need to rely on you guys to help identify perf problems. That starts with data, and we can discuss from there. That's the limit of service currently available in Crest, I'm afraid.

Most of Crest's features are on toggles, as dizzy notes, so there's a lot of scope for experimentation - as long as the methodology is good. The frame rate should be unlocked in standalone builds of the Crest Unity project, so I typically run standalone release builds on my laptop with Fraps to get a high-level max(CPU time, GPU time) number as a starting point.
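
For anyone reproducing this methodology: unlocking the frame rate in a build amounts to something like the snippet below. A minimal sketch, not Crest code - the component name is just illustrative.

```csharp
using UnityEngine;

// Minimal sketch: unlock the frame rate so standalone frame times reflect
// actual cost rather than vsync.
public class UnlockFrameRate : MonoBehaviour
{
    void Awake()
    {
        QualitySettings.vSyncCount = 0;   // disable vsync
        Application.targetFrameRate = -1; // remove the frame rate cap
    }
}
```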

Oh, and specs - a dxdiag for max points, or CPU and GPU details would work, so that we know the context of the numbers.

gamemachine commented 6 years ago

I meant I've shaved off nearly 2ms of CPU time. My machine isn't super new: GTX 660 graphics card, Core i7 3.4GHz, 16GB RAM.

All of what I'm seeing is via profiling, in the editor and in builds depending on what I'm checking.

What I'm mostly talking about is the low-hanging fruit: wave spectra updating every frame, property blocks being updated every frame, the GPU readback I filed a ticket on, the bounds updates, caching transforms where appropriate. It all adds up.
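
To illustrate the flavour of these fixes, here's a hypothetical sketch (not actual Crest code - the class, field, and shader property names are made up): cache component lookups once, and only push a property block when the tuned value actually changes, rather than unconditionally every frame.

```csharp
using UnityEngine;

// Hypothetical sketch of the "little stuff": cached lookups plus a dirty
// check so per-frame work only happens when an input changes.
public class WaveDataUpdater : MonoBehaviour
{
    public float choppiness = 1.5f;   // a runtime-tunable value

    Renderer _renderer;               // cached; GetComponent is not free
    MaterialPropertyBlock _mpb;
    float _lastChoppiness = float.NaN;

    void Awake()
    {
        _renderer = GetComponent<Renderer>();
        _mpb = new MaterialPropertyBlock();
    }

    void Update()
    {
        // Only rebuild and set the property block on change, not every frame.
        if (choppiness != _lastChoppiness)
        {
            _mpb.SetFloat("_Choppiness", choppiness);
            _renderer.SetPropertyBlock(_mpb);
            _lastChoppiness = choppiness;
        }
    }
}
```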

I'll put together some pull requests when I get a chance.

moosichu commented 6 years ago

"I'm going to advocate for a performance by default approach. Having to modify the code to get production performance should not be the default. Ease of use things like the ability to tune everything at runtime should be secondary."

I would personally disagree with that as a blanket statement: whilst optimal performance is important for shipping games, ease of changing things is much more important earlier in a project. Furthermore, there is a limit to how much Crest can be optimised, as we don't know the context in which it will be used or which features a given developer may want. Although I'm sure there are many optimisations that are 'win-win', most optimisations make a tradeoff, and the tradeoffs worth making are incredibly project-specific.

That said, I'm looking forward to seeing what your pull requests contain, along with the data you have collected. Sorry if I have misunderstood what you meant.

gamemachine commented 6 years ago

Ease of use is nice, but when the performant option is obvious and simple and you don't take it, that's where I part ways. The specific things I mentioned were all simple to fix but add up.

Right now Crest clocks in at roughly double the CPU usage of the next most expensive water system that runs on Unity. And you do have some idea of what types of games use water like this: it's fairly niche, and they tend to be fairly complex and generally multiplayer. Many will be outdoor settings using terrain. Add up everything you would expect to see in that environment: lighting, post-processing, character controllers, vegetation, atmospheric scattering. That's just the basics. Throw in Crest and you will most likely be right at 60fps.

huwb commented 6 years ago

Something is very wrong if you are seeing 8ms of CPU time attributable to Crest. We don't need to debate whether 8ms is OK, as it's clearly not. Please give details and data, and I'm sure we can quickly get to the bottom of the issue.

Also, please list any significant changes you've made to the default setup - chiefly the LOD count, base vert density, and which simulations you have turned on.

huwb commented 6 years ago

On my Dell XPS laptop I ran a standalone build at 320x200 resolution and got 360fps - so 2.78ms per frame - which includes all of the Unity overhead as well as the ocean. This is without any optimisations. It was the main scene from the examples, which includes the island and bits, and has some postproc enabled including SMAA.

Obviously it's not clear from such a crude test whether it is CPU or GPU bound, but it's definitely not 8ms of CPU for Crest alone on my laptop.

gamemachine commented 6 years ago

A non-dev build of the Crest demo with the debug UI disabled shows right at 5.5-5.9ms per frame. That's with my optimizations applied, which save right around 1.8ms. Linear color space/deferred setup, highest quality.

I'm running a quad-core i5-3570 3.4GHz CPU. Profiling a dev build doesn't show the CPU waiting on the GPU.

I can only assume it's hardware differences. But the hardware I'm working on represents a significant chunk of the players I'll be targeting; I really can't use the latest hardware as a baseline.

huwb commented 6 years ago

If I were to guess, your test was probably GPU bound - maybe the wait just isn't being reported for some reason. But it's not clear yet from what you've provided. Could you follow my methodology of running at low res to see what frame time you get?
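
One way to run that test from code, if launching at a fixed resolution is awkward (a sketch - Unity standalone players also accept `-screen-width` and `-screen-height` on the command line): force a tiny resolution so the GPU has almost nothing to do; if the frame time barely drops, the workload is CPU bound.

```csharp
using UnityEngine;

// Sketch: drop to a tiny resolution at startup. If frame time is
// unchanged, the bottleneck is the CPU, not the GPU.
public class LowResTest : MonoBehaviour
{
    void Start()
    {
        Screen.SetResolution(320, 200, false); // width, height, windowed
    }
}
```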

It could also be that the async readback is failing, for even more random reasons.

Could you grab a profile so we can see the breakdown? I don't remember the details, but I think there's a way to have the game dump it to a log. Alternatively, the in-editor profiler can connect to standalone development builds; getting an export of that breakdown would give us a picture of the perf we can work from.
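
From memory, dumping profiler frames from a development build looks something like the sketch below (the file name is arbitrary); the resulting .raw log can then be loaded back into the editor's profiler window.

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Sketch: write profiler frames to a binary log from a development build.
public class ProfilerDump : MonoBehaviour
{
    void OnEnable()
    {
        Profiler.logFile = "crest_profile"; // ".raw" is appended for the binary log
        Profiler.enableBinaryLog = true;
        Profiler.enabled = true;
    }

    void OnDisable()
    {
        Profiler.enabled = false;
        Profiler.logFile = "";
    }
}
```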

It occurs to me I should document this for future use; I'll do that afterwards.

huwb commented 6 years ago

Oh, there's a "readback coll data" option on the GUI; it might be worth toggling that off while running to check that the shape readback is not blocking. The async stuff is a new API and might not work on some GPUs.
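
A sketch of how that guard might look (the texture field and half-precision format are assumptions, not Crest's actual code):

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: skip the async readback path on GPUs/APIs that don't support it.
public class ShapeReadback : MonoBehaviour
{
    public RenderTexture shapeTexture; // hypothetical: one shape LOD texture

    void Update()
    {
        if (!SystemInfo.supportsAsyncGPUReadback)
            return; // fall back to no readback rather than blocking

        AsyncGPUReadback.Request(shapeTexture, 0, request =>
        {
            if (request.hasError) return;
            var data = request.GetData<ushort>(); // assumes half-precision texels
            // ...hand the displacement data off to CPU-side collision queries...
        });
    }
}
```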

huwb commented 6 years ago

My CPU is an i7-7700 at 2.8GHz. I know it's a fair bit newer, but it would be a shock to me if a desktop CPU were 3x slower than a laptop CPU clocked 20% slower. But I've seen stranger things, and there's little use in making assumptions.

gamemachine commented 6 years ago

I'll try it at low settings and see what that does.

If it's GPU bound, I would assume it's more bandwidth bound than work bound, given that I render far more stuff in game than Crest does by a hefty margin; it's just that most of it is instanced indirect or pulled from compute shaders. Crest is the only thing sending large amounts of data every frame.

gamemachine commented 6 years ago

One side note: for some reason, in builds of the Crest demo, and also in another scene I have that's just the water (which I've been using for testing), the water shows up all light grey; I can't really even make out the waves. In my game scene it's fine, so it must be some setting, but I didn't see anything obvious.

huwb commented 6 years ago

Grey ocean - sounds concerning. I've not heard this from elsewhere, and I just did a local build here and it seems fine. If you are still getting this, could you create a new issue for it, and include the log from the player (it gets saved somewhere near the build, I don't quite remember where), a screenshot, and any other relevant info such as which scenes you're running. It's important that it's in a new issue so that we don't end up with things tangled together, and also so that it's discoverable in the future in case anyone else hits something similar.

It doesn't send large amounts of data, but it does transfer back the shape textures, which for the example content is six 256x256 textures. This felt reasonable, but I don't have a great deal of experience with this kind of thing, and it might be costly on older cards. There's a toggle for reading back the shape textures on the debug GUI to test exactly this - oh, I already suggested this above. If you want to move this forward, please go through the above again, do some of the suggested tests, and provide the requested profile breakdown. Otherwise I'll close this, as it is a sink for my time, energy & focus.
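
For rough scale: if each shape texture is 256x256 at 8 bytes per texel (assuming half-precision RGBA - I'd have to check the actual format), that's 6 × 65,536 × 8 ≈ 3 MB transferred back per frame, or roughly 180 MB/s at 60fps, before any driver or synchronization overhead.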

huwb commented 5 years ago

Hey,

After writing the above I started to wonder whether the bandwidth is a significant overhead, especially on older hardware (it's a not-insignificant amount of data).

I recently did a big rewrite of how displacement textures are copied from the GPU to the CPU. One benefit of this is that one can now limit how many displacement textures are copied - see the above commit. It's not as trivial to configure the system as I would have hoped, but it's a start. By default it won't copy some of the very detailed textures. This will save on both the bandwidth and the CPU time to copy the resulting data, which you've mentioned in the past.
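
A hypothetical sketch of that kind of cap (the names here are illustrative, not Crest's actual API): read back only the coarser displacement LODs, and skip the most detailed, most bandwidth-hungry ones.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical sketch: limit readback to the first N (coarsest) LODs.
public class DisplacementReadback : MonoBehaviour
{
    public RenderTexture[] displacementLods; // illustrative: coarsest first
    public int maxLodsToReadBack = 4;        // e.g. skip the 2 finest of 6

    void LateUpdate()
    {
        int count = Mathf.Min(maxLodsToReadBack, displacementLods.Length);
        for (int i = 0; i < count; i++)
        {
            AsyncGPUReadback.Request(displacementLods[i], 0, OnReadback);
        }
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        if (request.hasError) return;
        // ...copy request.GetData<...>() into CPU-side displacement data...
    }
}
```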

The other major "win-win" CPU perf fix on my radar is the fact that the different LodData generally require cameras in order to render the data. As a result, the CPU render cost is higher than it needs to be, and some of this could go away if I switch everything over to using command buffers. Such a move will be required anyway when going to the scriptable render pipelines, I believe, so it's something I hope to begin chipping away at in the short to mid term, but my time is limited at the moment, so it might be a while before that stuff starts filtering through.
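
For a sense of what that switch could look like, here is a hedged sketch (the field names and identity matrices are placeholders; a real implementation would set the LOD's orthographic view/projection): drawing the data with a CommandBuffer avoids the per-camera culling and setup cost of Camera.Render.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: render LOD data into a target via a CommandBuffer instead of a
// dedicated Camera, cutting per-camera CPU overhead.
public class LodDataRenderer : MonoBehaviour
{
    public RenderTexture target;  // illustrative: one LOD data texture
    public Mesh quad;             // illustrative: geometry for the sim step
    public Material simMaterial;  // illustrative: the simulation shader

    CommandBuffer _cmd;

    void Start()
    {
        _cmd = new CommandBuffer { name = "LodData" };
    }

    void Update()
    {
        _cmd.Clear();
        _cmd.SetRenderTarget(target);
        _cmd.ClearRenderTarget(false, true, Color.clear);
        // Placeholder matrices; use the LOD's ortho view/projection here.
        _cmd.SetViewProjectionMatrices(Matrix4x4.identity, Matrix4x4.identity);
        _cmd.DrawMesh(quad, Matrix4x4.identity, simMaterial);
        Graphics.ExecuteCommandBuffer(_cmd);
    }
}
```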