**Describe the bug**
Dear Crest community, I ran a few performance tests today (main scene) on the major consoles and found that Crest performs quite badly on the PS4, even worse than on the Switch. I tested a few different versions, URP and HDRP, and the verdict stays the same (check the screenshot). I remember having issues implementing async GPU readback on PS4 in our last game: it was super slow and I had to split my requests over frames into multiple readbacks. This happened only on the PS4 and, according to the internal PS4 forum, it has something to do with the memory access management of the device. Unfortunately I'm not allowed to give more insight here due to NDA. I wonder if something similar could be the case here and whether there is any way to get better performance on PS4.

**Screenshots / video**
(screenshot with the per-console results attached)

**Additional notes**
You can get all the profiler logs here: crest-profiler-data.zip
Thanks for the detailed numbers.
Could you please provide some more details about the ocean setup?
These numbers are much larger than I would expect. It would be interesting to know how it performs in Release (PS4 only would be fine). We take Development numbers with a big grain of salt, but even with the salt this is a very high cost.
Just to check - it's not spewing errors to the log, is it? That would increase the cost a bunch.
Could you try disabling Evaluate Spectrum At Runtime on the ShapeGerstnerBatched component? We evaluate the spectrum each frame by default in case it changes, and this work is nontrivial (especially if there is more than one gerstner script).
A good next step would be to add more profiling tags to pin down where the cost is coming from. Adding them to each type of LodData/feature could be a good step. Having said that, OceanChunkRenderer.OnWillRenderObject() is fairly self-contained and really does not do very much, so it's slightly disturbing to see such a high cost attributed to it.
> Could you please provide some more details about the ocean setup?
The ocean setup is the default setup from Crest's main scene (also the default ocean material which comes with the URP / HDRP package from the asset store).
> These numbers are much larger than I would expect. It would be interesting to know how it performs in Release (PS4 only would be fine). We take Development numbers with a big grain of salt, but even with the salt this is a very high cost.
I built a PS4 package of the main scene and added Graphy (an FPS counter etc.) so I could compare performance, since Release mode only allows non-dev packages, where I can't use the profiler.
The results are similar to dev builds:

- Main scene (Unity 2019.4.7f1 / URP 7.4.3 / Crest 4.5): Dev mode 68 fps / Release mode 72 fps
- Main scene (Unity 2019.4.7f1 / HDRP 7.3.1 / Crest 4.5): Dev mode 38 fps / Release mode 41 fps
> Just to check - it's not spewing errors to the log, is it? That would increase the cost a bunch.
There are no errors in the logs.
> Could you try disabling Evaluate Spectrum At Runtime on the ShapeGerstnerBatched component? We evaluate the spectrum each frame by default in case it changes, and this work is nontrivial (especially if there is more than one gerstner script).
Disabled the option on the one existing ShapeGerstnerBatched script in the main scene, but it didn't impact performance.
> A good next step would be to add more profiling tags to pin down where the cost is coming from. Adding them to each type of LodData/feature could be a good step. Having said that, OceanChunkRenderer.OnWillRenderObject() is fairly self-contained and really does not do very much, so it's slightly disturbing to see such a high cost attributed to it.
Added a Profiler.BeginSample before every lod update called by OceanRenderer.LateUpdateLods. Not sure if this helps a lot.
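Roughly like this - illustrative only, the sample names are made up and the commented calls stand in for the real lod updates:

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Illustrative only - one sample per lod/feature update inside LateUpdateLods.
// The commented calls stand in for the real Crest update methods.
public class LodProfilingTags : MonoBehaviour
{
    void LateUpdate()
    {
        Profiler.BeginSample("Crest.LodDataAnimWaves");
        // _lodDataAnimWaves.UpdateLodData();
        Profiler.EndSample();

        Profiler.BeginSample("Crest.LodDataFoam");
        // _lodDataFoam.UpdateLodData();
        Profiler.EndSample();
    }
}
```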
Let me know if there is anything else!
I just deep profiled the main scene on the PS4 and maybe we are getting some more clues about the workload:
I uploaded the profiler data here if you want to have a look yourself: link
Thanks for the above! That helps a bunch.
Regarding dev vs release - the build may be GPU-bound, meaning that the thing that determines the framerate is the rendering, not the CPU update. That would mask any change in CPU performance. I think we should probably add profile markers which hopefully work in release (if we find they don't work in release, we could roll it ourselves I guess - add our own timer and log the average cost every 5s or so).
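A minimal sketch of that roll-our-own timer (class name and placement hypothetical):

```csharp
using System.Diagnostics;
using UnityEngine;

// Rough sketch: time a block of work each frame and log the average cost
// every 5 seconds. Works in release builds where the profiler is unavailable.
public class ReleaseCostLogger : MonoBehaviour
{
    readonly Stopwatch _sw = new Stopwatch();
    double _accumMs;
    int _frames;
    float _nextLogTime = 5f;

    void LateUpdate()
    {
        _sw.Restart();
        // ... the work to measure goes here, e.g. the ocean update ...
        _sw.Stop();

        _accumMs += _sw.Elapsed.TotalMilliseconds;
        _frames++;

        if (Time.realtimeSinceStartup >= _nextLogTime)
        {
            UnityEngine.Debug.Log($"Avg cost: {_accumMs / _frames:F3} ms over {_frames} frames");
            _accumMs = 0.0;
            _frames = 0;
            _nextLogTime = Time.realtimeSinceStartup + 5f;
        }
    }
}
```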
The two things you highlighted in the deep profile images seem like good targets for optimisation. Fascinating that BindData() is so prominent. We can likely simplify this - it does a bunch of repeated work (it sets vector and texture parameters on the shader, and ends up reassigning the same things over and over again). We could break it up and make specialised functions for setting each thing, which should make the code simpler as well.
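Not actual Crest code, but the general shape could be something like this sketch (the shader property names are made up):

```csharp
using UnityEngine;

// Sketch only: cache property IDs once, and skip redundant reassignment
// instead of rebinding everything every frame in one big BindData().
public class LodDataBindings
{
    // Hypothetical property names for illustration.
    static readonly int s_CenterPos = Shader.PropertyToID("_OceanCenterPosWorld");
    static readonly int s_LodTexture = Shader.PropertyToID("_LD_TexArray");

    Vector4 _lastCenter;

    public void SetCenter(Material mat, Vector4 center)
    {
        if (center == _lastCenter) return; // unchanged - skip the driver call
        mat.SetVector(s_CenterPos, center);
        _lastCenter = center;
    }

    public void SetLodTexture(Material mat, Texture tex)
    {
        mat.SetTexture(s_LodTexture, tex);
    }
}
```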
The PropertyWrapper* classes may benefit from being forced inline as well.
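For example, a sketch assuming IL2CPP honours the inlining hint on the target platform:

```csharp
using System.Runtime.CompilerServices;
using UnityEngine;

// Sketch: hint the compiler/IL2CPP to inline a small hot wrapper method.
public static class PropertyWrapperInlineExample
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void SetVector(Material mat, int nameID, Vector4 v) => mat.SetVector(nameID, v);
}
```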
Thanks again, I'll follow up with news.
So this exposed that there's a bunch of work that does not need to be done in OceanChunkRenderer, and we could remove it completely.
PR to remove it just merged in: https://github.com/crest-ocean/crest/pull/639
Would it be possible to get a test from the latest on the master branch here? Development build is fine. Thanks!
> So this exposed that there's a bunch of work that does not need to be done in OceanChunkRenderer, and we could remove it completely.
> PR to remove it just merged in: #639
> Would it be possible to get a test from the latest on the master branch here? Development build is fine. Thanks!
Sorry, I was on holidays until yesterday. Will try to get test results as soon as possible!
Alright, I did tests with the built-in version from the master branch (commit 914f151) and one from the most recent commit (aa210a4) on all the consoles:
The changes made to the OceanChunkRenderer are barely noticeable. But I just figured out that the built-in version of Crest runs way faster than the URP version in general. Could this issue be more related to the scriptable render pipeline (SRP) on consoles?
Amazing - from ~4ms down to 0.25ms? Or am I misunderstanding? (Edit: oh, you're saying the overall picture looks very different?)
Btw this change is built-in only for now, but I think you said you tested built-in?
We construct the command buffer on the fly in OceanRenderer.LateUpdate. I'm fairly sure that large portions of the command buffer do not need to be rebuilt each frame. That would be the next CPU optimisation target, I reckon.
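A sketch of the idea, not actual Crest code:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch only: re-record the command buffer when its inputs change,
// instead of rebuilding it every frame.
public class CachedOceanCommandBuffer : MonoBehaviour
{
    CommandBuffer _buf;
    bool _dirty = true;

    void LateUpdate()
    {
        if (_buf == null) _buf = new CommandBuffer { name = "Ocean update (cached)" };

        if (_dirty)
        {
            _buf.Clear();
            // ... record the portions that don't change frame to frame here ...
            _dirty = false;
        }

        Graphics.ExecuteCommandBuffer(_buf);
    }

    // Call this when LOD setup, materials or targets change.
    public void MarkDirty() => _dirty = true;
}
```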
> Amazing - from ~4ms down to 0.25ms? Or am I misunderstanding? (Edit: oh, you're saying the overall picture looks very different?)
> Btw this change is built-in only for now, but I think you said you tested built-in?
I could only test Crest built-in on the consoles, since I guess these changes are not yet present in the URP / HDRP versions. I did a test before you merged the changes and one afterwards. Unfortunately the difference is only 0.03ms, but as you mentioned, the overall picture looks very different compared to the URP & HDRP Crest versions.
For example, the PS4 built-in Crest version takes 2.12ms to render the functions I measured in the main scene, while the URP & HDRP versions are both above 9ms. I wonder why there is such a huge difference? Also, if I compare built-in and URP/HDRP Crest on the XB1 or Switch, built-in seems to be quite a bit faster (check the results in the initial post).
Is one of those commits from before the optimisation? Apologies, I'm probably being slow but I'm quite confused.
EDIT: Right, I've seen your comment that this is the before and after. That's quite hard to believe. But yes, you're right, it's BIRP only, and it would be fascinating if there is a massive performance differential between BIRP and SRPs. These optimisations will go live in 4.6.
> EDIT: Right, I've seen your comment that this is the before and after. That's quite hard to believe. But yes, you're right, it's BIRP only, and it would be fascinating if there is a massive performance differential between BIRP and SRPs. These optimisations will go live in 4.6.
I will definitely test it as soon as 4.6 is out for the two SRPs and post the results of the main scene here.
@woistjadefox @OkoGoran 4.6 is out with these improvements for URP. When you have time, would love to hear if this solves performance issues.
@daleeidd argh, sorry, I just figured out that I answered here with two different accounts :)
Tried to make a build but I get these old errors: editor_error_log.txt
I think this happened in a version before already. Should I just switch all half3 and half2 to float3 / float2 in the underwater shader for testing purposes?
> I think this happened in a version before already. Should I just switch all half3 and half2 to float3 / float2 in the underwater shader for testing purposes?
Yes, please do. Apologies for that. We should be more careful with those mismatching types. Are they the only ones?
Sorry, I still couldn't find time to do the benchmarks. Hopefully next week!
As linked above, we made a new gerstner wave system which saves a bunch of time.
With this, both of the paths you highlighted in your deep profile screenshots have either been removed completely, or can be avoided by running the new gerstner stuff.
I'll leave this open in case you are able to retest in the short term, if not then I think we'll close until more data comes in.
Hi there! Sorry it took me so long to reply to this thread / issue. Meanwhile I can tell that I forgot to enable a certain option in the PS4-specific player settings. Due to NDA I'm not allowed to talk about the concrete option, but it has something to do with IL2CPP optimizations, in case somebody comes across the same issue.
I did tests with URP & HDRP 4.8 on the PS4 and it runs fine and fast. I will therefore close this thread.
Ha, that explains those numbers! Well, it led to some great optimisations, so it was a productive issue in any case :)
If you happen to have numbers lying around from latest tests, which you can share, feel free to. But not required.
@huwb here are numbers from a dev build of the HDRP Crest version on a base PS4:
Thanks! Still on the high side, hopefully it's less in release.
Are you using ShapeGerstner instead of ShapeGerstnerBatched? Definitely worth testing the former, it should be a drop-in replacement and much lighter on CPU.
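For a quick A/B test, something like this untested sketch might do - the new component would still need its waves/spectrum set up, as the old one had:

```csharp
using UnityEngine;

// Untested sketch: disable the batched component and add the new
// ShapeGerstner in its place for a quick A/B test.
public class GerstnerSwapTest : MonoBehaviour
{
    void Start()
    {
        var batched = FindObjectOfType<Crest.ShapeGerstnerBatched>();
        if (batched != null)
        {
            batched.enabled = false;
            batched.gameObject.AddComponent<Crest.ShapeGerstner>();
        }
    }
}
```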
It's the main scene from the example folder, so as far as I know it is using ShapeGerstnerBatched.