wave-harmonic / crest

A class-leading water system implemented in Unity
MIT License
3.49k stars 478 forks source link

Getting the height of water at X,Z without a massive fps drop. #127

Closed ImtehQ closed 5 years ago

ImtehQ commented 5 years ago

I got a script running to get the height of the water like the little boats do in one of the demos, however once you go over lets say 10 calls a frame (that's 1 points) it drops the fps a lot because it takes forever to get the data back from the ocean.

Here is the code im using for it: ` position = new Vector3(x, 0, z);

        undispPos = Vector3.zero;
        if (!colProvider.ComputeUndisplacedPosition(ref position, ref undispPos)) return -999;
        if (!colProvider.SampleDisplacement(ref undispPos, ref _displacementToBoat)) return -999;
        if (!_displacementToBoatInitd)
        {
            _displacementToBoatLastFrame = _displacementToBoat;
            _displacementToBoatInitd = true;
        }

        var dispPos = undispPos + _displacementToBoat;
        return dispPos.y;`

Image of debugger

Is there a better way of doing this?

holdingjason commented 5 years ago

This might be related to a post I just made over here. I am noticing an issue with GC for these calls which I think is causing some (most/all/not sure) of the issues. Can you open up the hierarchy on those and see where the biggest chunk of time is being spent. You can see on mine that it was GC alloc.

https://github.com/huwb/crest-oceanrender/pull/117#issuecomment-440372739

ImtehQ commented 5 years ago

There no GC Alloc to speak of with 1 point, however there are tons of calculations being made.

screenshot

ImtehQ commented 5 years ago

I looked into the code, searching some a way to get height by area and there is code there but it just calls the old ones, so does this mean its just indexing the entire ocean with each getwaterHeight ?

I need to do about 100 calls a frame by increasing the timestamp it no longer calling 10 times on each point, so that reduces the fps drop a bit, but still need to call it 100 times a frame, as of right now it not even possible to do 10 without the fps dropping down to 35 👎

holdingjason commented 5 years ago

Ok that is weird. I am doing a bunch of these calls as well inside of a floater script via fixedupdate and not seeing a big hit.

here is a shot of my frame and shows 24 hits to fixed which is doing the following code. I know I have about 20 of these being triggered right now. So not sure why you are seeing such a wallop.

image

private void FixedUpdate() { if (_active) { if (GPUReadbackDisps.Instance) { GPUReadbackDisps.Instance.ProcessRequests(); }

        if (UseNonPhysics)
        {
            UseNonPhysicsToFloat();
        }
        else
        {
            UsePhysicsToFloat();
        }
    }
}
void UsePhysicsToFloat()
{
    if (_rigidBody != null)
    {
        //float volume = _rigidBody.mass / Density;
        float archimedesForceMagnitude = WATER_DENSITY * Mathf.Abs(Physics.gravity.y);

        _avgDisplacement = Vector3.zero;

        for (int i = 0; i < ForcePoints.Length; i++)
        {
            FloaterForcePointsV3 point = ForcePoints[i];

            Vector3 transformedPoint = transform.TransformPoint(point.OffSetPosition + new Vector3(0, CenterOfMass.y, 0));

            Vector3 undispPos;
            if (!_colProvider.ComputeUndisplacedPosition(ref transformedPoint, out undispPos, MinSpatialLength))
            {
                // If we couldn't get wave shape, assume flat water at sea level
                undispPos = transformedPoint;
                undispPos.y = OceanRenderer.Instance.SeaLevel;

                Debug.Log("Not Found");
            }

            var waterSurfaceVel = Vector3.zero;

            bool dispValid, velValid;
            _colProvider.SampleDisplacementVel(ref undispPos, out point.Displaced, out dispValid, out waterSurfaceVel, out velValid, MinSpatialLength);

            var dispPos = undispPos + point.Displaced;

            var velocityRelativeToWater = _rigidBody.velocity - waterSurfaceVel;

            float height;

            _colProvider.SampleHeight(ref transformedPoint, out height, MinSpatialLength);

            float distance = dispPos.y - transformedPoint.y;

            Vector3 normal;

            if (!_colProvider.SampleNormal(ref undispPos, out normal, MinSpatialLength))
            {
                normal = Vector3.up;
            }

            var result = _colProvider.CheckAvailability(ref transformedPoint, MinSpatialLength);

            if (result != AvailabilityResult.DataAvailable)
            {
                Debug.LogWarning("Validation failed: " + result.ToString() + ". See comments on the AvailabilityResult enum.", this);
            }

            //distance = height - transformedPoint.y;

            if (distance > 0)
            {
                if (result == AvailabilityResult.DataAvailable || result == AvailabilityResult.NoDataAtThisPosition)
                {
                    _rigidBody.AddForceAtPosition(archimedesForceMagnitude * distance * ( Vector3.up + normal) * point.Factor * ForceMultiplier, transformedPoint);
                    //_rigidBody.AddForceAtPosition(archimedesForceMagnitude * distance * (Vector3.up) * point.Factor * ForceMultiplier, transformedPoint);
                }
            }

            _rigidBody.AddForceAtPosition(Physics.gravity * GravityMultiplier, transformedPoint, ForceMode.Acceleration);

            _avgDisplacement += point.Displaced;
        }

        _avgDisplacement = _avgDisplacement / ForcePoints.Length;      
    }
}
ImtehQ commented 5 years ago

if (GPUReadbackDisps.Instance) { GPUReadbackDisps.Instance.ProcessRequests(); }

what is this, if (GPUReadbackDisps.Instance) { GPUReadbackDisps.Instance.ProcessRequests(); }

??

holdingjason commented 5 years ago

You can see it in boatalignnormal script in boats example. Here is the comment.

// Trigger processing of displacement textures that have come back this frame. This will be processed // anyway in Update(), but FixedUpdate() is earlier so make sure it's up to date now.

holdingjason commented 5 years ago

Would not think it would have much to do with the time issue your seeing.

holdingjason commented 5 years ago

Ok and I can dump all the rest of that stuff and just go with a check to sampleheight my times drop below 1ms for 24 checks per frame. Not sure if perhaps caching the colProvider is an issue on your side but something to try if your not but doubt it.

void FixedUpdate() { if (GPUReadbackDisps.Instance) { GPUReadbackDisps.Instance.ProcessRequests(); } _colProvider.SampleHeight(ref transformedPoint, out height, MinSpatialLength); }

holdingjason commented 5 years ago

Here is a shot with 16 bouys each with three checks on them at each corner. Profile shows 54 checks per frame at just under 2ms. I would like that to be cheaper and probably can using some math optimization tricks.

image

ImtehQ commented 5 years ago

bit odd, as _colProvider.SampleHeight(ref transformedPoint, out height, MinSpatialLength); dont work correctly, first of heigh must be a ref, not a out, and no 3th parameter.

But SampleHeight seems to be a bit faster so ill be using that, thanks

huwb commented 5 years ago

Thanks for jumping in with replies/suggestions.

The call to GPUReadbackDisps.Instance.ProcessRequests() checks for any new arriving data coming back from the GPU. If there is newly arrived data then it takes a copy of it. It should only do this once per frame. If you dont call this in FixedUpdate, it will anyway get called later in Update(), but this means you wont be getting latest data in FixedUpdate, and i believe this means an extra frame of latency. I recommend calling it.

So, why is Crest doing all that computation you showed in the capture above?

Because the ocean includes arbitrary horizontal displacements, when you want a height it needs to find the point that is displaced to give the height. I talk about this more here: http://www.huwbowles.com/fpi-gdc-2016/

As noted in that presentation, it finds this point by doing an iterative search to find that source point. While doing this search it samples displacement at a few locations.

For collision readback from the GPU, whenever you sample displacement it looks for a LOD that encapsulates the sample position - thats the various calls to Rect functions.

This all ends up being relatively expensive and its not great. There is a cache which should be on by default (unless you have provided an asset of type SimSettingsAnimatedWaves that disables it), but the cache will only help if you sample the same(ish) locations multiple times.

If you are taking many samples that are nearby (i.e. a bunch of samples under a boat), there is a way to do better - find the position that displaces to the center of your boat using ComputeUndisplacedPosition , warm up the LOD selection by calling PrewarmForSamplingArea for the area under the boat, and then call SampleDisplacementInArea for all your points. This should jump over most of the craziness described above.

If you're sampling a bunch of separate locations, the best suggestion i can think of is trying to do the queries in parallel using the job system etc like they do in https://forum.unity.com/threads/lwrp-boat-racing-demo-from-gdc.527104/ . That could be something for crest one day, if/when i have time..

huwb commented 5 years ago

Oh the SimSettingsAnimatedWaves also allows the collision source to be changed - you can make it compute the Gerstner waves directly on the CPU instead of using the data read back from the GPU. I believe this would be slower but i havent tried it in a while, so it might be worth a shot.

holdingjason commented 5 years ago

My issue is that it's weird because I am not seeing the same issues with speed at least I don't think I am based in my last post. What version are you using I wonder? I am on the latest main

ImtehQ commented 5 years ago

im on version 4.0.0

holdingjason commented 5 years ago

Just read your previous post on the interface for sample height. That does not seem correct I think that was an old way of doing this so maybe something got messed up on your install. @huwb does that sound correct? Maybe try deleting both directories in your project and updating again from scratch. Sample height should work the way I posted it I think it used to be a ref but now out with the third param

huwb commented 5 years ago

I'm not sure, I reread the thread but it's not really clear to me. Perhaps we could get an exec summary:

How many height queries How much it is costing (in ms). Profiler output showing the problem would be ideal Hardware being used How many lods Are you sampling lots of different locations or all near one location

ImtehQ commented 5 years ago

I'm not sure, I reread the thread but it's not really clear to me. Perhaps we could get an exec summary:

How many height queries How much it is costing (in ms). Profiler output showing the problem would be ideal Hardware being used How many lods Are you sampling lots of different locations or all near one location

Well it dependence a bit, but right now im doing 320 calls, with a Time ms: 56.30m and self ms: 0.06ms. But no GC Alloc.

im using a i7 3770 @ 3.9 Ghz, 24Gb of ram, and a gtx 1080Ti Im using the default boat scene included in the asset.

Also tested it both on the stable 4.0.0 version and latest version, both have issues getting the water level.

The thing is that i for example need to get the water level of many close by points, for example getting the water level of every triangle of a sphere mesh. In that example it will be about 740 calls.

huwb commented 5 years ago

I'm looking at making a more efficient path for queries in the branch refactor/sampling-data , see commit 38e9779bb4bac13dda02b86f14844bafda2c339c

I create a SamplingData at the beginning of the update for both BoatAlignNormal and BoatProbes and this generates the necessary data to quickly(ish) sample the collision.

On my laptop in editor (so not optimized) i opened boat.unity and created 10 additional duplicates of BoatProbes. Each boat does 12 probes, so in total it is 11 boats X 12 probes == 132 queries, and the time in the profile window shows 1.4ms for BoatProbes.FixedUpdate . (my laptop is a Dell XPS15 from last year)

ImtehQ commented 5 years ago

I'm looking at making a more efficient path for queries in the branch refactor/sampling-data , see commit 38e9779

I create a SamplingData at the beginning of the update for both BoatAlignNormal and BoatProbes and this generates the necessary data to quickly(ish) sample the collision.

On my laptop in editor (so not optimized) i opened boat.unity and created 10 additional duplicates of BoatProbes. Each boat does 12 probes, so in total it is 11 boats X 12 probes == 132 queries, and the time in the profile window shows 1.4ms for BoatProbes.FixedUpdate . (my laptop is a Dell XPS15 from last year)

So I got the last version and copied the code from the BoatProbes, all settings are the same except in my case im sending a X & Z value, then create a compare point, because my loop to go trough all the points is located elsewhere, but the idea is still the same.

` public float GetWaterLevel(float x, float z) {

if Crest

        if (GPUReadbackDisps.Instance)
        {
            GPUReadbackDisps.Instance.ProcessRequests();
        }

        collProvider = OceanRenderer.Instance.CollisionProvider;
        thisRect = new Rect(transform.position.x - 10f, transform.position.z - 10f, 20f, 20f);
        if (collProvider.GetSamplingData(ref thisRect, 12, _samplingData))
        {
            position = new Vector3(x, 0, z);

            if (!collProvider.ComputeUndisplacedPosition(ref position, _samplingData, out undispPos))
            {
                // If we couldn't get wave shape, assume flat water at sea level
                undispPos = position;
                undispPos.y = OceanRenderer.Instance.SeaLevel;
            }

            collProvider.SampleDisplacement(ref undispPos, _samplingData, out displaced);

            dispPos = undispPos + displaced;
            return (dispPos.y - position.y);
        }

        collProvider.ReturnSamplingData(_samplingData);

endif

    }` 

The results of 135 points being calculated is still 20.19ms in Deep profile. But in not deep profile, its doing around 1.5ms, so that's great!

Any more tips on how to make it even better with the code above?

huwb commented 5 years ago

Case closed then :). I guess i should have noticed it was deep profiling. That adds loads of overhead.

The code you posted above looks fine to me. The 10f, 20f etc are all hacked and should be the actual size of the area you're sampling within. This is next on my list to fix on that branch.