o3de / sig-graphics-audio

Documents and communications for the O3DE Graphics-Audio Special Interest Group

Proposed RFC Feature GTAO (Ground-Truth based Ambient Occlusion) #124

Closed yangfei103 closed 1 year ago

yangfei103 commented 1 year ago

Proposed RFC Feature GTAO (Ground-Truth based Ambient Occlusion)

Summary:

Ambient occlusion (AO) is an important feature in photo-realistic rendering. However, O3DE currently provides only SSAO (Screen Space Ambient Occlusion), which cannot meet all of our needs. To fill this gap, we developed a GTAO (Ground-Truth based Ambient Occlusion) feature for O3DE. The GTAO algorithm, first proposed by Activision Blizzard at SIGGRAPH 2016 (see this paper and these slides for details about the algorithm), can be seen as an enhanced version of SSAO. In short, GTAO can achieve better quality than SSAO at a comparable performance cost. Our implementation does not cover everything in Activision's version; for example, it does not yet support colored occlusion or temporal denoising. Despite that, it works well in our tests. We are opening this RFC with the intent of contributing our GTAO implementation to the O3DE community.

What is the relevance of this feature?

In photo-realistic/physically-based rendering, a mesh point is shaded by calculating direct and indirect illumination with diffuse and/or glossy BRDFs according to the rendering equation. The diffuse indirect illumination discussed here is usually known as ambient light. Ambient occlusion can be interpreted as the visibility of the ambient light, and it is an essential feature for improving the realism of rendered images.

In real-time rendering, accurate ambient occlusion is impractical to calculate. SSAO, for instance, is a cheap but coarse approximation of accurate ambient occlusion, so it cannot always meet users' needs. To enhance O3DE's AO capabilities, more advanced features are necessary.

GTAO is one candidate. Although it is also an approximation, its formulation is designed to be closer to the Monte Carlo ground truth. Compared to SSAO, GTAO can achieve better quality at a comparable performance cost. With GTAO integrated into O3DE, users can choose between SSAO and GTAO according to their needs.

Feature design description:

Architecture

GTAO is integrated into the Atom Gem and the AtomLyIntegration Gem, following the current post-process pipeline. Building on the SSAO component that O3DE already has, we developed a new AO component that contains both SSAO and GTAO. The architecture of the new AO component is as follows: AO-architecture

Component Panel

The user can select an AO type in the Ambient Occlusion panel to enable either SSAO or GTAO.

SSAO: image-20230227143410906

GTAO: image-20230227143347518

Parameters Description

| Parameter | Description |
|---|---|
| AO type | Selects SSAO or GTAO from a drop-down box. |
| GTAO Strength | Controls the amount of ambient occlusion. Ranges from 0 to 1; the larger the value, the stronger the occlusion. |
| Quality | Quality level of GTAO, with 5 levels to choose from. It determines the number of sample directions in the GTAO algorithm: the more samples, the better the quality. |
| Radius | Sample sphere radius; the larger the radius, the more pronounced the occlusion effect. |
| Thickness | Thickness parameter for object edges; the larger the value, the less occlusion on thin object areas. |
| MaxDepth | Maximum depth for computing AO. AO is not computed for objects whose depth exceeds this value, which prevents distortion at far distances. |
| Enable Blur | Enables a blur to denoise the output AO map. |
| Blur Strength | Blur parameter |
| Blur Edge Threshold | Blur parameter |
| Blur Sharpness | Blur parameter |

Usage

The AOParentPass controls its child passes: according to the AO Type in AOSettings, it enables the corresponding pass and disables the others. For instance, suppose the user adds an Ambient Occlusion component to the scene and then selects GTAO from the AO Type drop-down box. This attribute is first saved in the AOComponentConfig and then passed to AOSettings by the AOComponentController. At runtime, the AOParentPass gets the AOSettings from the PostProcessingFeatureProcessor. The GTAOPass is enabled because the AO Type matches GTAO, and the SSAOPass is disabled automatically.
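The selection logic described above can be sketched as follows. This is a minimal Python illustration, not the actual C++ implementation: the classes `AoType`, `ChildPass` and `AoParentPass` and the method `apply_settings` are hypothetical stand-ins for the real pass and settings types.

```python
from dataclasses import dataclass, field
from enum import Enum


class AoType(Enum):
    """Hypothetical stand-in for the AO Type stored in AOSettings."""
    SSAO = "SSAO"
    GTAO = "GTAO"


@dataclass
class ChildPass:
    """Hypothetical child pass that is tagged with the AO type it implements."""
    name: str
    ao_type: AoType
    enabled: bool = False


@dataclass
class AoParentPass:
    """Hypothetical parent pass that owns one child per AO technique."""
    children: list = field(default_factory=list)

    def apply_settings(self, ao_type: AoType) -> None:
        # Enable the child whose type matches the setting and disable the rest.
        for child in self.children:
            child.enabled = child.ao_type == ao_type


parent = AoParentPass([ChildPass("SSAOPass", AoType.SSAO), ChildPass("GTAOPass", AoType.GTAO)])
parent.apply_settings(AoType.GTAO)
enabled = [child.name for child in parent.children if child.enabled]
```

With the AO type set to GTAO, only the GTAO child ends up enabled; switching the setting back to SSAO flips the two flags.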

Technical design description:

Recap: HBAO & GTAO

Baseline

Before giving implementation details, we briefly introduce the GTAO algorithm. GTAO was first proposed by Activision Blizzard at SIGGRAPH 2016 and is closely related to HBAO (Horizon-based Ambient Occlusion). HBAO assumes a height field around the shading point, in which case the visibility is continuous on the hemisphere, which simplifies the computation. It then integrates over the part of the hemisphere between two horizon lines, which are determined by tracing the height field (depth buffer) in screen space. However, its integration equation does not correctly match $k_A$, which is derived from the rendering equation and guaranteed to be physically correct. Thus, in theory, HBAO cannot promise the same results as Monte Carlo ray tracing. To implement ground-truth based ambient occlusion, GTAO adds a cosine weight term to the integration equation: $$V_d=\frac{1}{\pi}\int_\Omega V(\omega_i)(n\cdot\omega_i)\,\mathrm{d}\omega_i=\frac{1}{\pi}\int_0^\pi\int_{-\pi/2}^{\pi/2}V(\theta,\phi)(n\cdot\omega_i)|\sin(\theta)|\,\mathrm{d}\theta\,\mathrm{d}\phi$$ where the inner integral $$\int_{-\pi/2}^{\pi/2}V(\theta,\phi)(n\cdot\omega_i)|\sin(\theta)|\,\mathrm{d}\theta=\mathrm{IntegrateArc}(h_1,h_2,n)$$ can be solved analytically.

image-20230227192552869

Horizon lines $h_1$ and $h_2$ can be found by searching the height field (depth buffer) in screen space.

image-20230227194134493

As in HBAO, the inner integral is solved analytically, and the outer one is solved numerically by sampling a number of directions around the shading pixel in screen space.
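The split between the analytic inner integral and the sampled outer integral can be checked on the CPU. The Python sketch below is a reference illustration, not the shader code. It assumes the visibility is 1 between the horizon angles and 0 outside, that the horizon angles are already clamped to the normal's hemisphere, and that $\gamma$ is the angle between the projected normal and the view direction within a slice, so that $n\cdot\omega_i=\cos(\theta-\gamma)$. The weighting by the projected normal length that the shader applies is left out. The function names are hypothetical.

```python
import math


def integrate_arc(h1: float, h2: float, gamma: float) -> float:
    """Closed form of the inner integral for horizon angles h1 <= 0 <= h2."""
    def half(h: float) -> float:
        return 0.25 * (-math.cos(2.0 * h - gamma) + math.cos(gamma) + 2.0 * h * math.sin(gamma))
    return half(h1) + half(h2)


def integrate_arc_numeric(h1: float, h2: float, gamma: float, steps: int = 20000) -> float:
    """Midpoint-rule evaluation of the same integral, used only as a cross-check."""
    dt = (h2 - h1) / steps
    total = 0.0
    for k in range(steps):
        theta = h1 + (k + 0.5) * dt
        total += math.cos(theta - gamma) * abs(math.sin(theta)) * dt
    return total


def visibility(slices) -> float:
    """Outer integral: average the inner integral over evenly spaced slice directions.

    Each slice is a tuple (h1, h2, gamma).
    """
    return sum(integrate_arc(h1, h2, gamma) for h1, h2, gamma in slices) / len(slices)


# An unoccluded point whose normal faces the viewer is fully visible.
open_slice = (-math.pi / 2, math.pi / 2, 0.0)
# A vertical wall on one side blocks half of the cosine-weighted arc.
wall_slice = (0.0, math.pi / 2, 0.0)
print(visibility([open_slice, wall_slice]))
```

Averaging the slices is the discrete form of the $\frac{1}{\pi}\int_0^\pi\ldots\,\mathrm{d}\phi$ term, so more slice directions reduce the noise of the estimate at a higher cost.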

Multi-bounce approximation

To approximate multi-bounce reflection, GTAO models the multi-bounce visibility $V_d^\prime$ as a function of the single-bounce visibility $V_d$ and the (neighboring) albedo $\rho$: $$V_d^\prime=f(V_d,\rho)$$ This assumes that the neighboring albedo can be approximated by the albedo of the point being shaded. The function $f$ is then fitted to Monte Carlo ray-traced results under various albedos with a cubic polynomial: $$V_d^\prime=f(V_d)=((aV_d+b)V_d+c)V_d$$ where $a$, $b$, and $c$ are linear functions of $\rho$.
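A minimal Python sketch of this fit is shown below. The coefficients are the ones used by the `MultiBounce` function in the shader snippets of this RFC. For simplicity the sketch takes a scalar albedo, whereas the shader works per color channel; the function name `multi_bounce` is hypothetical.

```python
def multi_bounce(ao: float, albedo: float) -> float:
    """Cubic multi-bounce fit; never darker than the single-bounce value."""
    a = 2.0404 * albedo - 0.3324
    b = -4.7951 * albedo + 0.6417
    c = 2.7552 * albedo + 0.6903
    return max(ao, ((ao * a + b) * ao + c) * ao)


# Brighter surfaces bounce more light back, so the same single-bounce
# visibility is lifted more for a high albedo than for a low one.
for rho in (0.1, 0.5, 0.9):
    print(rho, multi_bounce(0.5, rho))
```

The `max` keeps the result from falling below the single-bounce visibility, which matters near `ao = 1` where the polynomial evaluates to slightly less than 1.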

Implementation details

We implement GTAO in O3DE following the architecture described above. The Atom and AtomLyIntegration Gems are involved. The code tree, including both SSAO and GTAO, is as follows:

Code tree

// Atom Gem
Atom
|--Feature
    |--Common
        |--Assets
            |--Passes
                |--AOParent.pass // newly added
                |--GTAOParent.pass // newly added
                |--GTAOCompute.pass // newly added
                |--SsaoParent.pass
                |--SsaoCompute.pass
            |--Shaders
                |--PostProcessing
                    |--SsaoCompute.shader
                    |--SsaoCompute.azsl
                    |--GTAOCompute.shader // newly added
                    |--GTAOCompute.azsl // newly added
        |--Code
            |--Include
                |--Atom
                    |--Feature
                        |--PostProcess
                            |--AmbientOcclusion // renamed
                                |--SsaoConstants.h
                                |--SsaoParams.inl
                                |--GTAOConstants.h // newly added
                                |--GTAOParams.inl // newly added
                                |--AOSettingsInterface.h // modified
            |--Source
                |--PostProcess
                    |--AmbientOcclusion // renamed
                        |--AOSettings.h // modified
                        |--AOSettings.cpp // modified
                |--PostProcessing
                    |--SsaoPasses.h
                    |--SsaoPasses.cpp
                    |--GTAOPasses.h // newly added
                    |--GTAOPasses.cpp // newly added
// AtomLyIntegration Gem
AtomLyIntegration
    |--CommonFeatures
        |--Code
            |--Include
                |--AtomLyIntegration
                    |--CommonFeatures
                        |--PostProcess
                            |--AmbientOcclusion // renamed
                                |--AOBus.h // modified
                                |--AOComponentConfig.h // modified
            |--Source
                |--PostProcess
                    |--AmbientOcclusion // renamed
                        |--AOComponent.h // modified
                        |--AOComponent.cpp // modified
                        |--AOEditorComponent.h // modified
                        |--AOEditorComponent.cpp // modified
                        |--AOComponentConfig.cpp // modified
                        |--AOComponentController.h // modified
                        |--AOComponentController.cpp // modified

Note that, for brevity, only the core files are listed.

The component is designed in the same way as SSAO. AOParent.pass, GTAOParent.pass, GTAOCompute.pass, GTAOCompute.shader, GTAOCompute.azsl, GTAOConstants.h, GTAOParams.inl, GTAOPasses.h and GTAOPasses.cpp are newly added files. In addition, the corresponding SsaoXXX files are renamed to AOXXX.

Pass design

{
    "Type": "JsonSerialization",
    "Version": 1,
    "ClassName": "PassAsset",
    "ClassData": {
        "PassTemplate": {
            "Name": "GTAOParentTemplate",
            "PassClass": "GTAOParentPass",
            "Slots": [
                {
                    "Name": "LinearDepth",
                    "SlotType": "Input",
                    "ScopeAttachmentUsage": "Shader"
                },
                {
                    "Name": "Modulate",
                    "SlotType": "InputOutput",
                    "ScopeAttachmentUsage": "Shader"
                }
            ],
            "PassRequests": [
                // downsample pass
                {
                    "Name": "DepthDownsample",
                    "TemplateName": "DepthDownsampleTemplate",
                    "Enabled": true,
                    "Connections": [
                        {
                            "LocalSlot": "FullResDepth",
                            "AttachmentRef": {
                                "Pass": "Parent",
                                "Attachment": "LinearDepth"
                            }
                        }
                    ]
                },
                // compute pass
                {
                    "Name": "GTAOCompute",
                    "TemplateName": "GTAOComputeTemplate",
                    "Connections": [
                        {
                            "LocalSlot": "LinearDepth",
                            "AttachmentRef": {
                                "Pass": "DepthDownsample",
                                "Attachment": "HalfResDepth"
                            }
                        }
                    ]
                },
                // blur pass
                {
                    "Name": "GTAOBlur",
                    "TemplateName": "FastDepthAwareBlurTemplate",
                    "Enabled": true,
                    "Connections": [
                        {
                            "LocalSlot": "LinearDepth",
                            "AttachmentRef": {
                                "Pass": "DepthDownsample",
                                "Attachment": "HalfResDepth"
                            }
                        },
                        {
                            "LocalSlot": "BlurSource",
                            "AttachmentRef": {
                                "Pass": "GTAOCompute",
                                "Attachment": "Output"
                            }
                        }
                    ]
                },
                // upsample pass
                {
                    "Name": "Upsample",
                    "TemplateName": "DepthUpsampleTemplate",
                    "Enabled": true,
                    "Connections": [
                        {
                            "LocalSlot": "FullResDepth",
                            "AttachmentRef": {
                                "Pass": "Parent",
                                "Attachment": "LinearDepth"
                            }
                        },
                        {
                            "LocalSlot": "HalfResDepth",
                            "AttachmentRef": {
                                "Pass": "DepthDownsample",
                                "Attachment": "HalfResDepth"
                            }
                        },
                        {
                            "LocalSlot": "HalfResSource",
                            "AttachmentRef": {
                                "Pass": "GTAOBlur",
                                "Attachment": "Output"
                            }
                        }
                    ]
                },
                // modulate pass
                {
                    "Name": "ModulateWithGTAO",
                    "TemplateName": "ModulateTextureTemplate",
                    "Enabled": true,
                    "Connections": [
                        {
                            "LocalSlot": "Input",
                            "AttachmentRef": {
                                "Pass": "Upsample",
                                "Attachment": "Output"
                            }
                        },
                        {
                            "LocalSlot": "InputOutput",
                            "AttachmentRef": {
                                "Pass": "Parent",
                                "Attachment": "Modulate"
                            }
                        }
                    ],
                    "PassData": {
                        "$type": "ComputePassData",
                        "ShaderAsset": {
                            "FilePath": "Shaders/PostProcessing/ModulateTexture.shader"
                        },
                        "Make Fullscreen Pass": true
                    }
                }
            ]
        }
    }
}

Similar to the SSAO pass, the GTAO pass chain is: downsample -> compute -> blur -> upsample -> modulate.
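As a sanity check, the chain can be recovered from the connections declared in the pass template above. The Python sketch below hand-copies, for each pass request, the sibling passes it reads from, and sorts them topologically. This is only an illustration that the declared connections are consistent with the stated order; it is not how Atom schedules passes, and the dictionary name `reads_from` is hypothetical.

```python
from graphlib import TopologicalSorter

# For each pass request, the sibling passes it reads attachments from.
# Connections to "Parent" are external inputs and are left out.
reads_from = {
    "DepthDownsample": set(),
    "GTAOCompute": {"DepthDownsample"},
    "GTAOBlur": {"DepthDownsample", "GTAOCompute"},
    "Upsample": {"DepthDownsample", "GTAOBlur"},
    "ModulateWithGTAO": {"Upsample"},
}

# Every pass depends on the one before it, so only one ordering is valid.
order = list(TopologicalSorter(reads_from).static_order())
print(" -> ".join(order))
```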

Shader snippets

// compute the inner integration given azimuth.
float ComputeInnerIntegral(float2 Angles, float2 ScreenDir, float3 ViewDir, float3 ViewSpaceNormal)
{
    float PI_HALF = 3.1415926535 / 2;
    // Given the angles found in the search plane we need to project the View Space Normal onto the plane defined by the search axis and the View Direction and perform the inner integrate
    float3 PlaneNormal = normalize(cross(float3(ScreenDir.xy,0) ,ViewDir));
    float3 Perp = cross(ViewDir, PlaneNormal);
    float3 ProjNormal = ViewSpaceNormal - PlaneNormal * dot(ViewSpaceNormal, PlaneNormal);

    float LenProjNormal = length(ProjNormal) + 0.000001f;
    float RecipMag = 1.0f / (LenProjNormal);

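    float CosAng = dot(ProjNormal, Perp) * RecipMag;
    // Gamma: angle between the projected normal and the view direction within the slice
    float Gamma = acosFast(CosAng) - PI_HALF;
    float CosGamma = dot(ProjNormal, ViewDir) * RecipMag;
    // Despite its name, this stores 2 * sin(Gamma), since sin(Gamma) = -CosAng
    float SinGamma = CosAng * -2.0f;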

    // clamp to normal hemisphere 
    Angles.x = Gamma + max(-Angles.x - Gamma, -(PI_HALF) );
    Angles.y = Gamma + min( Angles.y - Gamma,  (PI_HALF) );

    float AO = ( (LenProjNormal) *  0.25 * 
                        ( (Angles.x * SinGamma + CosGamma - cos((2.0 * Angles.x) - Gamma)) +
                            (Angles.y * SinGamma + CosGamma - cos((2.0 * Angles.y) - Gamma)) ));
    return AO;
}
// search for horizon lines in screen space
float2 SearchForLargestAngleDual(float2 BaseUV, float2 ScreenDir, float pixelRadius, float InitialOffset, float3 ViewPos, float3 ViewDir,float AttenFactor)
{
    float SceneDepth, LenSq, OOLen, Ang, FallOff;
    float3 V;
    float2 SceneDepths = 0;

    float2 BestAng = float2(-1,-1);
    float Thickness = PassSrg::m_constantsGTAO.m_enabledThincknessAttenFactorMaxDepth.y;

    for(uint i = 0; i < GTAO_NUMTAPS; i++)
    {
        float fi = (float) i;
        float s = (fi + InitialOffset) / (GTAO_NUMTAPS + 1);
        s = s * s;
        float2 sampleOffset = ScreenDir * max(pixelRadius * s, (fi+1));
        float2 UVOffset = GetPixelSize() * sampleOffset;

        UVOffset.y *= -1;
        float4 UV2 = BaseUV.xyxy + float4( UVOffset.xy, -UVOffset.xy);

        // Positive Direction
        SceneDepths.x = PassSrg::m_linearDepth.SampleLevel(PassSrg::PointSampler, UV2.xy, 0).r;
        SceneDepths.y = PassSrg::m_linearDepth.SampleLevel(PassSrg::PointSampler, UV2.zw, 0).r;

        V = ViewSrg::GetViewSpacePosition(UV2.xy, SceneDepths.x) - ViewPos;
        LenSq = dot(V,V);
        OOLen = rsqrt(LenSq + 0.0001);
        Ang = dot(V,ViewDir) * OOLen;

        FallOff = saturate(LenSq * AttenFactor);  
        Ang = lerp(Ang, BestAng.x, FallOff);

        BestAng.x = ( Ang > BestAng.x ) ? Ang : lerp( Ang, BestAng.x, Thickness );  

        // Negative Direction
        V = ViewSrg::GetViewSpacePosition(UV2.zw, SceneDepths.y) - ViewPos;
        LenSq = dot(V,V);
        OOLen = rsqrt(LenSq + 0.0001);
        Ang = dot(V,ViewDir) * OOLen;

        FallOff = saturate(LenSq * AttenFactor);  
        Ang = lerp(Ang, BestAng.y, FallOff);

        BestAng.y = ( Ang > BestAng.y ) ? Ang : lerp( Ang, BestAng.y, Thickness );  
    }

    BestAng.x = acosFast(clamp(BestAng.x, -1.0,  1.0));
    BestAng.y = acosFast(clamp(BestAng.y, -1.0,  1.0));

    return BestAng;
}
// multi-bounce approximating 
float MultiBounce(float AO,float3 albedo)
{
    float3 a = 2.0404 * albedo - 0.3324;
    float3 b = -4.7951 * albedo + 0.6417;
    float3 c = 2.7552 * albedo + 0.6903;

    return max(AO, ((AO * a + b) * AO + c) * AO);
}
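
The horizon search in `SearchForLargestAngleDual` above spaces its taps quadratically along the search direction, so samples are dense near the shaded pixel and sparse toward the search radius, while the `max` with `fi + 1` keeps every tap at least one pixel further out than the previous one. The Python sketch below reproduces only that offset schedule. The value of `GTAO_NUMTAPS` is not given in this RFC, so the tap count of 4 and the function name `tap_offsets` are assumptions made for illustration.

```python
def tap_offsets(num_taps: int, pixel_radius: float, initial_offset: float) -> list:
    """Pixel distance of each tap from the shaded pixel along the search direction."""
    offsets = []
    for i in range(num_taps):
        s = (i + initial_offset) / (num_taps + 1)
        s = s * s  # quadratic spacing: dense near the pixel, sparse far away
        offsets.append(max(pixel_radius * s, i + 1.0))
    return offsets


# With a large radius the quadratic term dominates.
wide = tap_offsets(4, 100.0, 0.5)
# With a small radius the one-pixel minimum step takes over.
narrow = tap_offsets(4, 2.0, 0.5)
print(wide, narrow)
```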

What are the advantages of the feature?

image-20230228101129927

GTAO takes 0.41 ms in the Sponza scene.

O3DE SSAO: image-20230228100429405

O3DE GTAO: image-20230228101014460

What are the disadvantages of the feature?

Problem 1: Artifacts in some areas. image-20230228101254426

Problem 2: Noise in some areas. image-20230228101320705

How will this be implemented or integrated into the O3DE environment?

Are there any alternatives to this feature?

How will users learn this feature?

Are there any open questions?

invertednormal commented 1 year ago

Looks really nice, definitely +1. This seems superior to the current SSAO in every way - are there any reasons to keep the old version around long term? Does GTAO require any features that may not exist on all of O3DE's supported hardware? Are there any cases where the current SSAO performs better than GTAO?

invertednormal commented 1 year ago

Also curious how a low quality GTAO compares to current SSAO in performance and quality - for example, does a low quality GTAO still look better than SSAO, and is it also cheaper?

yangfei103 commented 1 year ago

Thank you for your comments. GTAO has no special hardware requirements; like SSAO, it is a screen-space algorithm. GTAO has 5 quality levels, so users can make their own trade-off between performance and quality. At the low-quality level, I don't think GTAO looks much better than SSAO in the current version; as far as I know, there is some obvious noise at that level. Of course, we hope the community can help us improve it.

As for performance, GTAO is actually slightly more costly than SSAO even at the low-quality level. Here is a simple performance comparison of SSAO and GTAO on my PC, FYI.

Platform:

SSAO: SSAO

GTAO (Low): GTAO-low

Although the current version does not yet meet our expectations and goals, we sincerely hope the community can help us improve it.

antonmic commented 1 year ago

Thanks for the detailed write-up and all the hard work that went into implementing this! However, the base assumption that GTAO is better than O3DE's SSAO is incorrect. It might be true for old implementations of SSAO, but O3DE uses a custom implementation that is both faster and higher quality than GTAO.

  1. GTAO is based on HBAO, which is an older technique that assumes each pixel has an infinite wall behind it and tries to calculate the visibility angle. This is a poor assumption in practice, and leads to small geometry casting strong AO, like the chains holding the light. It also creates a dark halo around objects because they cast AO much further behind them than they should, while simultaneously not casting enough AO in front of the object. For example, a pillar touching the ground should cast AO equally all around the pillar, but HBAO casts much more AO behind the pillar and not enough in front. O3DE's SSAO has none of these drawbacks.
  2. GTAO adds cosine weighting to HBAO for more accurate results. O3DE's SSAO has cosine weighting as well (dot product in the sample accumulation), so both are equal in this regard.
  3. O3DE's SSAO has a better weighting strategy that preserves detail better for more distant objects.
  4. O3DE's SSAO does reveal low-res geometry because it calculates normals from the depth buffer (this is faster than writing out normals and then reading them in). This is the biggest drawback of the current implementation, but this can be mitigated by writing out interpolated vertex normals in the pre-depth pass (better quality but less performant) or using the normals from the G-Buffer (means SSAO would be forced to stay after the forward pass, which would limit flexibility. Having SSAO before forward is more flexible because per-material control can be added to reduce SSAO on softer objects like snow).
  5. Based on the measurements you provide, the low end version of GTAO is almost double the cost of O3DE SSAO (0.108 ms vs 0.06 ms).

Here is a comparison using the images you provided: GTAO 2023-03-01_000053

O3DE SSAO 2023-03-01_000054

invertednormal commented 1 year ago

Thanks for the writeup @antonmic. This makes it a lot clearer that there are pros and cons to each approach, and there are areas where the current SSAO is definitely superior - especially with regards to GTAO applying too much occlusion behind objects. It would be nice to see comparisons against a ray-traced ground truth so we could objectively compare the techniques.

yangfei103 commented 1 year ago

@antonmic Thanks for your detailed comments. O3DE's SSAO definitely has its own advantages in some cases. I agree with @invertednormal that no approach is perfect, especially in real-time rendering. Each approach has its own advantages and disadvantages because an algorithm is always based on some assumptions; if an assumption doesn't hold for a scene, artifacts occur. We suggest keeping both approaches so that users can choose. As far as I know, many modern engines have more than one AO solution; for example, Unreal supports both SSAO and RTAO.

The drawbacks of GTAO that you mentioned are real, of course. But I think there are techniques to mitigate them, even though this first version does not do so yet. For example:

@invertednormal I agree. I can provide more comparisons of GTAO and SSAO (including both O3DE's version and UE's version). I will also show the RTAO results of both engines (we also developed RTAO for O3DE) as a reference, FYI.

BTW, I am not trying to show how good our feature is. As I mentioned before, there are some problems in the current version. All I can do is provide as much detail as I can, so that the community can better understand the feature and properly consider whether it is a good fit. If so, I think we can make it better together in the future.

antonmic commented 1 year ago

Can you make your GTAO implementation into a Gem? Then you can easily submit it and iterate on it, potentially get others to collaborate on some of the missing pieces.

Also worth noting, some of the improvements you mentioned for GTAO can also be applied to O3DE's SSAO, like a multibounce curve and temporal super-sampling/accumulation. Depending on what you're aiming for this might give you the best results.

yangfei103 commented 1 year ago

Of course, it wouldn't take much effort to migrate GTAO into an independent Gem. I think it's a good idea.

galibzon commented 1 year ago

Approved, as long as this feature comes in its own Gem, so that O3DE users can pick either the SSAO component or the GTAO component.

moudgils commented 1 year ago

Since this RFC is accepted, please open a PR and move this RFC to this folder - https://github.com/o3de/sig-graphics-audio/tree/main/rfcs where we will track all the new RFCs for bookkeeping purposes. Thanks.

yangfei103 commented 1 year ago

Roger that! I'm on my way.