microsoft / angle

ANGLE: OpenGL ES to DirectX translation

latest hololens branch still doesn't use non-geometry fastpath #92

Open mlfarrell opened 8 years ago

mlfarrell commented 8 years ago

Just pulled the latest code, and it's still taking the non-fastpath route using a GS, which is very slow compared to the true VPRT vertex+pixel-only path.

The code even asserts it, in programd3d.cpp:

#ifdef ANGLE_ENABLE_WINDOWS_HOLOGRAPHIC
    // rendering holographically using instancing
    // uses a pass-through geometry shader
    if (rx::HolographicNativeWindow::IsInitialized())
    {
        // rendering holographically using instancing
        // TODO: create a shader EXE for each type of GL_geometry, pick one depending on the state at draw time
        getGeometryExecutableForPrimitiveType(data, GL_TRIANGLES, &pointGS, &infoLog);
        ASSERT(pointGS);

        // Geometry shaders are currently only used internally, so there is no corresponding shader
        // object at the interface level. For now the geometry shader debug info is prepended to
        // the vertex shader.
        vertexShaderD3D->appendDebugInfo("// GEOMETRY SHADER BEGIN\n\n");
        vertexShaderD3D->appendDebugInfo(pointGS->getDebugInfo());
        vertexShaderD3D->appendDebugInfo("\nGEOMETRY SHADER END\n\n\n");
    }
#endif

I verified that it is indeed setting a GS for my draw calls.

@MikeRiches

mlfarrell commented 8 years ago

[Screenshot attachment: ss]

Somewhat related: the ANGLE rendering path seems to waste a ton of time on both CPU overhead and GPU idling. I'm not sure what we can do about this, but it's crippling my gfx engine, since I'm missing VSYNC almost every frame. See the screenshot attached above.

mlfarrell commented 8 years ago

For those curious, profiling showed that the single biggest performance bottleneck is the D3D11 Map() calls used to update uniforms via the one constant buffer. This absolutely crawls on HoloLens hardware.
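One way to cut down on how often Map() gets hit, as a minimal sketch only: keep a CPU-side copy of the constant buffer and upload it only when a uniform actually changed. `UniformShadow` and the `upload` callback are hypothetical names made up for illustration, not ANGLE code. The callback stands in for the real Map()/memcpy/Unmap() sequence, so nothing here touches D3D11 directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical CPU-side shadow of one constant buffer (illustration only).
class UniformShadow {
public:
    // `upload` is an assumed stand-in for Map()/memcpy/Unmap() on the real buffer.
    using Upload = std::function<void(const void* data, std::size_t bytes)>;

    UniformShadow(std::size_t bytes, Upload upload)
        : data_(bytes, 0), upload_(std::move(upload)) {}

    // Record a uniform write; mark the buffer dirty only if the bytes changed.
    void set(std::size_t offset, const void* src, std::size_t bytes) {
        assert(offset + bytes <= data_.size());
        if (std::memcmp(data_.data() + offset, src, bytes) != 0) {
            std::memcpy(data_.data() + offset, src, bytes);
            dirty_ = true;
        }
    }

    // Call once before each draw. Returns true if an upload happened.
    bool flush() {
        if (!dirty_) return false;
        upload_(data_.data(), data_.size());  // whole buffer, all or nothing
        dirty_ = false;
        return true;
    }

private:
    std::vector<unsigned char> data_;
    Upload upload_;
    bool dirty_ = true;  // the first flush always uploads
};
```

This only removes redundant uploads. If the uniforms genuinely change on every draw call, each draw still pays for one full upload.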

MikeRiches commented 8 years ago

Thanks for letting us know. @austinkinross, any ideas on how we might speed this up for HoloLens?

mlfarrell commented 8 years ago

One idea may be to use D3D11_MAP_WRITE_NO_OVERWRITE, but honestly, even when I shaved my uniforms down to only 5-10 components, I still saw a huge draw call overhead (centered around the Map() calls). HoloLens seems to hate any form of transfer between CPU and GPU memory.

PS - in case you're wondering, here are some of the amazing demos that your hard work on the ms-holo branch has enabled. I only had to cheat and drop into D3D for the spatial mesh rendering.

https://www.youtube.com/watch?v=h_spfbvNwmk

MikeRiches commented 8 years ago

While creating the depth-based image stabilization component, I noticed that mappable default buffers work on the HoloLens GPU. If we aren't already using those, they might offer us a speed boost by avoiding an extra on-GPU memory copy. Then again, I don't know why we aren't using the UpdateSubresource method here, which (as I understand it) can be faster by updating only what has changed.

Thanks for the link, this is great to see! Hope you don't mind - I've shared it with a few other folks here at Microsoft.

mlfarrell commented 8 years ago

Please do! Check back on that channel from time to time, I hope to update it with more impressive demos.

As for UpdateSubresource, I tried that but got screwed in two places: 1) you cannot partially update a constant buffer, it's all or nothing; 2) you cannot call UpdateSubresource on DYNAMIC buffers.
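The two restrictions above can be written down as a small check. This is a sketch of the rules as the D3D11 documentation for UpdateSubresource states them for constant buffers, not code from ANGLE. The `Usage` enum and `canUpdateSubresource` are hypothetical names that only mirror the relevant D3D11_USAGE values.

```cpp
// Hypothetical mirror of the D3D11_USAGE values that matter here.
enum class Usage { Default, Dynamic, Immutable };

// True if UpdateSubresource may be used on a constant buffer with this usage.
// `partialUpdate` is true when the caller wants to write only a sub-range.
bool canUpdateSubresource(Usage usage, bool partialUpdate) {
    // Restriction 2: DYNAMIC (and IMMUTABLE) resources are not valid destinations.
    if (usage != Usage::Default) return false;
    // Restriction 1: a constant buffer must be updated whole.
    if (partialUpdate) return false;
    return true;
}
```

Under these rules, the only combination that works is a DEFAULT-usage constant buffer rewritten in full.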