mellinoe / ShaderGen

Proof-of-concept library for generating HLSL, GLSL, and Metal shader code from C#,
MIT License
497 stars 56 forks source link

Implemented Math functions, automated headless graphics device testing, numerous bug fixes and improvements #77

Closed thargy closed 6 years ago

thargy commented 6 years ago

Fixes #69, #70, #71, and adds support for System.MathF #65 (except some overloads of Round and IEEERemainder) however, testing not currently possible as MathF is not included in the current build framework. Added same methods to ShaderBuiltins making them readily available in all projects, and these are tested (except on Metal which I don't have access to).

Added support for true end->end testing, including compiling and running compute shaders using Veldrid. This can be seen in action in ShaderBuiltinsTests.cs, which is currently disabled due to inconsistencies on some backends, to re-enable set the SkipReason variable to null at the top of the file, for full breakdown of CPU/GPU failures by method.

Added new FactAttributes to auto-ignore tests that are not supported on the current build environment.

Enhanced TypeSizeCache now auto-detects non-blittable structs, is faster, and removes several infinite-recursion traps.

Began support for constructors & instance methods (#72) by allowing constructor bodies to be passed around as well as method bodies.

TODO: mod/fmod need researching more (see comments). Trig functions, particularly hyperbolic ones and inverse one, are woefully inconsistent between GPU/CPU due, in large part, to rounding issues.

Although, I've currently turned off the end->end tests, that is to allow the code to be tested on Metal backends, etc., before forcing onto the CI server. I believe this is now of sufficient quality to merge in as it is largely correct and ready for review, and adds a lot of enhancements. It was necessary to bundle quite a lot together to get end->end working and further fixes/enhancements shouldbe more granular.

mellinoe commented 6 years ago

@thargy Thanks for the submission. I think the testing infrastructure will be very valuable. I'm pretty busy at the moment, so I can't promise I will be able to take a look at this until the weekend, hopefully at the latest. I'll try to spend some time before then to give some quicker feedback, in case anything major sticks out to me.

thargy commented 6 years ago

Hi @mellinoe, no worries, I'm going to busy for a while with work now. I've been desperately trying to get the CI server to build reliably, but have had issues (as you can see). It seems as though the CI server XUnit runner does not like the Fact extensions I've added in and it fails after a while, it isn't something that happens locally, and it remarkably inconsistent.

I've noted that only Direct3D can create a headless graphics device on the CI Server, but that does work and produces useful test output.

mellinoe commented 6 years ago

@thargy is this ready to review? I see that you're still adding commits to it. Also, I would really like to avoid putting breaking changes directly into this PR. Can you revert the part that splits ShaderBuiltins into two classes? It's going to break a number of folks, and I'd like to do it as a change of its own so that we can communicate it and give time for people to respond.

thargy commented 6 years ago

@mellinoe My bad, I was trying to get the auto-generated Builtin testing working. I had it so that it was auto-generating the C# and compiling to the various shader backends, but executing the auto-generated C# itself was much harder now that AppDomains have been dumped in .Net Standard and AssemblyLoadContext is pretty basic. I thought I'd get it all done in time. I've now rolled that all out into it's own branch, and put back in the various bug fixes. This has allowed me to get it into a version that can now be merged safely.

I've had lots of problems with AppVeyor aborting tests without error messages. Ultimately, I think these were mostly due to any attempt to create a headless device (even when catching exceptions). I've finally resorted to skipping tests that rely on a graphics device from running on the CI server. This is done by not exposing the HeadlessGraphicsDevice or WindowedGraphicsDevice ToolFeature on any ToolChain. I have, at various times, managed to get Direct3D headless graphics device working on AppVeyor though, but for now it's probably best left disabled. I have, however, abandoned the custom Fact & Theory attributes and instead used a Skipping Exception, this allows for tests to be skipped after running as AppVeyor appeared to crash whilst using the custom attributes (that might now be fixed by not attempting to create graphics devices though). The new system is just as effective though.

This pull request does have most MathF functions working, and most are relatively consistent between GPUs. There are numerous bug fixes and enhancements that I needed to add to get to that point, so sorry if it's quite a few changes.

As alluded to above, the new ToolChain system exposes a ToolFeature flag enum that makes it easy to find and use a backend that supports the various features requested on the current environment. It is similarly easy to Require certain features for a test, and it will be auto-skipped of those features are not available (rather than failing or passing). It is also easily extensible as we add more 'features'. For now the WindowedGraphicsDevice feature is not implemented on any of the backends, but it will be easy to add in future.

Anyway, it's now ready for review.

thargy commented 6 years ago

DeepCompareObjectFields seems really complicated -- can you help me understand why it's necessary? Is it possible to just write something more direct (e.g. no reflection) since we control the test data structure(s)?

The ultimate plan is to auto-generate the tests, there's already ~150 overloads and manually adding them is prone to error. I've already got auto-generated code in a different branch and it compiles the structures. In that world we don't have access to the underlying structure directly and I'm replacing the logic with a byte map.

In this version, there's many float fields and the code to manually compare would be large (I did do it this way initially and the results were much less useful). The Deep compare code is a relatively simple recursive descent that has the benefit of only showing what is actually different. This is invaluable when comparing two matrices for example, where 15 of 16 fields are identical.

Now that the tests are running, it's become clear that I ideally need to compare the CPU with several shader implementations at the same time, to see where shaders match and the issue is with CPU code. That is one of the other changes I was working on in the now seperated branch For now, I'll make the tweaks you've asked for. The main benefits in this branch include the mostly working ('good enough for now') Maths functions and the ToolChain + ToolFeature system that makes implementing new tests and skipping automatically much easier.

thargy commented 6 years ago

(Nitpick): I'd prefer to put braces around all if/else blocks, even if they are single statements. Not a big deal, and I didn't comment in the places I saw it.

I've added the start of a R# settings file to override R# settings for anyone (like me) that uses it extensively. That allowed me to run it across the solution and auto-add back in all the braces. It's a harmless file for anyone who doesn't use R#. If you spot any other code-style inconsistencies, let me know and I'll update the solution and re-run a cleanup.

I've added as a single commit.

thargy commented 6 years ago

Implemented all requested changes (except using DisposeCollectorResourceFactory - see above). All checks are passing

mellinoe commented 6 years ago

I'm running the tests on macOS, and a few of them fail. On line 581 of Toolchain.cs, you're attempting to read the output file, which might not exist if the compilation fails. That causes the method to throw an exception which masks the real reason the compilation failed. A couple of the tests are failing because an ambiguous call is being emitted. For example, it's emitting:

float c = pow(2, 10);

which needs to be

float c = pow(2.f, 10.f);

The same seems to be happening for atan.

mellinoe commented 6 years ago

See this for how it was handled before.

thargy commented 6 years ago

I'm running the tests on macOS, and a few of them fail. On line 581 of Toolchain.cs, you're attempting to read the output file, which might not exist if the compilation fails. That causes the method to throw an exception which masks the real reason the compilation failed.

Improved output handling in de3c48745a053c8553decd7b810d9730dd3d9ae2

thargy commented 6 years ago

See this for how it was handled before.

Should this be handled more generically, e.g. in InvocationParameterInfo.GetInvocationParameterList should we always output a type cast for floats if the parameter type is a float?

For example changing this line to:

            return string.Join(", ",
                parameterInfos.Select(
                    pi => pi.FullTypeName == "float" ? $"float({pi.Identifier})" : pi.Identifier));

We could place explicit casts for the other types too? This change does pass all tests on HLSL & GL.

To correct the old way, as well as adding support for the new ShaderBuiltins versions of those methods, I'd need a complete list of which ones are failing as I can't run the Metal ToolChain; so I'm nervous implementing the fixes myself.

mellinoe commented 6 years ago

@thargy There's a lot of random "fixups" in the code already that work around various type inconsistencies, especially in the GLSL ES 300 backend. It should definitely be done in a more centralized place, but I'm not sure that's the place to do it. (It might be, just not sure).

I pulled your latest branch and now I get better error messages with the shader code that fails to compile (which is really, really nice, btw).

Since there's only two places that need float parameters, I'd say we just fix them up locally.

mellinoe commented 6 years ago

By the way, the test output tends to be a bit noisy with messages about skipped tests. Is there a way that we can force the "skip" messages to get combined? On macOS, 80+ tests are skipped, so the output is really noisy, but all of the messages say the same thing. I'm not sure if we can do anything within Xunit's constraints.

thargy commented 6 years ago
  • pow needs float parameters
  • atan needs float parameters Since there's only two places that need float parameters, I'd say we just fix them up locally.

These are two methods that take a second parameter, and from the original code you linked the cast is only applied to this second parameter. The metal specification says these method can take multiple input types, so I guess the issue was an ambiguity was created if the second parameter was not obviously a float (Vectors would not be ambiguous) and therefore was not gauranteed to match the first parameters type.

The existing code checks if the first param is a float and then ensures the second parameter is unambiguously a float. It also checks if the typeName is System.Mathf, but this seems redundant as (to my knowledge) there is only one MathF.Pow overload and it already requires float. I can't use the same logic as we also support Pow and Atan in the builtins now.

To add complexity, in the built in methods, we have one Atan method with overloads that accept 1 or 2 parameters (OpenGL style), whereas MathF has Atan and Atan2 to discriminate between them (HLSL/Metal style). Also, according to the spec, unlike OpenGL, Metal follows the HLSL and MathF approach of having two seperate methods rather than an overload, so there was another bug which I've hopefully rectified as it was mapping the two parameter version to Atan (like OpenGL).

You can see the changes in 6be9b0e7b891d1d93a867109b727ed8d9ada5910, however, I can't test as I don't have access to Metal.

thargy commented 6 years ago

By the way, the test output tends to be a bit noisy with messages about skipped tests. Is there a way that we can force the "skip" messages to get combined? On macOS, 80+ tests are skipped, so the output is really noisy, but all of the messages say the same thing. I'm not sure if we can do anything within Xunit's constraints.

The Attribute approach had the big advantage of skipping entire theories with a single message rather than skipping each test of the theory. That massively reduced skip messages. To be honest, I always use a GUI test runner so I never noticed any particular verbosity. I could go back and retry the attribute code to see if the CI server check (see 33241310d3539a77f2971e6eaf214640e03b9b64) is sufficient to prevent the CI server from crashing now?

mellinoe commented 6 years ago

The latest version is getting close on macOS. There's still a few problems:

thargy commented 6 years ago

Try to fix the parameter type tracking so it recognizes that the parameters should be floats

Pretty sure that is ultimately the only real solution, as it will be necessary if you ever hope to consistently resolve OpenGLES' strict type checking. Further, there's an endless set of edge cases, any type that can be implicitly cast to a float will compile in the C#, but would fail a bulk type check. Looking to see what the method accepts is a far more robust solution.

Metal compilation is actually a two-step process. The shader code is compiled to bitcode using the metal tool, but then it needs to be assembled into a "metallib" file using the metallib tool. Currently, the compute shader end-to-end test only does the first step, which can't be loaded by Veldrid.

The ToolChain system can be extended readily, (for example, an early iteration seperated out Validation from Compilation). However, in this scenario it's probably considering these two steps as 'Compilation', rather than adding a new feature that wouldn't be used by any of the other tool chains. The ability to call an executable and grab the results is implemented in the private Execute method (see here). It is easy to evaluate the CompileResult result of the first step and pass it onto a second step by modifying the MetalCompile method here. However, not having access to a Mac or Metal this isn't something I could implement from my end.

macOS's version of OpenGL does not support compute shaders. This means we can't run the end-to-end test there. The Toolchain constructor should be augmented to check for GraphicsDevice.Features.ComputeShader, and only set "CreateHeadless" if that feature is present. Or, a different flag should be added that represents it. Up to you.

As far as I can see you can only extract the GraphicsDeviceFeatures from a GraphicsDevice instance, so one needs to be created before it can be checked. There are currently 14 flags, though I can see this extending in future. Therefore, it doesn't feel like it would logically fit in ToolChain. Instead, wherever certain graphics device features are required, they should probably be checked once the graphics device is created, and then an exception thrown if missing (with the exception type being set in the SkippableFactAttribute to cause the test to be marked as 'skipped' rather than 'failed'). I assume there are other reasons for using a headless graphics device than just compute shaders?

thargy commented 6 years ago

wherever certain graphics device features are required, they should probably be checked once the graphics device is created, and then an exception thrown if missing (with the exception type being set in the SkippableFactAttribute to cause the test to be marked as 'skipped' rather than 'failed'). I assume there are other reasons for using a headless graphics device than just compute shaders?

I've shown this is action in a9301836907531f37d43a687a2f5b1e612438cab.

mellinoe commented 6 years ago

As far as I can see you can only extract the GraphicsDeviceFeatures from a GraphicsDevice instance, so one needs to be created before it can be checked.

The Toolchain constructor already creates a GraphicsDevice instance to ensure the call will actually work (on this line), so I figured the check could just be done there. Checking inside of the method is also fine, but I thought this fit well into the "capability flags" concept. I don't have a strong opinion one way or the other)

I assume there are other reasons for using a headless graphics device than just compute shaders?

There's no reason to use headless GraphicsDevice instances except it avoids allocating the resources for an OS window. There's no difference from a regular device except a headless GraphicsDevice has no window, and it therefore has no MainSwapchain. You can use compute shaders in any GraphicsDevice.

mellinoe commented 6 years ago

I fixed the two-stage compilation problem in Metal, and pushed the commit here: https://github.com/mellinoe/ShaderGen/commit/fa445d43069f831021bdf06979bbc6a26b6918b5

Please feel free to cherry-pick that commit into your branch.

Pretty sure that is ultimately the only real solution, as it will be necessary if you ever hope to consistently resolve OpenGLES' strict type checking. Further, there's an endless set of edge cases, any type that can be implicitly cast to a float will compile in the C#, but would fail a bulk type check. Looking to see what the method accepts is a far more robust solution.

I agree. There's a few different cases that need to be handled, though, so it wasn't as simple as I was hoping. For method invocations, it seems straightforward -- we can get all of the parameter types from IMethodSymbol.Parameters, and we already have an IMethodSymbol. There are other cases that the invocation translation code handles, though (ctors, properties, etc.), and we need to handle those, too.

thargy commented 6 years ago

The Toolchain constructor already creates a GraphicsDevice instance to ensure the call will actually work (on this line),

I considered that but decided against for the following reasons:

Ultimately, I concluded that conflating the Graphics Device features with the ToolChain features is probably not a good idea. It is possible, should it prove necessary, to add skipping attributes in future that support checking for both types of features. I'm also a little surprised you chose a class of bools rather than a Flag enum, unless you plan for some features to be more descriptive in future? All that uncertainty made me nervous about just dumping it in there.

There's no difference from a regular device except a headless GraphicsDevice has no window, and it therefore has no MainSwapchain. You can use compute shaders in any GraphicsDevice.

Sorry, I was probably very unclear, my point was to question whether it was reasonable to say "because you don't support compute shaders you can't create headless graphics devices"? I didn't think that a was a valid stance to take? I was also under the impression you can use headless graphics devices for more than running compute shaders, like I thought you could still run a vertex/fragment shader and render to a texture, which could be useful for testing such shaders, even potentially in a CI environment?

thargy commented 6 years ago

Please feel free to cherry-pick that commit into your branch.

Done, though I've propogated the transpiled source into the second step so that it is returned in the output CompileResult and not lost during Lib creation. That way it remains consistent with the other tool chains.

I agree. There's a few different cases that need to be handled, though, so it wasn't as simple as I was hoping. For method invocations, it seems straightforward -- we can get all of the parameter types from IMethodSymbol.Parameters, and we already have an IMethodSymbol. There are other cases that the invocation translation code handles, though (ctors, properties, etc.), and we need to handle those, too.

As this would be a fairly major piece of work, is it worth concluding this pull-request and doing that as a seperate piece of work, only this pull-request already introduces a lot of changes to a lot of files? As not all of the new CPU-friendly methods are fully working yet it isn't really making the situation worse? To be honest, the current tree walks need reviewing as there's quite a bit of duplication and there are quite a few cases where concepts should be handled in a more encapsulated way (consider #62, #71, etc.)

Ideally, I'm keen to consider #78 again, so I can improve the CPU-GPU tests to being automatic and change them to compare across multiple GPUs as well as the CPU in one go, this will make achieving consistency more realistic. I understand pulling that from this branch, but most of the methods that are being moved aren't in the existing master branch anyway, and, although a breaking change, it is easy for downstream authors to refactor by including the new class where necessary.

mellinoe commented 6 years ago

As this would be a fairly major piece of work, is it worth concluding this pull-request and doing that as a seperate piece of work, only this pull-request already introduces a lot of changes to a lot of files?

That's fine, we just need to fix the current behavior on Metal, since it's not handling pow and atan correctly. If you'd like, then I can just make that change myself with a TODO marker.

thargy commented 6 years ago

That's fine, we just need to fix the current behavior on Metal, since it's not handling pow and atan correctly. If you'd like, then I can just make that change myself with a TODO marker.

If you could that would be helpful, as I'm not sure how best to fix it temporarily? The new ShaderBuiltins versions of Atan and Pow accept more overloads than System.MathF does.

I don't understand how the bug didn't exist for the original ShaderBuiltins.Pow, as that relied on purely checking the type of the second parameter - which as you suggest isn't good enough. System.MathF.Pow was OK, because it knew that method always required a float.

I'm happy to cherrypick a commit again? Obviously, I don't see the failure without metal.

mellinoe commented 6 years ago

I pushed another commit to my branch which gets me to green on macOS. If you cherry-pick that, I'll go ahead and run all my tests again and merge when green.

I ended up just always emitting the float fixup if it's a non-Vector overload (since that was easy to check). Hacky, but quick.

thargy commented 6 years ago

I ended up just always emitting the float fixup if it's a non-Vector overload. Hacky, but quick.

To be fair, that's probably the safest bet for now! I've cherry-picked it.

mellinoe commented 6 years ago

@thargy Everything looks good to me now. Would you have any problem if I squashed this when merging? There's quite a few "WIP" and intermediate state commits that I'd like to squash out, if you're not opposed.

thargy commented 6 years ago

@mellinoe knock yourself out! I should have done that already, I had a few commits when moving from work to home and vice versa.

mellinoe commented 6 years ago

@thargy Thanks for all the time and effort you dedicated to this. The improvements are well worth it!

thargy commented 6 years ago

@mellinoe thanks for the merge! Next step is to improve the consistency between the various backends and the cpu implementations.