vk2gpu / Engine

Game Engine
Other
125 stars 14 forks

Vulkan support #8

Open boberfly opened 7 years ago

boberfly commented 7 years ago

Hey Neil,

I've finally pushed my WIP Vulkan support. Don't bother testing it, it won't work yet; I'm still in conversion mode and there's still DX12 code all over the place: https://github.com/boberfly/Engine/tree/vulkan

I'll rebase and work on this (provided I have some time this weekend)...

boberfly commented 7 years ago

I'm now using this, which should make it way easier for me to gauge what converts to what: https://github.com/NVIDIAGameWorks/Falcor

vk2gpu commented 7 years ago

Oh awesome!

If you've got any suggestions for interface changes to better fit Vulkan/improve D3D12, I'd be interested in hearing them! There are some in particular I'm probably going to add.

By the way, I've mostly been working on features/graphics, just in case you didn't notice. I'm slowly getting a higher-level renderer plus supporting code in place there. There's a shader effect system; it's not complete, but it at least provides the beginnings of backend shader generation. I'll probably be using glslang to do the HLSL->SPIR-V conversion.

boberfly commented 7 years ago

Hey Neil,

I had some concerns about mapping DX12's different kinds of buffers and views onto Vulkan's buffers, images, and TBOs/UBOs/SSBOs, but it seems I was on the right track when comparing with how Falcor does it, so that's good (your GPU abstraction is working well here so far!).

For the shader side, Falcor decided on 'slang', which seems to be an HLSL subset that targets both. I was kind of fond of the idea of using SPIRV-Cross the way Oryol does here (I don't mind using Vulkan GLSL myself as long as it converts well): https://floooh.github.io/2017/05/15/oryol-spirv.html

For transitions and high-level rendering, might I suggest looking at this new codebase: https://github.com/Themaister/Granite. He's the dev who got Dolphin running on Vulkan and works on SPIRV-Cross. The render graph concept might be pretty nice, but I haven't researched it too much yet.

I'm in agreement with the framebuffer/render pass change, it would make doing those tile-based deferred optimisations on mobile easier!

On a side note, I was going to delve into the command buffers/lists soon: do you cache these CPU-side or use the DX12 API directly? I was thinking about using Floooh's Sokol to do a sneaky port to old APIs and use software command buffers there, so they could be dispatched as jobs later to feed into the render thread or something. Not a priority for me though...

Cheers

vk2gpu commented 7 years ago

Hello!

So far shader effect compilation works, but I'll need to get GLSL generating from it. I can always take a look at this right away if you'd like. As I'd mentioned, I intended to use glslang to compile HLSL to SPIR-V. I know that this is basically production ready and currently how Ashes Of The Singularity does its Vulkan support. I've had limited success in the past with bytecode/IL conversion, so I'm a bigger fan of source->source transpiling rather than something like SPIRV-Cross.

In case you were wondering why I wrote my own parser (it doesn't build the full shader AST though; function bodies are parsed verbatim), it's because I prefer to work with the entire render state, sampler state, etc. all declared within the shader. It wasn't a huge amount of work to do, and it meant I didn't have to worry about trying to extend another shader parser to do just what I need, although I may do that some other time. If you look at the shader code, it's similar to ye olde D3DXEffect. An example of what a shader effect looks like: https://github.com/neilogd/Engine/blob/features/graphics/res/shader_tests/00-basic.esf

Yeah, I've spotted this, though I haven't had much of a dive into the code yet. I was probably going to do a render graph/frame graph at the higher level. I'm still fleshing out some of the foundation work over on features/graphics, but I will get to working out a nice setup for the high-level renderer structure.

The command lists are set up to be API agnostic and processed in bulk. This avoids needing virtuals per command type and makes it easy to do batch processing. You may notice I've got a separate compile + submit: https://github.com/neilogd/Engine/blob/master/src/gpu/manager.h#L136

CompileCommandList will do all the work to build the D3D12 command list. Say I were to do a D3D11 port: this would probably do nothing, since there's nothing to compile (unless using deferred contexts, but that's not something I would use anyway, for various reasons).

SubmitCommandList just queues up the API-specific command list. Right now the choice of queue is highly inflexible; I'm yet to decide how I want to expose compute, graphics, and copy to the user at a higher level.

Long term, the intent is that compile can be run on any thread, so it's easy to go wide. That may work already, but I haven't written a unit test for it yet. I should probably document the public API and where things are thread safe or require special consideration.

boberfly commented 7 years ago

Cool, looks like you've got the shader stuff sorted, and good to hear that the HLSL->SPIR-V path is production-worthy. No rush on the GLSL stuff; this Vulkan update I made is basically a commit/push of what I got up to a few months back so that it doesn't get lost. It's far from finished and will take me a while to complete.

I came across this from Siggraph, which is nvFX but for Vulkan, and might be inspiring: https://github.com/tlorach/vkFX The code won't be there until later in the month or two, but there's a video out there about it somewhere...

I'd like to have some fun with the higher-level concepts soon; I've got quite a few experiments to try out once this low-level stuff is rock-solid. By the way, how is the memory allocator right now? I've kind of side-stepped that part for the time being, but I was hoping the one from DX12 is reusable somehow... ;) This looks interesting though: https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator

vk2gpu commented 7 years ago

At the moment I don't do anything fancy for DX12 memory allocation. I was going to adapt tlsf (https://github.com/mattconte/tlsf) for use with DX12 + Vulkan. I forget who recommended it to me, but I did jump on a thread on Twitter a while ago regarding memory allocators and that was mentioned :) Right now the descriptor heaps are allocated...shittily. It works, but that's to be replaced with tlsf, and I intend to do similar for allocating buffers & textures. I'm not quite sure yet at what level I want to put it for those: for buffers I was thinking of putting it at a higher level (allocate large buffers), and similar with textures (use texture arrays/atlases rather than lots of loose textures).
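For a rough feel of the interface such an allocator would expose, here's a naive first-fit sketch over descriptor heap slots. This is not tlsf; tlsf adds O(1) alloc/free and proper block coalescing, and the class name here is invented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified first-fit range allocator for descriptor heap slots.
// Illustrates the interface shape only; a real one would use tlsf.
class DescriptorRangeAllocator {
public:
    explicit DescriptorRangeAllocator(uint32_t capacity) {
        freeRanges_[0] = capacity;  // offset -> size
    }

    // Returns the offset of a contiguous range, or UINT32_MAX on failure.
    uint32_t Alloc(uint32_t count) {
        for (auto it = freeRanges_.begin(); it != freeRanges_.end(); ++it) {
            if (it->second >= count) {
                uint32_t offset = it->first;
                uint32_t remaining = it->second - count;
                freeRanges_.erase(it);
                if (remaining > 0)
                    freeRanges_[offset + count] = remaining;
                return offset;
            }
        }
        return UINT32_MAX;
    }

    void Free(uint32_t offset, uint32_t count) {
        // No coalescing of adjacent ranges here; tlsf handles that properly.
        freeRanges_[offset] = count;
    }

private:
    std::map<uint32_t, uint32_t> freeRanges_;
};
```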

vk2gpu commented 7 years ago

Just out of interest, I grabbed glslangValidator + spirv-cross to see how translating from HLSL to SPIR-V to GLSL/MSL goes, and after testing on vs/ps/cs, it seems like it will work pretty nicely! The only thing that looks like it would need some modification in code gen is the register mapping for UAVs/SRVs to the appropriate image/texture/SSBO slots. Here's some HLSL I fed into it:

////////////////////////////////////////////////////////////////////////////////////////////////////
// generated shader for shader_tests/00-basic.esf
////////////////////////////////////////////////////////////////////////////////////////////////////

////////////////////////////////////////////////////////////////////////////////////////////////////
// structs
struct View
{
    float4x4 view_;
    float4x4 proj_;
};

struct Object
{
    float4x4 world_;
};

struct VS_IN
{
    float4 position : POSITION;
    float2 uv : TEXCOORD0;
};

struct VS_OUT
{
    float4 position : SV_POSITION;
    float2 uv : TEXCOORD0;
};

////////////////////////////////////////////////////////////////////////////////////////////////////
// cbuffers
cbuffer ObjectCBuffer: register(b0)
{
    Object o_;
};

cbuffer ViewCBuffer: register(b1)
{
    View v_;
};

////////////////////////////////////////////////////////////////////////////////////////////////////
// sampler states
SamplerState SS_DEFAULT : register(s0);

////////////////////////////////////////////////////////////////////////////////////////////////////
// vars
Texture2D tex_diffuse : register(t0);

////////////////////////////////////////////////////////////////////////////////////////////////////
// functions
VS_OUT vs_main(in VS_IN _in)
{
    VS_OUT _out = (VS_OUT)0;
    _out.position = _in.position;
    _out.uv = _in.uv;
    return _out;
}

and after HLSL -> SPIR-V -> GLSL for vs_main:

#version 430

struct VS_IN
{
    vec4 position;
    vec2 uv;
};

struct VS_OUT
{
    vec4 position;
    vec2 uv;
};

struct Object
{
    mat4 world_;
};

struct View
{
    mat4 view_;
    mat4 proj_;
};

layout(location = 0) in vec4 _in_position;
layout(location = 1) in vec2 _in_uv;
layout(location = 0) out vec2 _entryPointOutput_uv;

VS_OUT _vs_main(VS_IN _in)
{
    VS_OUT _out = VS_OUT(vec4(0.0), vec2(0.0));
    _out.position = _in.position;
    _out.uv = _in.uv;
    return _out;
}

void main()
{
    VS_IN _in;
    _in.position = _in_position;
    _in.uv = _in_uv;
    VS_IN param = _in;
    VS_OUT flattenTemp = _vs_main(param);
    gl_Position = flattenTemp.position;
    _entryPointOutput_uv = flattenTemp.uv;
}

And doing it with a compute shader using an SRV + UAV:

////////////////////////////////////////////////////////////////////////////////////////////////////
// generated shader for shader_tests/00-particle.esf
////////////////////////////////////////////////////////////////////////////////////////////////////

////////////////////////////////////////////////////////////////////////////////////////////////////
// structs
struct Particle
{
    float3 position;
    float3 velocity;
};

struct VS_IN
{
    float4 position : POSITION;
};

struct VS_OUT
{
    float4 position : SV_POSITION;
    uint idx : SV_INSTANCEID;
};

////////////////////////////////////////////////////////////////////////////////////////////////////
// cbuffers
cbuffer ParticleConfig: register(b0)
{
    float4 time;
    float4 tick;
    int maxWidth;
};

cbuffer Camera: register(b1)
{
    float4x4 view;
    float4x4 viewProj;
};

////////////////////////////////////////////////////////////////////////////////////////////////////
// sampler states

////////////////////////////////////////////////////////////////////////////////////////////////////
// vars
RWStructuredBuffer<Particle> inout_particles : register(u0);
StructuredBuffer<Particle> in_particles : register(t0);

////////////////////////////////////////////////////////////////////////////////////////////////////
// functions
float noise(float x)
{
    return sin(x) +
           sin(x * 0.61423) * 0.5 +
           sin(x * 0.3123) * 0.25 +
           sin(x * 0.142) * 0.125;
}

[numthreads(1, 1, 1)]
void cs_main(int3 id : SV_DispatchThreadID)
{
    const uint idx = id.x + id.y * maxWidth;
    Particle particle = in_particles[idx];
    particle.position = particle.position + particle.velocity * tick.x;
    particle.velocity += float3(0.0, -9.8, 0.0) * tick.x;
    float offset = (float)idx / 1024.0;
    particle.velocity.y += noise(time.x + offset) * 0.01;
    if(particle.position.y < 0.0)
    {
        particle.velocity = reflect(particle.velocity, float3(0.0, 1.0, 0.0)) * 0.5;
        particle.position.y = 0.0;
    }
    inout_particles[idx] = particle;
}

and after HLSL -> SPIR-V -> GLSL for cs_main:

#version 430
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

struct Particle
{
    vec3 position;
    vec3 velocity;
};

layout(binding = 0, std140) uniform ParticleConfig
{
    vec4 time;
    vec4 tick;
    int maxWidth;
} _57;

layout(binding = 0, std430) readonly buffer _71
{
    Particle _data[];
} in_particles;

layout(binding = 0, std430) buffer _137
{
    Particle _data[];
} inout_particles;

float noise(float x)
{
    return ((sin(x) + (sin(x * 0.614229977130889892578125) * 0.5)) + (sin(x * 0.3122999966144561767578125) * 0.25)) + (sin(x * 0.14200000464916229248046875) * 0.125);
}

void _cs_main(ivec3 id)
{
    uint idx = uint(id.x + (id.y * _57.maxWidth));
    Particle particle;
    particle.position = in_particles._data[idx].position;
    particle.velocity = in_particles._data[idx].velocity;
    particle.position += (particle.velocity * _57.tick.x);
    particle.velocity += (vec3(0.0, -9.80000019073486328125, 0.0) * _57.tick.x);
    float param = _57.time.x + (float(idx) / 1024.0);
    particle.velocity.y += (noise(param) * 0.00999999977648258209228515625);
    if (particle.position.y < 0.0)
    {
        particle.velocity = reflect(particle.velocity, vec3(0.0, 1.0, 0.0)) * 0.5;
        particle.position.y = 0.0;
    }
    inout_particles._data[idx].position = particle.position;
    inout_particles._data[idx].velocity = particle.velocity;
}

void main()
{
    ivec3 param = gl_GlobalInvocationID;
    _cs_main(param);
}

boberfly commented 7 years ago

Hey Neil,

Oh nice one! Looks pretty clean and readable too; that would work well for porting to legacy APIs in the future, like feeding into an older GL without bytecode support. The C++ backend for debug purposes got me thinking: perhaps even an ISPC backend could be made here, a bit like Intel's example where they show converting HLSL to ISPC by hand: https://github.com/GameTechDev/ISPC-DirectX-Graphics-Samples/

vk2gpu commented 7 years ago

Using PIX you can just debug shaders in D3D12 (and RenderDoc does a great job of D3D11 shader debugging, I think it may do SPIR-V Vulkan too). Regarding HLSL->ISPC, there is also this: https://github.com/zigguratvertigo/hlsl-to-ispc

I'll be keeping my options open, since it would be pretty great to be able to write code once in HLSL and run it either on GPU or CPU depending on workload (for example, particle systems or cloth sim). The only downside of compiling HLSL to run on the CPU is that I'd have to set up something that'll compile and link the shaders into a DLL as if they were data, though that's not the end of the world since my plugin system already supports reloading at runtime.

boberfly commented 7 years ago

Oh didn't know about that HLSL->ISPC project, nice.

When reading the render graph and render pass system, should I use this as a guide to what you're making? https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite

vk2gpu commented 7 years ago

Yep! The initial setup is based on that, which will be fairly clear from even the naming of the helper classes. They all seemed a sensible way to keep the interface functions you're intended to use restricted at each stage. Resources right now are recreated every frame; I do intend to implement reuse at some point soon, plus the ability to reuse within a frame or alias.

I originally started going entirely C++ lambda for the render passes, but I'll just have that implemented on top of the class-instance-based setup, I think. It seemed like it'd actually be simpler that way round.
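A minimal sketch of that layering: lambda passes built on top of class-instance passes, with the graph rebuilt each frame so passes hold no persistent state. All names are illustrative, not the engine's real interface; the string log stands in for actual GPU work.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Class-instance-based render pass; the base abstraction.
struct RenderPass {
    virtual ~RenderPass() = default;
    virtual void Execute(std::vector<std::string>& log) = 0;
};

// Lambda support implemented *on top of* the class-based setup.
struct LambdaPass final : RenderPass {
    explicit LambdaPass(std::function<void(std::vector<std::string>&)> fn)
        : fn_(std::move(fn)) {}
    void Execute(std::vector<std::string>& log) override { fn_(log); }
    std::function<void(std::vector<std::string>&)> fn_;
};

class RenderGraph {
public:
    void AddPass(std::unique_ptr<RenderPass> pass) {
        passes_.push_back(std::move(pass));
    }
    void Execute(std::vector<std::string>& log) {
        for (auto& pass : passes_)
            pass->Execute(log);
        passes_.clear();  // Graph is rebuilt from scratch next frame.
    }
private:
    std::vector<std::unique_ptr<RenderPass>> passes_;
};
```

Because the graph is cleared and rebuilt every frame, there's no stale pass state to migrate, which is also what makes the reloadable-pipeline-plugin idea mentioned later straightforward.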

vk2gpu commented 7 years ago

I'm also looking at implementing render pipelines as reloadable plugins right now, since it shouldn't be painful if done right away. The beauty of the "rebuild the render graph each frame" is that there shouldn't be any state to worry about moving from one DLL to another.

vk2gpu commented 7 years ago

Actually got a question for you RE: the binding model.

I'm looking at being able to update resource bindings dynamically, and I'm wondering if you have any thoughts on how to do this, especially as you're looking at Vulkan right now. I was looking at implementing it as commands: update the SRVs, UAVs, and CBVs in the pipeline binding states. The problem here is that there isn't a way to update them as part of a command list in D3D12, so I was considering doing it by allocating a new range of descriptors as needed. I believe it's cheap to copy descriptors, but I wondered if you have some ideas on this front?

Right now the binding model I've gone with is prohibitive if I wish to change resources during the frame (i.e. updating constant buffers, changing textures around during post processing, having ring buffers set up for compute shaders to dynamically build geometry, etc.). Right now I need to recreate a whole new binding resource, which doesn't seem practical long term.
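The "allocate a new range of descriptors as needed" idea above could be as simple as a per-frame linear (bump) allocator: hand out fresh ranges from a large shader-visible heap, copy descriptors in, and reset once the frame's GPU work completes. This is a hypothetical sketch; the class name and reset policy are assumptions, not the engine's code.

```cpp
#include <cassert>
#include <cstdint>

// Per-frame linear allocator over descriptor heap slots. Every dynamic
// binding update gets a fresh range; nothing is freed individually.
class FrameDescriptorAllocator {
public:
    explicit FrameDescriptorAllocator(uint32_t capacity) : capacity_(capacity) {}

    // Returns the base offset of a fresh range, or UINT32_MAX when the
    // frame's heap is exhausted.
    uint32_t Alloc(uint32_t count) {
        if (head_ + count > capacity_)
            return UINT32_MAX;
        uint32_t offset = head_;
        head_ += count;
        return offset;
    }

    // Called once the GPU has finished consuming this frame's descriptors
    // (e.g. after a fence wait); the whole heap becomes reusable at once.
    void Reset() { head_ = 0; }

private:
    uint32_t capacity_;
    uint32_t head_ = 0;
};
```

The attraction is that descriptor copies are cheap and no per-range bookkeeping is needed; the cost is needing enough heap space for a whole frame's worth of updates (double-buffered across frames in flight).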

vk2gpu commented 7 years ago

A thought regarding the binding model: would it seem reasonable to just expose descriptor heaps/sets & the root signature to the higher level? It would probably be managed by the shader system anyway, so transparent to anyone working above that level. If I do so, I'd love to hear your thoughts on how Vulkan would map well here.

boberfly commented 7 years ago

Hey Neil,

Sorry, I only just got to reading this now. I'm still learning Vulkan, and this is my learning test-bed, so I'm not too opinionated on this yet and don't know the pitfalls, but it's definitely something I should research. I'd probably just go searching for what others are doing, like with Falcor.

I'm not too sure if push constants have an equivalent in DX12? Seems like that's the fastest way to put in custom stuff while creating command buffers in Vulkan, not sure if that helps you here.

vk2gpu commented 7 years ago

The equivalent there would be root constants: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899219(v=vs.85).aspx. It is something I want to expose; I'm really not fond of how I chose to do binding of resources, and I'm considering going a bit more free-form with draw binding + pipeline binding. Right now I'm playing around in apps/testbed to actually 'use' the interfaces I have and work out what direction I should be taking for certain things.

boberfly commented 7 years ago

This came up: https://developer.nvidia.com/vulkan-shader-resource-binding

I think I should research the problem more here (let me know if I'm off base), but I remember reading that other abstractions were creating descriptor sets on the fly and hashing them, so sets with the same signature can be reused and you don't get an explosion of identical ones...

boberfly commented 7 years ago

I think for my stuff I was going to shape it around what Doom does: not make so many, try to bake things down, and use atlasing like mad (fortunately, I think, they didn't need to make a traditional uber-shader forward renderer that shuffles around loads of custom shader permutations and textures, so almost everything is self-contained in shaders with offsets).

Actually I think DX12/Vulkan also allow you to go bindless on some hardware (correct me if I'm wrong), perhaps for maximum flexibility this should live in the shader part of things like you're suggesting, so the client can choose how things should bind?

Curious to see how this project will approach the binding once it appears: https://github.com/tlorach/vkFX

vk2gpu commented 7 years ago

Yeah, I've thought about putting more of the responsibility for binding into the shader system. Rather than have it rigidly defined, the shader system can either generate bytecode to be interpreted by the backend (making the appropriate API calls, such as chains of vkCmdPushConstants/SetComputeRoot32BitConstants/etc.) for runtime iteration, or generate C/C++ code for final builds.
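A toy version of that interpreted-bytecode idea: the shader compiler emits a flat stream of ops, and the backend walks it, issuing API calls. The ops, their fields, and the recorded call strings are all invented for illustration; a real backend would call vkCmdPushConstants etc. directly instead of logging.

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum class BindOp : uint8_t { PushConstants, BindSet, End };

struct BindInstr {
    BindOp op;
    uint32_t index;   // Root parameter / set index.
    uint32_t offset;  // Byte offset for push constants.
};

// Interprets the stream; here we just record what a backend would call.
std::vector<std::string> InterpretBindings(const std::vector<BindInstr>& code) {
    std::vector<std::string> calls;
    for (const BindInstr& i : code) {
        switch (i.op) {
        case BindOp::PushConstants:
            calls.push_back("PushConstants(index=" + std::to_string(i.index) +
                            ", offset=" + std::to_string(i.offset) + ")");
            break;
        case BindOp::BindSet:
            calls.push_back("BindSet(index=" + std::to_string(i.index) + ")");
            break;
        case BindOp::End:
            return calls;
        }
    }
    return calls;
}
```

The appeal is that the same compiler output drives an interpreter during iteration, and can later be lowered to generated C/C++ for final builds with no interpretation overhead.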

boberfly commented 7 years ago

http://ourmachinery.com/post/high-level-rendering-using-render-graphs/

Looks like a trend ;)

vk2gpu commented 7 years ago

I've actually got a forward+ renderer working in my test bed at the moment as I get the render graph API all setup :) Not very impressive to look at yet, but functional at least:

[screenshot of the forward+ test bed]

Once I've tidied up some stuff I'll commit it. I'm mostly stuck with test bed stuff for now, before I continue fleshing out the API.

boberfly commented 7 years ago

New SDL is coming out, that'll save me doing Vulkan load-library funny business: https://discourse.libsdl.org/t/sdl-2-0-6-prerelease/23024

vk2gpu commented 7 years ago

If you intend to use it, feel free to upgrade the version in 3rdparty :) I only lean on SDL2 for window creation and input handling really so should be a smooth upgrade.

vk2gpu commented 7 years ago

It's a little rough, but the latest render graph stuff has been committed. I'm still trying to refine it; if you've got any feedback, let me know! I still want to try to thin it down a bit, as there's more boilerplate than I'd like when setting it all up.

boberfly commented 6 years ago

Sweet! Yep I'll give it a test on my end for sure when I get home, cheers. (I'd go for clustered over tiled, but tiled works good too). ;)

vk2gpu commented 6 years ago

Yeah I intend to do that, this was quick and easy enough to implement and get working as a test for now :) I'll probably have a play with putting a couple of different pipelines together, since it's easy enough with the render graph.

vk2gpu commented 6 years ago

Just an FYI, I'm having a poke at improving the bind model at the moment. I've decided to try to move away from the large binding objects as the default option; they've turned out a lot more inconvenient than I wanted. I still haven't decided on a replacement, but I'm probably going to go along the lines of having binding sets map to descriptor tables/sets directly; when setting up the bindings for a draw, you specify a number of these & the ranges you wish to bind, with the ability to update them without going via commands. Part of this work will also be to allow root/push constants, but I'm still fleshing out how I want to define root signatures/pipeline layouts.

Whether or not you're still wanting to poke around with Vulkan stuff in this code base (if you're not, I'm not offended), I'd be interested to know your opinion on bind model design. I'm hoping to make it a bit more flexible overall, but keep it very data driven and mostly handled by the Graphics::Shader interface so it's transparent to the user.

boberfly commented 6 years ago

Hey Neil,

Sorry I just saw this now and have been playing around with other things (like BGFX integration into Urho3D/starting a new job/etc.), but I'd like to revisit this Vulkan support in the future for my own learning at least (I thought I would wait a little bit until you flesh out some of these things).

I don't have much of an opinion right now as I've kind of forgotten a lot of the details on the binding mechanisms so my opinion wouldn't be too well thought-out. What you say sounds good to me, if it doesn't have limitations (root/push constants support sounds great, at the expense of a more complicated setup in the higher-level shader render code? That's fine to me I think!).

Another thought is supporting older APIs if possible; imo just using sokol_gfx.h sounds like something simple to plug in.

vk2gpu commented 6 years ago

No worries man! The latest resource binding stuff is there, but I intend to refine it further. Right now I have allocation of temporary descriptor sets setup at the higher level, just to give me a bit more flexibility until I get some more interesting stuff done first (wanting to prototype something small first, need a break from bindings...)

To handle root constants and whatnot, I think the idea I'm settling on is to generate bytecode at the shader compilation step, and have that map roughly onto the API calls. It seems like that'd give me the most flexibility, plus the ability to just generate code to handle the bindings without needing to push too much data through command lists to remap things.

The new higher-level binding stuff I've set up similar to the Destiny shader architecture; what it calls 'Scopes' I call 'BindingSets'. The intent for things like root constants/descriptors would be that you just mark up the shaders themselves to give it a hint, and it can select an appropriate root signature that fits best based on how you've tagged the binding sets.

As part of my own personal road map, I opted not to support older APIs. This isn't set in stone; I would quite like to get WebGL going again at some point, but it's low on my priority list for the meantime. It'd certainly be a bit nicer to hook up older APIs with the newer binding set stuff, since there's no need to abstract how all that works at the interface; the shader compilation step can effectively set up what it needs.