vulkano-rs / vulkano

Safe and rich Rust wrapper around the Vulkan API
Apache License 2.0

Incorrect depth and weird artifacts #1335

Closed hpatjens closed 3 years ago

hpatjens commented 4 years ago

Issue

I created an example that renders a model (see following figure) but somehow the depth ordering is not correct although depth buffering is active.

normal

With deactivated depth test (leaving out .depth_stencil_simple_depth()), the model is rendered as follows.

no-depth

I don't understand why this is happening. Please note that I don't have any previous Vulkan knowledge but quite a bit of experience with OpenGL.

Please check out the example repository and feel free to create a PR to fix the situation.

Observations

  1. When I change the depth buffer format to D32Sfloat, for example, the artifacts are reduced but still present.
  2. The same artifacts appear in the teapot example when the scale matrix on this line is removed, the camera is moved back to 100 and the far plane is extended to 1000.


AustinJ235 commented 4 years ago

I don't know if vulkano has any way of checking which formats are supported, but I would check with a Vulkan compatibility tool to make sure that the depth format is supported.

Check out https://vulkan.gpuinfo.org/download.php

hpatjens commented 4 years ago

I'm not sure how to read the output.

Format                 Linear  Optimal  Buffer
D16_UNORM              true    true     false
D16_UNORM_S8_UINT      false   false    false
D24_UNORM_S8_UINT      true    true     false
X8_D24_UNORM_PACK32    true    true     false
D32_SFLOAT             true    true     false
D32_SFLOAT_S8_UINT     true    true     false

If I understand this correctly, linear, optimal, and buffer are bit masks. So, does false mean that no feature is supported?

AustinJ235 commented 4 years ago

Hmm, maybe this isn't the best program. I am not sure about the meaning of the column values, but D16_UNORM is the only mandatory depth format. I think optimal means that it is supported.

I think you are right that something is up with the precision, so increasing the precision should help. The vk spec leads me to believe that this buffer stores two values, depth and stencil. So although you may be increasing the depth's precision, you might not be changing the stencil's precision. I think the stencil is what ultimately decides which fragments get culled, i.e. the artifacts you are seeing are caused by this.

With that said, the format naming scheme seems to reaffirm this idea, as for example D32_SFLOAT_S8_UINT would mean a 32 bit depth and an 8 bit stencil. However, I don't see any formats on my gpu that would hint toward anything higher than an 8 bit stencil.

I would recommend trying to remove culling in general. So remove the two lines here:

.front_face_counter_clockwise()
.cull_mode_back()

The default is to not cull if no preference is specified, but you can call cull_mode_disabled() to make sure. There is no performance cost to disabling this. The gpu has to do the work anyways. Culling is for cases where, say, you go inside an object: you wouldn't see the outside of the object because its triangles would be wound the wrong way.

I may try out your example repo later when I have time to do so, but I would try that recommendation first.

hpatjens commented 4 years ago

I think you are right that something is up with the precision, so increasing the precision should help. That said, using a 32 bit depth buffer doesn't solve the problem.

I think that shouldn't be necessary with a near plane of 0.01 and a far plane of 1000 and a model with a diameter of about 100.
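
For reference, the numbers above can be checked with a small calculation. This is only a sketch, assuming a standard (non-reversed) perspective projection with a depth range of 0 to 1 and ignoring rasterizer details; the function names are made up for illustration. It estimates the smallest distance difference that a fixed-point depth buffer can tell apart at a given distance from the camera.

```rust
/// Window-space depth in [0, 1] for a standard perspective projection
/// with a zero-to-one depth range (assumption: no reversed-Z).
/// `dist` is the positive distance in front of the camera.
fn window_depth(near: f64, far: f64, dist: f64) -> f64 {
    far * (dist - near) / (dist * (far - near))
}

/// Approximate smallest distance difference an N-bit UNORM depth buffer
/// can resolve at `dist`: one quantization step divided by the slope of
/// the depth curve at that distance.
fn unorm_resolution(near: f64, far: f64, dist: f64, bits: u32) -> f64 {
    let step = 1.0 / ((1u64 << bits) - 1) as f64;
    let slope = far * near / ((far - near) * dist * dist);
    step / slope
}

fn main() {
    // Values from this thread: near 0.01, far 1000, model about 100 away.
    let (near, far, dist) = (0.01, 1000.0, 100.0);
    println!("depth at {}: {:.6}", dist, window_depth(near, far, dist));
    println!("16-bit resolution: {:.2} units", unorm_resolution(near, far, dist, 16));
    println!("24-bit resolution: {:.4} units", unorm_resolution(near, far, dist, 24));
}
```

Under these assumptions a 16 bit buffer resolves only about 15 units at a distance of 100, which is a sizable fraction of the model. A 32 bit float has a 24 bit significand, so close to depth 1.0 it behaves roughly like the 24 bit case, which is far better but still not unlimited.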

The vk spec leads me to believe that this buffer stores two values, depth and stencil. So although you may be increasing the depth's precision, you might not be changing the stencil's precision. I think the stencil is what ultimately decides which fragments get culled, i.e. the artifacts you are seeing are caused by this.

I would be really surprised if the stencil buffer had a role in the depth test at all. When I choose D16Unorm as my depth format, I expect there not to be a stencil buffer at all. (Or no bits being reserved for the stencil buffer.) But maybe I'm getting you wrong on this one.

I would recommend trying to remove culling in general.

Using culling was a conscious decision, but I tested it again without culling and the result is almost the same.

no-culling

There is no performance cost to disabling this. The gpu has to do the work anyways.

Disabling it will not have an effect by itself, I suspect, but since back-facing triangles that are occluded by front-facing triangles will then be rendered, I expect quite a large performance impact. But maybe this is not what you are saying.


I added the VK_LAYER_LUNARG_standard_validation layer and registered the callback to see if there is anything going wrong at all.

hpatjens commented 4 years ago

Would be great if you could test the example on your machine. I don't have a second machine to test it on to make sure that it's not the hardware configuration.

Maybe you could take a look at the matrices to make sure that there is nothing wrong with the z transformation or depth range. I'm not sure if I'm understanding all consequences of the Vulkan coordinate system compared to OpenGL.

AustinJ235 commented 4 years ago

Testing right now. Currently, increasing the precision removes the issue: nothing is occluded anymore. There still appears to be some z-index fighting.

AustinJ235 commented 4 years ago

So with culling disabled on the 32 bit precision I get some z-index fighting. Turns out there is a purpose to that when you have back to back triangles like that. You could maybe remove those and turn culling off to maybe gain some fps. example video

With culling back on and 32 bit precision the model renders flawless from what I can tell, example video

With just the 16 bit precision on the depth the artifacts are as you show, but I think even worse on my machine. So I don't know if this would be a driver issue with NVIDIA or not.

I use an AMD Vega 56 on the latest Windows 10 Pro insider fast.

AustinJ235 commented 4 years ago

So your D32_SFLOAT in vulkan caps shows it being linear whereas mine isn't. Wonder if that is the cause of this issue and if you'd have better luck with a 32 bit format that isn't linear.

AustinJ235 commented 4 years ago

Ok did some further researching, if you are using a linear buffer you may have to change your projection matrix to be linear depth instead. nalgebra seems to produce one that is for non-linear depth. You can combat this in your vertex shader by modifying the result.

gl_Position = push_constants.matrix * vec4(position, 1.0);
gl_Position.z = 1.0 / gl_Position.z;

By doing this to mine it replicates the behavior that you are having.

hpatjens commented 4 years ago

So with culling disabled on the 32 bit precision I get some z-index fighting. Turns out there is a purpose to that when you have back to back triangles like that. You could maybe remove those and turn culling off to maybe gain some fps.

Same here: d32sfloat-z-fighting

With culling back on and 32 bit precision the model renders flawless from what I can tell, example video

The effect is greatly reduced but still there. You don't see that at first glance but it's visible on the seat for example. You can see that in your video as well. d32sfloat-problem

gl_Position.z = 1.0 / gl_Position.z;

Adding this line in the shader after doing the projection distorts the model. I'm not sure why that is the case. Only changing the z coordinate shouldn't change x and y. d32sfloat-warp

So your D32_SFLOAT in vulkan caps shows it being linear whereas mine isn't. Wonder if that is the cause of this issue and if you'd have better luck with a 32 bit format that isn't linear.

I will test some of the formats that have additional stencil bits but my gut feeling is that there is something different going wrong.

AustinJ235 commented 4 years ago

I think your problem is the projection matrix. It is sorta odd that it would work for me. https://docs-src.amethyst.rs/stable/amethyst_rendy/camera/struct.Perspective.html

A Vulkan game engine states in its docs: "Because we use vulkan coordinates internally and within the rendering engine, normal nalgebra projection objects (Perspective3) are incorrect for our use case."

Not sure if this would apply here. Personally, in my little game project I had issues with the perspective matrix early on. I think I ended up doing away with the available perspective matrices.

I can look later to see how I do this, as I believe it works on nvidia and amd. At least with the gtx 780.


AustinJ235 commented 4 years ago

Give this a go, https://github.com/hpatjens/issue-example/pull/1

perspective_rh_zo: "Creates a matrix for a right hand perspective-view frustum with a depth range of 0 to 1"

Perspective3 seems to be geared toward opengl

I think that weird warping is the z going negative, so it ends up behind the camera in the math, but negative z doesn't exist.
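
To make the difference between the two conventions concrete, here is a minimal sketch that computes only the depth part of both projections by hand. It does not call nalgebra or any other library; the function names are hypothetical, and it assumes a right-handed view space in which the camera looks down the negative z axis.

```rust
/// NDC depth from an OpenGL-style perspective projection, which maps
/// the near plane to -1 and the far plane to +1.
/// `z_eye` is the view-space z coordinate (negative in front of the camera).
fn ndc_depth_gl(near: f64, far: f64, z_eye: f64) -> f64 {
    let clip_z = -(far + near) / (far - near) * z_eye - 2.0 * far * near / (far - near);
    let clip_w = -z_eye;
    clip_z / clip_w
}

/// NDC depth from a zero-to-one perspective projection, which maps
/// the near plane to 0 and the far plane to 1.
fn ndc_depth_zo(near: f64, far: f64, z_eye: f64) -> f64 {
    let clip_z = far / (near - far) * z_eye - far * near / (far - near);
    let clip_w = -z_eye;
    clip_z / clip_w
}

fn main() {
    let (near, far) = (0.1, 100.0);
    // Sample the near plane, a point just behind it, and the far plane.
    for dist in [near, 0.15, far] {
        println!(
            "dist {:>6}: gl = {:+.4}, zo = {:+.4}",
            dist,
            ndc_depth_gl(near, far, -dist),
            ndc_depth_zo(near, far, -dist)
        );
    }
}
```

Vulkan clips against a depth range of 0 to w, so with the OpenGL-style matrix everything that lands on the negative side (here, geometry between the near plane and roughly twice the near distance) is clipped away unless another matrix corrects the range.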

hpatjens commented 4 years ago

I think perspective_rh_zo is correct but it still doesn't fix the issue. If the matrix is the problem the teapot sample from the vulkano examples has the same problem.

Next, I will remove the gl_coordinate_system matrix and rewrite everything so that it works with y down. Then, there is less room for mistakes.

AustinJ235 commented 4 years ago

Were you able to solve your issues, or are you still having problems?

hpatjens commented 4 years ago

No, nothing changed.

  1. Do you know any examples (except the ones from the vulkano repository) or small projects I could read and test?

  2. Why is there a scale matrix in the teapot example on this line? I just realised that you committed that section of the code. As I wrote in the issue, when I remove that and move the camera back, I get the same problem.

At this point, since this is my first time using vulkano, I'm kind of losing trust that everything is going as it should in the wrapper.

hpatjens commented 3 years ago

I could reproduce this with ash and came to the conclusion that it doesn't have anything to do with vulkano. Thanks for your time.

Adam-Gleave commented 3 years ago

Sorry, I know this has been closed, but what was the problem here? I'm seeing exactly the same thing and can't figure it out for the life of me.

AustinJ235 commented 3 years ago

I'd suggest trying a scale matrix, playing with zfar and znear, and/or changing the depth buffer precision. Otherwise, if your view is warped, make sure your matrix math and the multiplication order are correct.

hpatjens commented 3 years ago

  1. Move the near plane as far away from the camera as you can afford. This is the default advice.
  2. Try moving the far plane outwards. This might be counter-intuitive, as it squeezes the depth values into a smaller range, but for me it gave better results, which I attribute to the depth values being moved closer to the camera and therefore into the region of the depth buffer that has higher precision. I didn't try to validate that assumption, but it worked for me.
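
The two suggestions can be compared on paper with a short sketch. It assumes the standard perspective mapping with a depth range of 0 to 1 and looks only at how quickly the stored depth changes with distance (a steeper curve means more usable precision at that distance). The function name and the candidate values are illustrative, not taken from any project in this thread.

```rust
/// Slope of window-space depth with respect to distance for a standard
/// zero-to-one perspective projection. A larger slope means more depth
/// buffer precision is available at `dist`.
fn depth_slope(near: f64, far: f64, dist: f64) -> f64 {
    far * near / ((far - near) * dist * dist)
}

fn main() {
    let dist = 100.0;
    // Baseline from this thread: near 0.01, far 1000.
    let base = depth_slope(0.01, 1000.0, dist);
    // Hypothetical alternatives: push the near plane out, or the far plane out.
    let candidates = [(0.1, 1000.0), (1.0, 1000.0), (0.01, 10000.0)];
    for &(near, far) in candidates.iter() {
        let gain = depth_slope(near, far, dist) / base;
        println!("near = {}, far = {}: precision x{:.4}", near, far, gain);
    }
}
```

Under these assumptions, moving the near plane out by a factor of ten gains about a factor of ten in precision, while moving the far plane out barely changes the result. So the improvement described in item 2 may depend on something outside this simple model.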