memononen / nanovg

Antialiased 2D vector drawing library on top of OpenGL for UI and visualizations.
zlib License

Vulkan support #614

Open danilw opened 3 years ago

danilw commented 3 years ago

If anyone is looking for a working Vulkan port of nanovg, the source is here: https://github.com/danilw/nanovg_vulkan

leranger commented 3 years ago

Hi @danilw, I am testing your Vulkan port on Windows 10 with an AMD Radeon RX 560X card and I see some artifacts: [screenshot: artifacts]

Do you know what could be causing this? Thanks.

danilw commented 3 years ago

You are not the first to report this problem on AMD GPUs. Sadly, I have no idea what causes it, and I cannot fix it because I cannot reproduce the bug myself.

I tested this code on an AMD Vega 8 and it works perfectly fine there, as you can see in the screenshot: [screenshot: nvgvk]

As I wrote on the project page:

> Depth order bug - I tested on Nvidia and AMD myself and have not seen any depth bugs. But on some AMD video cards the depth order is bugged. I cannot test it myself, so I cannot fix it. I don't see any "logical error" myself. Screenshot from the Vulkan version of this build on an AMD GPU; elements draw in the same order as in OpenGL.

Sorry, if you want it fixed you will have to do it yourself; I cannot fix it. It may be a driver bug, or it may be a bug in my code, I don't know.

leranger commented 3 years ago

OK, thanks for answering.

leranger commented 3 years ago

Hi @danilw, I think I found a fix for the artifacts on AMD cards. In the createRenderPass function, I changed stencilLoadOp of the depth/stencil attachment to VK_ATTACHMENT_LOAD_OP_CLEAR. I have tested it in a Linux virtual machine with the lavapipe driver and it seems to work. I do not know if this is the right fix, as I am a Vulkan newbie.

danilw commented 3 years ago

@leranger thank you, I added this as a fix:

https://github.com/danilw/nanovg_vulkan/commit/6ee100956134cab2aab67a6a8a7a5bda54c0f9ab

mulle-nat commented 2 years ago

@danilw I built your nanovg demo project with -DCMAKE_BUILD_TYPE=Release which produces CFLAGS=-O3 and compared it with the nanovg demo also built with -O3.

I noticed that the Vulkan demo runs at 1200 FPS whereas the OpenGL demo runs at 2000 FPS. Now that's a bigger difference than I expected, and in the wrong direction. But maybe this is to be expected? I have no experience with Vulkan.

Something else I noticed is that resizing the OpenGL window is smooth, with hardly any FPS drop. But the Vulkan window redraw gets really choppy.

danilw commented 2 years ago

@mulle-nat

> I noticed that the Vulkan demo runs at 1200 FPS whereas the OpenGL demo runs at 2000 FPS. Now that's a bigger difference than I expected, and in the wrong direction. But maybe this is to be expected? I have no experience with Vulkan.

Yes, this is a known "problem" of this Vulkan port. I have no idea where to look or how to fix it (I am sure it can be fixed), but something is definitely "wrong".

My result is in the screenshot: [screenshot: ngt]

OpenGL shows 62% GPU utilization at 800 FPS, whereas Vulkan shows 42% GPU utilization at 360 FPS. My screen has a 60 Hz refresh rate.

Maybe this is "fine" for real-world usage, because there the FPS is capped to the refresh rate, and at my 60 Hz refresh rate the GPU usage of the Vulkan and OpenGL demo apps is almost the same.

I have no "solution" for this. You can make your own port that works better than this one; I do not need to rewrite it (I have no real use for this Vulkan port), so I am keeping it as it is for now.

> But the Vulkan window redraw gets really choppy.

Resize in Vulkan is hard to make look good. To get a correct resize without flashes and with smooth performance, you basically have to write your own swapchain implementation (to avoid the annoying flashes on resize in the native Vulkan swapchain implementation), render everything into a framebuffer using multithreading, and recreate the framebuffers on resize in another thread as well: keep using the old framebuffer while slowly recreating the new one over multiple frames, to compensate for lag and slow memory allocation on the GPU. FPS drops during resize because this demo app deletes and recreates everything within a single frame on resize.

This is far too large a task for this "demo". A "correct resize" should be implemented in your application; this nanovg-vulkan implementation should be used just as a library to render UI, sharing the Vulkan device and rendering on top of your framebuffer, something like this. The resize implementation in this demo app is just basic Vulkan swapchain usage and nothing else, enough to show that it works.

mulle-nat commented 2 years ago

I sampled both demos using "hotspot" and the results didn't make much sense to me in terms of what might cause the slowdown. One idea I had is that maybe Vulkan is going through Wayland and that slows it down, whereas OpenGL seems to run on X11. I don't know how to verify this though.

Edit: glfw's default currently is X11. I tried a build with Wayland, but it makes no difference.

The resize flashes I can also observe in OpenGL programs, where the redraw exceeds the frame time. It's something I am trying to solve in a way similar or identical to what you wrote.

Your port is very interesting to me, because I don't know about the longevity of OpenGL. Having Vulkan as a backend is very nice for me.

danilw commented 2 years ago

> One idea I had is that maybe Vulkan is going through Wayland and that slows it down, whereas OpenGL seems to run on X11. I don't know how to verify this though.

In this demo app, the Vulkan surface depends on the GLFW3 settings you chose when building the app. The demo app does not control surface creation itself (as I said, it uses the basic Vulkan surface that GLFW3 creates). It can be X11 or Wayland depending on the GLFW3 build settings. (I don't know whether GLFW3 auto-detects and switches to Wayland in its build script, or uses X11 by default like most apps.)

> The resize flashes I can also observe in OpenGL programs

This depends on the GPU and GPU driver when all defaults are used. I think you use Nvidia, because OpenGL and Vulkan flash on resize only on Nvidia in Linux; in Windows they do not. The AMD Mesa driver does not flash the swapchain on resize. (I did not test this demo app's behavior, but other apps I have seen work like this.)

danilw commented 2 years ago

> I sampled both demos using "hotspot" and it didn't make much sense

I think (I may be wrong) it happens because of one of these:

  1. Even some of the Khronos Vulkan samples run slower (fewer FPS) than their OpenGL analogues that achieve the same result (showing an asset or running a compute shader), and the performance of the Khronos Vulkan samples also differs between GPU vendors (in some samples up to 10x more FPS on AMD, and vice versa).
  2. FPS more than 2x higher than the screen refresh rate should not be used as "benchmark numbers", because processing "fake frames" has nothing to do with real performance.
  3. This Vulkan nanovg port may have some problem with memory flags or resource updates that causes this slowdown (if it is a slowdown).
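One way to read point 2 is in terms of time per frame rather than frames per second. The small sketch below is only illustrative arithmetic (the function names are made up for this example): it converts FPS readings to milliseconds per frame so that two readings can be compared against a display's frame budget.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a frames-per-second reading into milliseconds spent per frame. */
static double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Print the per-frame time of two FPS readings and the gap between them. */
static void print_frame_gap(const char *label, double fps_a, double fps_b)
{
    double gap = frame_time_ms(fps_b) - frame_time_ms(fps_a);
    printf("%s: %.2f ms vs %.2f ms (gap %.2f ms)\n",
           label, frame_time_ms(fps_a), frame_time_ms(fps_b), gap);
}
```

For the 2000 vs 1200 FPS readings reported earlier in this thread, that is 0.50 ms against roughly 0.83 ms per frame, a gap of about 0.33 ms, while one frame at a 60 Hz refresh rate lasts about 16.67 ms.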

@mulle-nat for better real-time communication, if you have more questions it is better to join the Discord (link in the description on my GitHub page for the nanovg port).

mulle-nat commented 2 years ago

I have no further questions :) Thanks for all the tips.

nidefawl commented 2 years ago

The stencil strokes are bugged in the Vulkan fork. It renders the path multiple times and does not apply the correct stencil tests.

Here is a fix https://github.com/nidefawl/nanovg_vulkan/commit/67f3a2595a97c3e122ef5de14bb022382dc8d076

danilw commented 2 years ago

@nidefawl can you open it as a pull request to my fork so I can add it? If you want to, of course.

danilw commented 2 years ago

@mulle-nat about "low Vulkan FPS": I added 3 more examples, and one of them uses multiple frames in flight. Look at this example: nanovg-vulkan-glfw-integration-demo

In this example the FPS is ~200 with 1 frame in flight and ~600 with 2 frames in flight (see the screenshot in the example's description). In the OpenGL example I get ~1000 FPS, so at 600 FPS in Vulkan there still seems to be room for optimization. But from what I saw of the logic and code, only a big change to the drawing logic may help, and that looks like way too much for me. Maybe I am missing some obvious optimization; feel free to open a pull request if so. Pull requests go to the main https://github.com/danilw/nanovg_vulkan library port repo.

(I am not going to make any logic changes for now; maybe later, if there are requests for or interest in this library/port. For now I just added more examples.)

Screenshot of OpenGL and Vulkan running at the same time (the blue images on top in the Vulkan example come from the integration example): [screenshot 1]

Two OpenGL instances at the same time (~500 FPS): [screenshot: 2 OpenGL]

Two Vulkan instances at the same time (~300 FPS): [screenshot: 2 Vulkan]

mulle-nat commented 2 years ago

@danilw Nice work.

SubiyaCryolite commented 1 year ago

May be a dud, but I suspect the memory of all three buffers ( fillVertShader, fillFragShader, fillFragShaderAA ) being on HOST could be a reason for the relatively poor performance. Looking at both CPU and GPU utilization, there is a bottleneck somewhere, and it may be Memory/PCIE bandwidth. For reference: https://www.youtube.com/watch?v=bUUZ1iD9_e4

danilw commented 1 year ago

> there is a bottleneck somewhere, and it may be Memory/PCIE bandwidth

@SubiyaCryolite I thought I had mentioned it somewhere here, but it looks like I had not: this nanovg port rebuilds the rendering pipeline every single frame, so it may be related to CPU-GPU uploading. The most obvious optimization is to "cache" the UI pipeline and not rebuild it when the UI has not been updated.

A better optimization would be to somehow update only the part of the pipeline related to the changed state element, but that seems way too complex for me, or I have no idea how to do it simply. I have no motivation for something this overcomplicated; I could only do it as a full-time job, and since this project was originally just a "test for a Vulkan-related job", I am not going to do any complex optimizations here.
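The simpler of the two ideas, caching and skipping the rebuild when nothing changed, could look roughly like the sketch below. It is a generic dirty-flag pattern, not code from nanovg or the Vulkan port: `UiCache`, `rebuildUi`, `markUiChanged` and `prepareFrame` are hypothetical names, and it assumes the application can tell the renderer when UI state has changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache of the geometry built for the UI on a previous frame. */
typedef struct UiCache {
    bool   dirty;        /* set whenever any UI state changes */
    size_t vertexCount;  /* result of the last rebuild */
    size_t rebuilds;     /* how many times the expensive path ran */
} UiCache;

/* Stand-in for the expensive per-frame work (tessellation, uploads, ...). */
static size_t rebuildUi(size_t widgetCount)
{
    return widgetCount * 6; /* pretend each widget is one quad = 6 vertices */
}

/* The application calls this whenever something visible has changed. */
static void markUiChanged(UiCache *c)
{
    c->dirty = true;
}

/* Returns the vertex count to draw this frame, rebuilding only if needed. */
static size_t prepareFrame(UiCache *c, size_t widgetCount)
{
    assert(c != NULL);
    if (c->dirty) {
        c->vertexCount = rebuildUi(widgetCount);
        c->rebuilds++;
        c->dirty = false;
    }
    return c->vertexCount;
}
```

The cache has to start out dirty so the first frame builds its geometry; after that, frames with no call to `markUiChanged` reuse the previous result.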

> I suspect the memory of all three buffers ( fillVertShader, fillFragShader, fillFragShaderAA ) being on HOST could be a reason for the relatively poor performance

I'll check it now, thanks.

> For reference: https://www.youtube.com/watch?v=bUUZ1iD9_e4

great video xD Vulkan complexity explosion xD

SubiyaCryolite commented 1 year ago

Yes, Vulkan is incredibly complex XD. Truth be told your work is a great foundation. I've made my own fork and I'm currently focused on getting performance to exceed GL. It might not be possible XD, but so far I'm enjoying the challenge.

I may or may not initiate a PR in a few days' time, but so far I've done the following.

1: Change mapping to coherent. Flags changed from VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT. Also added a property void* mapped; to VKNVGBuffer, allowing buffers to be mapped only once on creation/resize.

2: Added support for one or more command buffers. For cases where each swap image has its own command buffer, I changed the following properties of VKNVGCreateInfo: VkCommandBuffer cmdBuffer -> VkCommandBuffer *cmdBuffer, plus a new uint32_t *currentBuffer. currentBuffer points to a variable holding 0 if only one buffer is used, or to the variable holding the value of pImageIndex from the last call to vkAcquireNextImageKHR. Within nanovg_vk.h, references to the command buffer take the following pattern: uint32_t currentBuffer = *vk->createInfo.currentBuffer; followed by VkCommandBuffer cmdBuffer = vk->createInfo.cmdBuffer[currentBuffer];
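The indirection in item 2 can be sketched without the Vulkan headers as follows. Everything here is a stand-in: `SketchCmdBuf` replaces `VkCommandBuffer`, `SketchCreateInfo` mirrors only the two fields described above, and `cmdBufferCount` is an extra field assumed here purely for bounds checking.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for VkCommandBuffer so the sketch builds without Vulkan headers. */
typedef int SketchCmdBuf;

/* Hypothetical subset of a create-info struct following the description above. */
typedef struct SketchCreateInfo {
    SketchCmdBuf   *cmdBuffer;      /* array: one per swapchain image, or a single entry */
    const uint32_t *currentBuffer;  /* points at the application's current image index */
    uint32_t        cmdBufferCount; /* assumption: added only for the bounds check */
} SketchCreateInfo;

/* Resolve the command buffer to record into for the current frame. */
static SketchCmdBuf sketchCurrentCmdBuf(const SketchCreateInfo *info)
{
    uint32_t index = *info->currentBuffer; /* re-read on every call */
    assert(index < info->cmdBufferCount);
    return info->cmdBuffer[index];
}
```

Because the struct stores a pointer to the index rather than a copy, the application only updates its own variable after acquiring an image, and the library picks up the new value the next time it records commands.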

3: Migrating from UNIFORM_BUFFER to STORAGE_BUFFER. Primarily with the aim of migrating to about 8 Multi-Draw-Indirect (MDI) calls, as opposed to the 230* individual draw calls currently. The aim is to have all uniforms in one large SSBO, accessed through a new per-instance vertex attribute, callId. My plan is to cycle through all pipelines (about 8 in the demo app) and issue one MDI call for each. If MDI can't get the FPS above 700 then I'm not sure anything can.

This may all be pointless in the end, but it's worth a shot XD. Thanks again for all the work you've done thus far.

danilw commented 1 year ago

@SubiyaCryolite

> Looking at both CPU and GPU utilization, there is a bottleneck somewhere, and it may be Memory/PCIE bandwidth

I did a small investigation of PCIE usage by this nanovg Vulkan port:

> I suspect the memory of all three buffers ( fillVertShader, fillFragShader, fillFragShaderAA ) being on HOST could be a reason for the relatively poor performance

I see VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT used for uniforms at nanovg_vk.h#L1517, and it seems correct.

Based on this logic, PCIE usage must be the "bottleneck", so it must always be very high, especially when the FPS drops low.

For more detail, see: PCIE usage in nanovg_vulkan

Below on this page is the short version.


Vulkan test:

I use the Vulkan example with a single frame in flight, which is why the FPS is low in Vulkan.

For example, you can press the Space hotkey: left 400 FPS without it, right 90 FPS with it. [screenshot]

So this means the right example must show some crazy high PCIE usage. PCIE usage for the examples above, in the same order as the screenshots: [screenshot]

Result:

| Screenshot | FPS | PCI-E RX | PCI-E TX |
|---|---|---|---|
| Left (without Space hotkey) | 400 | 600-700 MiB | 100-150 MiB |
| Right (with Space hotkey) | 90 | 250-350 MiB | 50-100 MiB |

Compared to OpenGL PCIE usage:

[screenshot]

| Screenshot | FPS | PCI-E RX | PCI-E TX |
|---|---|---|---|
| Left (without Space hotkey) | 1100 | 500-600 MiB | 50-80 MiB |
| Right (with Space hotkey) | 180 | 800-900 MiB | 80-100 MiB |

There is still a possibility that my performance measuring tools are broken or my method is incorrect. But if these statistics are correct and I am not misinterpreting the results, PCIE is not the bottleneck.

I cannot use the modern Nvidia tools to debug, because my Nvidia GPU is too old to be supported by them, although it still supports Vulkan.

danilw commented 1 year ago

> I've made my own fork and I'm currently focused on getting performance to exceed GL. It might not be possible XD, but so far I'm enjoying the challenge.

Better to open a PR when it is ready, so I can give you proper thanks and you will be listed among the contributors on the GitHub source pages.

> I may or may not initiate a PR in a few days' time, but so far I've done the following.

No rush; any time, when and if it is ready.

> This may all be pointless in the end, but it's worth a shot XD. Thanks again for all the work you've done thus far.

Your changes look "very needed" to improve this project; it will be nice if they also improve performance. I am not going to change anything for now, but if/when your improvements are ready I'll look at them.

danilw commented 1 year ago

> 3: Migrating from UNIFORM_BUFFER to STORAGE_BUFFER

When I was making the "multiple frames in flight" example, it became obvious that rebuilding every single triangle on the CPU every frame takes a lot of time; this is why the FPS drops so much compared to OpenGL. (OpenGL nanovg does the same per-frame triangle rebuild, but OpenGL has its own "multithreading" feature that is basically 3 frames in flight when possible, and somehow it is this fast.)

With multiple frames in flight, the CPU builds the triangles and the render pass in parallel while the GPU renders, so the FPS almost doubles from the original.
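The overlap described above can be illustrated with a toy timing model. This is not code from the port and not a measurement: `simulateFrames`, `MAX_SLOTS` and the per-frame costs are made up for the sketch, and each slot's completion time stands in for the fence a real Vulkan app would wait on (for example with vkWaitForFences) before reusing that frame's resources.

```c
#include <assert.h>

#define MAX_SLOTS 8 /* arbitrary upper bound for this sketch */

/* Simulated time (ms) at which `frames` frames have finished rendering, given
 * the CPU build time and GPU render time per frame and the number of frame
 * slots ("frames in flight"). */
static double simulateFrames(int frames, int slots, double cpu_ms, double gpu_ms)
{
    double slot_free[MAX_SLOTS] = {0}; /* when each slot's GPU work completes */
    double cpu_t = 0.0;                /* CPU clock */
    double gpu_t = 0.0;                /* time at which the GPU becomes idle */

    assert(slots >= 1 && slots <= MAX_SLOTS);
    for (int f = 0; f < frames; ++f) {
        int s = f % slots;
        if (cpu_t < slot_free[s])
            cpu_t = slot_free[s];      /* wait until this slot is reusable */
        cpu_t += cpu_ms;               /* build triangles, record commands */
        double start = cpu_t > gpu_t ? cpu_t : gpu_t; /* GPU runs after submit */
        gpu_t = start + gpu_ms;
        slot_free[s] = gpu_t;
    }
    return gpu_t;
}
```

With equal CPU and GPU costs of 1 ms, four frames take 8 ms with one slot and 5 ms with two, and the ratio approaches 2x as the frame count grows. That is the same kind of gain as the near doubling mentioned above, under the assumption that CPU and GPU work per frame are comparable.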

This is the only optimization I did in this project; everything else was done by other people.

P.S. I see the nanovg_vulkan bottleneck on the CPU side. Maybe it is some bad cache pattern; maybe it is related to building the Vulkan pipeline; maybe some PCIE timing/synchronization pattern breaks because Vulkan uploads data to the GPU when you allocate it, unlike OpenGL, which can do some "magic" optimizations, so maybe there are too many small allocations and the "PCIE timer is slow"; or it is something more complex. Of course I can be completely wrong; better to do your own tests.

danilw commented 1 month ago

I archived my https://github.com/danilw/nanovg_vulkan. If someone wants to, fork it; it still works, with just one major bug: DPI scaling does not work.

I am not going to close this issue thread, since my nanovg_vulkan is still the "only" working nanovg Vulkan port on GitHub, so someone may need it.

mulle-nat commented 1 month ago

I don't need it at the moment, but I appreciate that it exists. :+1: