hanatos / vkdt

raw photography workflow that sucks less
https://jo.dreggn.org/vkdt
BSD 2-Clause "Simplified" License

artifacts in every image (AMD GPU) #67

Closed qosch closed 1 year ago

qosch commented 1 year ago

I have a quite severe issue here: every image I open (.arw files from my camera as well as a downloaded .nef) shows these artifacts:

img_0000

They appear in exported images, in the lighttable thumbnails, and in the darkroom. I'll bisect later; any idea where to start? This was done using the latest master, but it happened earlier as well.

Operating System: Manjaro Linux
KDE Plasma Version: 5.26.4
KDE Frameworks Version: 5.101.0
Qt Version: 5.15.7
Kernel Version: 6.1.1-1-MANJARO (64-bit)
Graphics Platform: Wayland
Processors: 12 × AMD Ryzen 5 3600 6-Core Processor
Memory: 15.6 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 8GB
MESA 22.3.1
Dual 4k screens

charlesmonson commented 1 year ago

I think this duplicates #66.

However, it is happening for me as well with an AMD 6700XT on both X11 and Wayland. I'd be happy to help provide info, but I don't really know where to start.

Kernel: 6.1.7-arch1-1
Mesa: 22.3.3

hanatos commented 1 year ago

if it is the same as #66, which driver was that? i don't know much about AMD, but there are several options for drivers right? something amdgpu-pro and the other what? it seems to me one of the two works.

hanatos commented 1 year ago

> However, it is happening for me as well with an AMD 6700XT on both X11 and Wayland. I'd be happy to help provide info, but I don't really know where to start.

wait you mean it happens even with the proprietary amdgpu-pro driver?

this looks like the image barriers after compute shaders don't work, or the memory management is confused about reusing the same VkBuffers for several VkImages.. but all of that is working with nvidia and intel drivers, and i thought amdgpu-pro as well.

does the commandline interface work as expected? (i.e. is this a compute vs gui draw pass sync issue)

if you can compile vkdt, it might be worth a try to place a return VK_SUCCESS as first line in src/pipe/graph.c:1284 free_inputs (this will use unreasonable amounts of memory) and see if that changes anything at all?
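The point of that experiment is to rule out memory aliasing: if nothing is ever freed, no two images can end up sharing the same memory, so any corruption that remains must come from somewhere else. A toy C sketch of the idea follows. The pool, its fields and the function names are hypothetical stand-ins, not vkdt's actual allocator or the real free_inputs; only the early-return trick is taken from the suggestion above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in (NOT vkdt's types): a pool handing out offsets into one buffer. */
typedef struct
{
  size_t next;         /* bump pointer: first offset that was never used */
  size_t free_off;     /* offset of the most recently freed block */
  size_t free_size;    /* size of that block, 0 if there is none */
  int    disable_free; /* debugging switch, mimics the early return */
} toy_pool_t;

/* Return an offset for `size` bytes, reusing the freed block if it fits. */
size_t toy_alloc(toy_pool_t *p, size_t size)
{
  assert(size > 0);
  if(p->free_size >= size)
  { /* reuse: a later image now aliases memory an earlier one lived in */
    size_t off = p->free_off;
    p->free_size = 0;
    return off;
  }
  size_t off = p->next;
  p->next += size;
  return off;
}

/* Give a block back to the pool, unless freeing is switched off. */
int toy_free(toy_pool_t *p, size_t off, size_t size)
{
  if(p->disable_free) return 0; /* analogous to `return VK_SUCCESS;` as first line */
  p->free_off  = off;
  p->free_size = size;
  return 0;
}
```

With disable_free set, two consecutive allocations never share an offset. The price is that the pool grows with every allocation, which is the "unreasonable amounts of memory" caveat.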

charlesmonson commented 1 year ago

I've only tried with the in kernel amdgpu driver, not the proprietary amdgpu-pro one.

I compiled from latest master after making the return VK_SUCCESS change and it still happens both when viewing in the GUI and after exporting. It also happens using vkdt-cli for me.

hanatos commented 1 year ago

thanks for confirming. so it is the barrier, not something in the command buffers between gui drawing and executing the graph. also my memory allocator doesn't seem to be the issue then.

it really looks like something as basic as the image barriers aren't synchronising the individual compute shaders correctly.. that might very well point to a driver issue.

kanyck commented 1 year ago

I have the same issue in the histogram, thumbnails, main window and exported image.

OS: Gentoo Linux
Vulkan: Mesa 22.3.3 RADV
amdgpu @ kernel 6.1.0-pf1
AMD Radeon RX 470 8GB
Xorg 21.1.6

hanatos commented 1 year ago

could you try the amdgpu-pro proprietary driver please? even if it hurts?

and yes, the fact that it shows up everywhere even in such simple things as assembling a histogram means that it's really a basic operation between kernels, not a particular module.

the half-built-up tiles look exactly like it's ignoring the image layout barriers in between the kernels.

kanyck commented 1 year ago

So:

RADV -> vkdt produces artifacts as shown before.
AMDVLK-22.4.4 -> nothing works at all, including vulkaninfo. However, a newer version came out 2 days ago, maybe it will work... not in portage yet.
amdgpu-pro-vulkan -> see the screenshot.

However, this program uses vulkan for image processing, too, and works for me with both RADV and amdgpu-pro (the latter briefly flashes pink, then works just fine). So maybe you hit some corner case with vulkan...

screenshot2

qosch commented 1 year ago

One quick remark: I can't remember seeing this behavior a few weeks ago. I still think git bisect could lead to the problem here.

kanyck commented 1 year ago

@qosch You're probably right. When I first ran vkdt last week it worked without those artifacts IIRC.

hanatos commented 1 year ago

puahaha these images look even better :dagger:

is that a 30bit/pixel screen? somehow the image looks like wrong bit depth/endianness.

i'll look at this and will see what i can do. your idea that there's some corner case that i'm triggering here is good. sometimes the validation layers don't find all things wrong with the code either.

kanyck commented 1 year ago

> these images look even better

TBH, I seriously doubt it 😠

> is that a 30bit/pixel screen?

Sure.

> i'll look at this and will see what i can do

Looking forward to it. I consider vkdt to be very promising software, even at this early stage. Well, maybe except the GUI part 😁 Also, the upscayl code is probably worth looking into as a reference...

hanatos commented 1 year ago

as for the 30-bit rendering thing: you should have something in your vkdt -d qvk output that states

[qvk] available surface formats:
[qvk] B8G8R8A8_UNORM
[qvk] B8G8R8A8_SRGB

and also lists a version with some 10s in between. the images look like that's not the case and it renders in 8-bit anyways (to be reinterpreted as 10-bit per channel later on by xorg). i didn't know amd could do that anyways.
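To make the "wrong bit depth" reading concrete, here is a minimal C sketch of what happens when a pixel written as B8G8R8A8_UNORM is later decoded as A2R10G10B10_UNORM_PACK32. The bit positions follow the Vulkan format definitions. The helper names are made up for illustration, a little-endian host is assumed, and whether this is exactly what happens between amdgpu-pro and xorg on the affected machine is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* One pixel as B8G8R8A8_UNORM stores it (bytes B, G, R, A in memory),
   seen as a 32-bit word on a little-endian host. */
uint32_t pack_b8g8r8a8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
  return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Channels of an A2R10G10B10_UNORM_PACK32 word: r, g, b in 0..1023, a in 0..3. */
typedef struct { uint32_t r, g, b, a; } rgb10a2_t;

/* Decode a word as A2R10G10B10: A in bits 30..31, R 20..29, G 10..19, B 0..9. */
rgb10a2_t unpack_a2r10g10b10(uint32_t w)
{
  rgb10a2_t p;
  p.a = (w >> 30) & 0x3u;
  p.r = (w >> 20) & 0x3ffu;
  p.g = (w >> 10) & 0x3ffu;
  p.b =  w        & 0x3ffu;
  return p;
}

/* Write 8 bits per channel, then read the same word back as 10 bits per channel. */
rgb10a2_t misread_8bit_as_10bit(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
  rgb10a2_t p = unpack_a2r10g10b10(pack_b8g8r8a8(r, g, b, a));
  /* the 2-bit alpha is just the top two bits of the 8-bit alpha */
  assert(p.a == (uint32_t)(a >> 6));
  return p;
}
```

With this sketch, mid grey (128, 128, 128, 255) decodes to R=1016, G=32, B=128 out of 1023: the channel boundaries no longer line up, so a neutral pixel comes out strongly tinted.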

qosch commented 1 year ago

Okay I just went through git bisect but here's the issue: going back to commits from December, I get nothing but crashes. Sometimes on opening vkdt, sometimes on opening the darkroom.

Anyway, if I did nothing wrong (I went through sudo make clean, sudo make install /opt/vkdt/bin/vkdt, then tested, then git bisect XXX), the first bad commit seems to be 72f14df48a40a0f4e9ef975b33baf567f2c06861. "Bad" being the artifacts described in this issue, "good" being mostly crashes.

After having typed this, I tried going to the first "bad" commit (so the first showing the behaviour described here) but got a crash as well. So I guess this means no new insights in this comment :D

I just tried the next commit 72f14df48a40a0f4e9ef975b33baf567f2c06861, it does show the behaviour for me. No idea if it is the first commit, though.

I also noticed that I often got ../mesa-22.3.3/src/vulkan/wsi/wsi_common_x11.c:1629: Swapchain status changed to VK_ERROR_SURFACE_LOST_KHR in the console.

kanyck commented 1 year ago

amdgpu-pro:

[qvk] available surface formats:
[qvk] B8G8R8A8_UNORM
[qvk] B8G8R8A8_SRGB
[qvk] colour space: 0
[qvk] available surface formats:
[qvk] B8G8R8A8_UNORM
[qvk] B8G8R8A8_SRGB
[qvk] colour space: 0
[qvk] error VK_TIMEOUT executing vkAcquireNextImageKHR(qvk.device, qvk.swap_chain, 2ul<<30, image_acquired_semaphore, VK_NULL_HANDLE, &vkdt.frame_index)!
[qvk] available surface formats:
[qvk] B8G8R8A8_UNORM
[qvk] B8G8R8A8_SRGB
... errors repeated

RADV:

[qvk] available surface formats:
[qvk] A2R10G10B10_UNORM_PACK32
[qvk] colour space: 0
[qvk] available surface formats:
[qvk] A2R10G10B10_UNORM_PACK32
[qvk] colour space: 0
[qvk] available surface formats:
[qvk] A2R10G10B10_UNORM_PACK32

hanatos commented 1 year ago

thanks for the additional info:

qosch: right, depending on compiler mood it would not have worked at all before this commit. will have to see for myself i suppose.

kanyck: seems amdgpu-pro does not do 30-bit rendering, screenshot is as i thought. you sure upscayl uses vulkan for anything intelligible? looks to me like electron/js + magic binaries to do the actual work?

maybe there's some other setting/feature that amd is generally unhappy about that i can disable.

kanyck commented 1 year ago

> seems amdgpu-pro does not do 30-bit rendering, screenshot is as i thought.

Obviously.

> you sure upscayl uses vulkan for anything intelligible?

Not a bit 😺 I just know that it demands vulkan to start, so I concluded it does something useful with it. And, it's a strange beast giving questionable results. At least, most of the modes didn't work well for me, and others spoil hues. JFYI.

hanatos commented 1 year ago

fwiw i am interested in doing the inferencing for some specific cnn models in vkdt, mostly for denoising. who knows maybe it works for demosaicing too. i have restrictive ideas about performance constraints though. these huge models look like they would take many seconds if not minutes to evaluate for a large image.

kanyck commented 1 year ago

FWIW you may be interested in this https://www.supergoodcode.com/through-the-loop/

hanatos commented 1 year ago

confirmed. related to https://gitlab.freedesktop.org/mesa/mesa/-/issues/6702 and fixable on my end, see a65d28cc. please pull and test again!

kanyck commented 1 year ago

Looks like it's fixed! Will do some more testing, but for now I can't see any artifacts. Thanks!