gpuweb / gpuweb

Where the GPU for the Web work happens!
http://webgpu.io

Support "uniform texel buffer" and "storage texel buffer" ? #162

Open dneto0 opened 5 years ago

dneto0 commented 5 years ago

I don't have an opinion on whether these should be supported by WebGPU, and I didn't see an investigation related to this.

devshgraphicsprogramming commented 5 years ago

If WebGPU doesn't support them, developers will have to emulate them with SSBOs, and the value conversion (e.g. RGB9E5 to vec4) will be much slower in shader code than in hardware.

Secondly, the lack of Uniform Texel Buffers (TBOs in OpenGL parlance) is a problem because in Vulkan an SSBO has to conform to the std430 layout, so supporting formats of less than 4 bytes per "pixel" (or rather sample) will be very painful.

This will be a bit hard to investigate, as the Vulkan Hardware Database has no information about which formats are supported for Uniform and Storage Texel Buffers, or which shader stages support them.
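For reference, here is a minimal sketch of the RGB9E5 conversion mentioned above, written in Python rather than shader code so the bit layout is easy to check. It assumes the standard shared-exponent layout (three 9-bit mantissas in the low bits, a 5-bit exponent in the top bits, bias 15); the function name `decode_rgb9e5` is made up for illustration. A shader emulating a texel buffer would have to do the equivalent work on every load.

```python
def decode_rgb9e5(packed):
    """Decode one 32-bit RGB9E5 (shared-exponent) texel into three floats."""
    r_mantissa = packed & 0x1FF          # bits 0..8
    g_mantissa = (packed >> 9) & 0x1FF   # bits 9..17
    b_mantissa = (packed >> 18) & 0x1FF  # bits 18..26
    exponent = (packed >> 27) & 0x1F     # bits 27..31, shared by all channels
    # Exponent bias is 15, and the 9-bit mantissas have no implicit leading 1,
    # so each channel is mantissa * 2^(exponent - 15 - 9).
    scale = 2.0 ** (exponent - 15 - 9)
    return (r_mantissa * scale, g_mantissa * scale, b_mantissa * scale)


if __name__ == "__main__":
    texel = (24 << 27) | (3 << 18) | (2 << 9) | 1
    print(decode_rgb9e5(texel))
```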

magcius commented 5 years ago

> because in Vulkan an SSBO has to conform to the std430 layout

Hopefully not for much longer! https://renderdoc.org/vkspec_chunked/chap41.html#VK_EXT_scalar_block_layout

kvark commented 5 years ago

@devshgraphicsprogramming good point. Btw, the stages at least seem to be well defined by the spec (and thus can be derived from the features in vulkandb).

Uniform texel buffers are exposed to all stages:

> Load operations from uniform texel buffers are supported in all shader stages for image formats which report support for the VK_FORMAT_FEATURE_UNIFORM_TEXEL_BUFFER_BIT feature bit via vkGetPhysicalDeviceFormatProperties in VkFormatProperties::bufferFeatures.

Storage texel buffers are exposed to the same stages as SSBO/UAV in general:

> When the fragmentStoresAndAtomics feature is enabled, stores and atomic operations are also supported for storage texel buffers in fragment shaders with the same set of texel buffer formats as supported in compute shaders. When the vertexPipelineStoresAndAtomics feature is enabled, stores and atomic operations are also supported in vertex, tessellation, and geometry shaders with the same set of texel buffer formats as supported in compute shaders.

kvark commented 5 years ago

Vulkan spec also has the list of formats that are required to support uniform/storage texel buffers in "Required Format Support".

Basically, uniform texel buffers are available for:

Storage texel buffers are available for:

devshgraphicsprogramming commented 5 years ago

Yeah, so emulating R8, R8G8, R16, etc. with SSBOs would be a pain.

Basically, you always read the full 32 bits, then extract the bit range and optionally normalize. This would be a big perf loss compared to native Vulkan or OpenGL.
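To make the emulation concrete, here is a rough sketch of the read-32-bits, extract, normalize sequence for an R8 unorm format. It is Python standing in for shader code, and it assumes little-endian packing of four texels per 32-bit word; `load_r8_unorm` is a hypothetical helper name, not an existing API.

```python
def load_r8_unorm(words, texel_index):
    """Emulate reading one R8Unorm texel from storage exposed as 32-bit words."""
    word = words[texel_index // 4]   # always read the full 32 bits
    shift = (texel_index % 4) * 8    # which byte lane holds this texel
    byte = (word >> shift) & 0xFF    # extract the bit range
    return byte / 255.0              # optional unorm normalization


if __name__ == "__main__":
    storage = [0xFF804000]  # texels 0..3 are 0x00, 0x40, 0x80, 0xFF
    print([load_r8_unorm(storage, i) for i in range(4)])
```

With a native texel buffer, the division, shift, mask and normalization would all be done by the format-conversion hardware instead of ALU instructions in the shader.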

dneto0 commented 5 years ago

> Hopefully not for much longer! https://renderdoc.org/vkspec_chunked/chap41.html#VK_EXT_scalar_block_layout

Yes, that's the bleeding edge of support. Hopefully it's the last time Vulkan has to relax the layout rules!

The flip side is whether a big enough installed base can get that extension in time for WebGPU rollout.

Perf impact: Yes, unfavourably aligned vector accesses will be slower on some hardware, whether or not that extension is supported. The tradeoff is whether the user or the implementation has to do the load-32-bits-then-unpack. If the implementation has the responsibility, it at least has the opportunity to do it better, and your shader code is clearer.

Kangz commented 3 years ago

Tentatively closing. IIRC the group decided to not have these features and nobody asked for them (SSBOs are mostly just better, though there's some discussion about that in #297).

kvark commented 3 years ago

@Kangz this may become important for OpenGL backends. There are GL platforms that plain refuse to expose any SSBO stuff, but they'd be totally usable through TBOs.

Kangz commented 3 years ago

Do these GL versions support compute shaders? I thought compute shaders always came with SSBOs.

kvark commented 3 years ago

Here is some light reading - https://www.raspberrypi.org/forums/viewtopic.php?t=271863 It talks about GLES-3.1, so it must have the compute stuff.

Kangz commented 3 years ago

This is only for vertex shaders, so it's not quite as bad, but it is a real constraint when targeting ES 3.1. I think it might be possible to magically transform a VS using an SSBO into a VS using a TBO (similar to the ByteAddressBuffer transform) and then back the TBO with the storage buffer. This would work for readonly-storage-buffer at least (there is no way to implement RW storage buffers in ES 3.1 AFAIK).

magcius commented 3 years ago

You can kind of emulate SSBOs using imageLoadStore, which I believe should be in ES 3.1

dneto0 commented 11 months ago

The Chrome team is getting multiple requests for texel buffers.

Motivations include:

dneto0 commented 11 months ago

Additionally, storage texture coordinates are limited.
WebGPU mandates supporting 1D and 2D coordinates up to 8192.

Texel buffers are not limited in this way. They would be limited by the size of the underlying GPUBuffer.
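As a rough illustration of the limit, the sketch below shows the index arithmetic needed to spread a large linear array over a 2D storage texture, next to the element count a texel buffer could address. It is Python for checkability; the 8192 limit is the one quoted above, while the 128 MiB buffer size and both function names are hypothetical, chosen only for the example.

```python
MAX_DIM = 8192  # guaranteed 1D/2D coordinate limit quoted above


def linear_to_2d(index, width=MAX_DIM):
    """Map a linear element index to (x, y) in a width x MAX_DIM 2D texture."""
    if index < 0 or index >= width * MAX_DIM:
        raise IndexError("index does not fit in a single 2D texture")
    return (index % width, index // width)


def texel_buffer_capacity(buffer_size_bytes, bytes_per_texel):
    """Number of texels addressable when only the buffer size is the limit."""
    return buffer_size_bytes // bytes_per_texel


if __name__ == "__main__":
    print(linear_to_2d(8192))                            # first texel of row 1
    print(MAX_DIM * MAX_DIM)                             # texels in one 2D texture
    print(texel_buffer_capacity(128 * 1024 * 1024, 1))   # R8 texels in 128 MiB
```

A 1D storage texture tops out at 8192 elements, so any larger linear array has to go through a mapping like `linear_to_2d`, and even the 2D case caps out at 8192 * 8192 texels.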

dneto0 commented 11 months ago

Moved to Milestone 2, per discussion in WebGPU API meeting 2023-09-27

devshgraphicsprogramming commented 11 months ago

> because in Vulkan an SSBO has to conform to the std430 layout

> Hopefully not for much longer! https://renderdoc.org/vkspec_chunked/chap41.html#VK_EXT_scalar_block_layout

Can confirm that, finally in 2023, all relevant (still supported) desktop platforms support scalar block layout. As for mobile... I cannot say.

kdashg commented 9 months ago

WGSL 2023-12-05 Minutes

* AB: We have partner requests for this. It comes from various places. Fills gap on older hardware that doesn't have smaller data type support. Very common in older D3D to get read write storage of f16, say, without needing full f16 type support. Widely used.
* DN: this lets you treat GPUBuffer memory as a big array of textures. Does not have coordinate size limitations. But use the texel format conversion hardware on the data path into/out of the shader.
* JB: Learning about this feature. Don’t express an opinion.
* AB: Certain use cases can be satisfied with f16. There are other use cases to help porting existing code. There’s also a reach aspect. E.g. D3D doesn’t have f16 until a later SM 6.x but prior APIs have texel buffers which is how you end up covering for it.
* JB: Sounds like some folks really need it but many don’t care.
* KG: Estimates on how much work this is to add?
* DN: Mostly just plumbing, so should be easy enough.
* JB: Ok! Mozilla is neutral. M1 is tolerable but not preferred.
* **->M1**

jimblandy commented 3 months ago

It would be nice to know how widely this feature is supported on our target hardware, to decide whether this would be an optional feature or not.

Concretely, what would this look like in the API and in WGSL?

kdashg commented 3 months ago

WGSL 2024-05-14 Minutes

* KG: This is showing up now, why? Kai was going through old issues and adding tags. This touches on WGSL. We last talked about it in WGSL in Dec 2023. At the time we put it in M1. As we talked in Mozilla, it would be nice to know how widely supported it is; is it optional, and understand the API side implications. On the WGSL side, is there another address space?
* AB: It’s widely supported. We’re ok moving this to M2. It would be a new texture type, with storage in the ‘handle’ space.
* KG: Seems an API proposal stage.
* AB: We have an internal doc we can expose for the investigation.
* DN: This *is* requested by partners, has real demand. But, this is lower in priority than e.g. subgroups, and we don’t expect to implement these new until much later this year.
* MM: Is this implementable on Metal?
* AB: I think it's just texture_buffer in Metal.
* MW: All Metal devices should be able to support this.
* DN: I updated to Milestone 2.

litherum commented 3 months ago

> The Chrome team is getting multiple requests for texel buffers.

Is a texel buffer a buffer that is backed by texture data, or a texture that is backed by buffer data?

If I'm not mistaken, Metal only supports the latter.

Kangz commented 3 months ago

I believe that Metal texture_buffer is the exact equivalent to texel buffers. The MSL specification says:

> A texture buffer is a texture type that can access a large 1D array of pixel data and perform dynamic type conversion between pixel formats on that data with optimized performance. Texture buffers handle type conversion more efficiently than other techniques, allowing access to a larger element count, and handling out-of-bounds read access.

mwyrzykowski commented 3 months ago

That's right, Metal texture buffers, MTLTextureTypeTextureBuffer, are 1D textures which don't support mipmaps or texture arrays.

There are also buffer-backed 2D textures, which can be created from -[MTLBuffer newTextureWithDescriptor:]

In both cases: no support for mipmaps, array length is always 1, sample count is always 1, and no support for compressed formats. Maybe I am forgetting something.

litherum commented 3 months ago

> That's right, Metal texture buffers, MTLTextureTypeTextureBuffer, are 1D textures which don't support mipmaps or texture arrays.

BTW, section 2.9.1 (Texture Buffers) of the MSL spec says:

> However, you cannot sample a texture buffer.

litherum commented 3 months ago

> Ability to read and write to individual scalars from large arrays or rectangular grids of data.

Re: "rectangular grids," does Vulkan support that? It appears the way to create a uniform texel buffer or a storage texel buffer is by creating a VkBufferView, but VkBufferViewCreateInfo doesn't include any fields regarding the Y dimension. It appears only capable of creating 1-dimensional textures.

litherum commented 3 months ago

I think "be able to have different threads write to adjacent bytes without racing" is a compelling use case.
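To spell out why that matters, here is a small deterministic simulation (Python, not shader code) of two invocations that each want to store one byte. When the only addressable unit is a 32-bit word, each store becomes a read-modify-write, and an unlucky interleaving loses one of the writes. The helper names are hypothetical, and the generator is just a device for pausing between the read and the write; real GPU scheduling is of course not controllable like this.

```python
def byte_store_via_word(memory, word_index, lane, value):
    """Store one byte by read-modify-write of its containing 32-bit word.

    Written as a generator so the caller can pause between the read and the
    write, the way two GPU invocations might interleave.
    """
    word = memory[word_index]                  # read the whole word
    yield                                      # another invocation may run here
    mask = 0xFF << (lane * 8)
    memory[word_index] = (word & ~mask) | (value << (lane * 8))
    yield


def interleaved_word_stores():
    """Both invocations read the old word before either writes it back."""
    memory = [0x00000000]
    a = byte_store_via_word(memory, 0, 0, 0xAA)
    b = byte_store_via_word(memory, 0, 1, 0xBB)
    next(a)
    next(b)  # both have now read 0x00000000
    next(a)
    next(b)  # B writes last and wipes out A's byte
    return memory[0]


def independent_byte_stores():
    """With byte-granular stores, adjacent writes do not interfere."""
    memory = bytearray(4)
    memory[0] = 0xAA
    memory[1] = 0xBB
    return int.from_bytes(memory, "little")


if __name__ == "__main__":
    print(hex(interleaved_word_stores()))   # 0xbb00: the 0xAA store was lost
    print(hex(independent_byte_stores()))   # 0xbbaa: both stores survive
```

Avoiding the lost update in the word-based version requires atomics on the containing word, whereas a texel-granular store needs no coordination between invocations writing different texels.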

litherum commented 3 months ago

> Exploit performance differences between memory types.

I was interested in characterizing this, so I wrote a little benchmark to see what the performance difference was in Metal.

The benchmark is straightforward:

Here are the performance results on an M2 MacBook Air:

Surprisingly, this shows that reading from the raw buffer is faster than reading from a texture buffer. If I were to guess, I'd bet the "Random Read Texture" and "Random Read Buffer" bars are different because memory accesses are fast enough that even the small amount of ALU to reverse the bits of the index isn't being hidden. Though this couldn't explain the entire difference, because the amount of ALU on both of the "Random Read" bars is the same, but the perf difference between them and their "Sequential Read" counterparts is not the same.
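The benchmark source isn't reproduced here, so the following is only a guess at what "reverse the bits of the index" looks like: a sequential index is turned into a scattered one by reversing its low bits, which visits every element exactly once but in a cache-unfriendly order. It is sketched in Python; `reverse_bits` and the `bit_count` parameter are assumptions, not taken from the benchmark.

```python
def reverse_bits(index, bit_count):
    """Reverse the low `bit_count` bits of `index`."""
    result = 0
    for _ in range(bit_count):
        result = (result << 1) | (index & 1)  # move the lowest bit to the front
        index >>= 1
    return result


if __name__ == "__main__":
    # For a buffer of 8 elements (3 index bits), sequential indices 0..7
    # become a scattered permutation of the same range.
    print([reverse_bits(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

A loop like this is a handful of ALU operations per access (GPUs typically have a single bit-reverse instruction), which is the "small amount of ALU" the comment above refers to.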

Just for fun, here are the results on an AMD Radeon PRO W6800X:

Here, when reading sequentially, the buffer is faster than the texture buffer, but when reading randomly, they're about the same. For the random reads, this is what I expected - I'd expect random accesses to defeat any difference in caching.

Anyhow, despite these disappointing results, I still think there are valid use cases beyond simply read performance in shaders, so adding texture buffers to WebGPU would still make sense.

litherum commented 3 months ago

Another interesting tidbit: in Vulkan, not every pixel format can be used as a uniform/storage texel buffer. And, the spec doesn't list which formats can be used (and which cannot be used). Instead, the spec says:

> If Vulkan 1.3 is supported or the VK_KHR_format_feature_flags2 extension is supported, then the buffer view’s set of format features is the value of VkFormatProperties3::bufferFeatures found by calling vkGetPhysicalDeviceFormatProperties2 on the same format as VkBufferViewCreateInfo::format.

So, you have to ask at runtime whether or not the format is compatible with uniform/storage texel buffers, and different devices can return arbitrarily different answers for any given format.

(Aside: I have no idea how Vulkan native app developers could possibly use such an API effectively. What are you supposed to do if the device you're on just happens to not support the codepath you wrote? Write a codepath for literally every format? Fall back to buffers, and hope the device has StorageBuffer8BitAccess?)

sebbbi commented 3 months ago

In DirectX, read-only buffers (Buffer) are supported for all types. Texel buffers all use opaque descriptors in DirectX. The shader doesn't know the buffer type. Sampler hardware does the type conversion. Texel buffers and textures are bound using the same SRV (shader resource view) mechanism.

DirectX read-write buffers (RWBuffer) are different. They are bound using the UAV (unordered access view) mechanism. In DX11 they were bound using the same API call as render targets. DX12 has separate UAV descriptors and SRV descriptors. UAVs have format limitations, and there are multiple tiers of "typed UAV load" support: https://learn.microsoft.com/en-us/windows/win32/direct3d12/typed-unordered-access-view-loads

UAVs are used for read-write textures and buffers. The same buffer byte storage can support views of multiple buffer types, including different typed UAVs, SRVs and ByteAddressBuffers. In DX11, structured buffer views are not compatible with other buffer views, since hardware is allowed to implement an AoSoA swizzle internally for them, so their memory layout might not match the linear buffer layout of other buffer types. Structured buffer functionality appears closest to Vulkan SSBOs, but structured buffers are also bound with an opaque descriptor in DX11. In Vulkan, SSBO bindings are just a 64-bit pointer to raw memory. 1D textures are not the same as texel buffers in DirectX. They could have a different memory layout, and you can't create a 1D texture view and a typed buffer view to the same data.

sebbbi commented 3 months ago

Typed UAV Load (extended formats) support can be found in the following table: https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D

Nvidia Fermi (GTX 500) and Kepler (GTX 600 and 700) don't support it, and neither do Intel Gen 7.5 (Haswell) and Gen 8 (Broadwell). All other DX12 capable hardware supports it.

According to the latest Steam HW survey, Fermi GPUs and Broadwell GPUs are no longer used.

Kepler (GTX 700 series) total usage = 0.71%
Intel Haswell usage = 0.40%

In total, 1.11% of Steam users have a GPU that doesn't support Typed UAV Load (extended formats).

https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam

Kangz commented 3 months ago

That it is an optional feature in D3D12 and Vulkan means that this will need to be an extension. To find which formats support that feature in Vulkan (and even D3D12), we can volunteer to add some stat gathering in Chromium to see what several options would give in terms of reach in the wild.