Persistently map GPU cache data

zakorgy commented 5 years ago

Fixes #271. Right now as a first step the GpuCache texture is replaced with a Buffer. Upon updating the GPU cache we create a writer for the Buffer and update the buffer data directly. The next step would be keeping that writer alive, for that we need to figure out how to handle the lifetimes properly.

kvark commented 5 years ago

The next step would be keeping that writer alive, for that we need to figure out how to handle the lifetimes properly.

That would be trivial with raw mapping, which doesn't have lifetimes.

zakorgy commented 5 years ago

@kvark I've realized the same issue yesterday but couldn't have phrased as good as you did. I was thinking what could we do in the part what you mentioned as or do something smart, but these solutions uses multiple submits, and don't know how big overhead these would be:

Submit the commend buffer after the buffer copy with a fence and a semaphore. We can wait on the fence before the first write on the new buffer and wait on the semaphore on the next submit.
The second one would be call submit after each draw_tile_frame, and we would call prepare_gpu_cache for the next frame (instead of the current one) after the draw, but before the submit. This could make the cache buffer visible for the next gpu cache update, but again we could end up with a lot of submit.

For now I will go back to the approach we previously had, just wanted to share my ideas with you.

zakorgy commented 5 years ago

@kvark I've reverted to CPU-side copy on resize, and handling non-coherent GPU cache buffers as well. I assume most of your previous review comments are related to the buffer to buffer copy part, so I will not answer them one by one. The only thing I'm not sure is how could we use rendy-memory for the new buffery type. I can see that the Download MemoryUsage would be good for our needs but as I can see the allocator not always chose the best suitable memory type if there isn't enough available space with that type: https://github.com/amethyst/rendy/blob/63e1dc2264808378f9664f999e942689322018f8/memory/src/heaps/mod.rs#L133. In this case we can't depend on the information that the device has the suitable memory type, we would need some feedback from the allocator that the chosen type is coherent or not. And I can't find such an information

zakorgy commented 5 years ago

Let's just forget the rendy-memory part, I can see that here is a properties function for the allocated memory block, which can give us the required information.

zakorgy commented 5 years ago

@kvark I've added the code to use MemoryBlockfrom rendy-memory for the PMBuffer but now the rendering is broken: The examples doesn't draw anything but there is no validation layer error. When I run wrench I can only see the white window and the following validation layer error:

VUID-VkMappedMemoryRange-size-00685(ERROR / SPEC): msgNum: 0 - vkInvalidateMappedMemoryRanges: Flush/Invalidate size or offset (61145088, 50659328) exceed the Memory Object's upper-bound (327680). The Vulkan spec states: If size is not equal to VK_WHOLE_SIZE, offset and size must specify a range contained within the currently mapped range of memory (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkMappedMemoryRange-size-00685)
    Objects: 1
        [0] 0x1a, type: 8, name: NULL

Which is weird because we don't call vkInvalidateMappedMemoryRanges directly in our code, and I can't see those (61145088, 50659328) values in the mapped ranges as well. I've uploaded the code to a different branch: https://github.com/zakorgy/webrender/commit/aa22b266d18000966cad311a0139384d2b2126d8

kvark commented 5 years ago

I commented on the change you linked to. There is at least one error in flush_xxx invocation

kvark commented 5 years ago

Note: when doing transitions of this buffer, we also need to take the proper ranges into account and specify them for the barriers

zakorgy commented 5 years ago

I commented on the change you linked to. There is at least one error in flush_xxx invocation

Thanks @kvark for pointing me to the wrong offsets, now the redering works, but there are two new validation layer erros, which probably comes from rendy-memory:

VUID-VkMappedMemoryRange-size-00685(ERROR / SPEC): msgNum: 0 - vkInvalidateMappedMemoryRanges: Flush/Invalidate size or offset (61145088, 50659328) exceed the Memory Object's upper-bound (62128128). The Vulkan spec states: If size is not equal to VK_WHOLE_SIZE, offset and size must specify a range contained within the currently mapped range of memory (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkMappedMemoryRange-size-00685)
    Objects: 1
        [0] 0x1a, type: 8, name: NULL

and

ERROR 2019-07-30T14:47:30Z: gfx_backend_vulkan: 
VALIDATION [UNASSIGNED-CoreValidation-MemTrack-InvalidMap (0)] : VkMapMemory: Attempting to map memory on an already-mapped object 0x1b.
object info: (type: DEVICE_MEMORY, hndl: 27)

Probably the second one comes from the fact that we have a bigger chunk allocated which is related to two gpu cache buffers. But I still don't know where and why do we call vkInvalidateMappedMemoryRanges.

kvark commented 5 years ago

IIRC, rendy-memory does the flushes and invalidations for you if you use its mapping API.

zakorgy commented 5 years ago

I've checked the above mentioned errors. So the first one:

VUID-VkMappedMemoryRange-size-00685(ERROR / SPEC): msgNum: 0 - vkInvalidateMappedMemoryRanges: 
Flush/Invalidate size or offset (61145088, 50659328) exceed the Memory Object's upper-bound (62128128).
The Vulkan spec states: If size is not equal to VK_WHOLE_SIZE,
offset and size must specify a range contained within the currently mapped range of memory
(https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkMappedMemoryRange-size-00685)
    Objects: 1
        [0] 0x1a, type: 8, name: NULL

This occurs when we read back the pixels in wrench, and create a reader. I've checked the offset and size values (the error complains about these values) down to the gfx-backend-vulkan level and for me it looks like the size value is wrong in the validation layer error, because the value there is equals with offset+size instead of size.

The second one:

ERROR 2019-07-30T14:47:30Z: gfx_backend_vulkan: 
VALIDATION [UNASSIGNED-CoreValidation-MemTrack-InvalidMap (0)] : VkMapMemory:
Attempting to map memory on an already-mapped object 0x1b.
object info: (type: DEVICE_MEMORY, hndl: 27)

This happens because we map the same memory twice when creating a new storage buffer and the previous mapping is still alive. We can't call unmap memory from the gfx-hal API for the old buffer, because that unmaps the entire memory region which can be related to more buffers.

After these I propose not to use rendy-memory for this special use case. Also on metal we have the following panic with the rendy-memory changes

thread 'main' panicked at 'assertion failed: memory_block.properties().contains(hal::memory::Properties::CPU_CACHED)', webrender/src/device/gfx/device.rs:2002:9

but it is absent without it.

zakorgy commented 5 years ago

@kvark it looks like DX12 is not happy with the shader changes in https://github.com/szeged/webrender/pull/297/commits/881f851e28363e3643644c7c9da36a98c8f07726#diff-46adf8fca809efdc8698e42f69fa2c2bR5

[11912] D3D12 ERROR: ID3D12Device::CreateGraphicsPipelineState: Root Signature doesn't match Vertex Shader: Shader SRV descriptor range (BaseShaderRegister=5, NumDescriptors=1, RegisterSpace=2) is not fully bound in root signature  
[11912]  [ STATE_CREATION ERROR #688: CREATEGRAPHICSPIPELINESTATE_VS_ROOT_SIGNATURE_MISMATCH]

Could be this an issue in gfx-hal ?

kvark commented 5 years ago

Could be this an issue in gfx-hal ?

I'd like to see two things first:

what pipeline layout is being used at this point?
what does the generated HLSL look like? We output it in debug logging level

kvark commented 5 years ago

We can't call unmap memory from the gfx-hal API for the old buffer, because that unmaps the entire memory region which can be related to more buffers.

Can you call unmap() from rendy-memory API? I imagine multiple mappings into the same physical memory is something it has to handle.

zakorgy commented 5 years ago

what pipeline layout is being used at this point?

PipelineLayout { raw: WeakPtr( ptr: 0x1e8d7cbd000 ), tables: [SRV_CBV_UAV | SAMPLERS, SRV_CBV_UAV | SAMPLERS, SRV_CBV_UAV | SAMPLERS, SRV_CBV_UAV], root_constants: [RootConstant { stages: VERTEX, range: 0..17 }], num_parameter_slots: 8 }

And the descriptor set layouts used for the pipeline layout

what does the generated HLSL look like? We output it in debug logging level

Created gists for the vertex and fragment HLSL modules

zakorgy commented 5 years ago

We can't call unmap memory from the gfx-hal API for the old buffer, because that unmaps the entire memory region which can be related to more buffers.

Can you call unmap() from rendy-memory API? I imagine multiple mappings into the same physical memory is something it has to handle.

I didn't find anything like this in rendy-memory. There are two cases for unmap functions in rendy-memory:

unmapping the entire memory for decicated allocations
do nothing with the underlying memory for the rest

kvark commented 5 years ago

Looks like a problem with shader translation. It should be at set=1 binding=5, but the generated shader puts it in set=2 binding=5.

kvark commented 5 years ago

cc @grovesNL @msiglreith ^

msiglreith commented 5 years ago

mmhh I guess this is due to the readonly qualifier. Could you check without it? We might not have enough information on the host side to deal with this

kvark commented 5 years ago

I filed https://github.com/gfx-rs/gfx/issues/2933 with my findings and ideas about a solution to this. Please try @msiglreith 's suggestion in the meantime as a workaround.

zakorgy commented 5 years ago

@kvark @msiglreith removing the readonly qualifier does the trick for d3d12 but gives the following validation layer error with Vulkan:

ERROR 2019-08-02T05:38:28Z: gfx_backend_vulkan:
VALIDATION [UNASSIGNED-CoreValidation-Shader-FeatureNotEnabled (0)] : Shader requires vertexPipelineStoresAndAtomics but is not enabled on the device
object info: (type: UNKNOWN, hndl: 0)

which doesn't seem to have impact on the rendering result, it's just not good to have.

zakorgy commented 5 years ago

@kvark I have updated the PR, everything works fine (fixed the validation layer errors with rendy memory and the wrong chosen memory type with Metal from https://github.com/szeged/webrender/pull/297#issuecomment-516774525) with all backends except the above mentioned Vulkan validation layer error.

zakorgy commented 5 years ago

@kvark I've addressed your last concern (it turned out the issue was on our side, so we are using the mapping routines from rendy-memory), also squashed some commits.

szeged / webrender

Persistently map GPU cache data #297