halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org
Other
5.88k stars 1.07k forks source link

Vulkan backend bug in region_allocator #8079

Open immortalsalomon opened 8 months ago

immortalsalomon commented 8 months ago

Hi all, I think I have found a bug in the region_allocator used by Vulkan.

In short, the bug is created by a call to the can_split(const BlockRegion *block_region, size_t size) method which, instead of checking the size returned by the conform_size(size_t offset, size_t size, size_t alignment, size_t nearest_multiple) method, uses the size set in the MemoryRequest. This can cause a problem if the free region returned by the method find_block_region has the same size as the value returned by the conform_size method. Since the request->size value is used this case is not recognised and leads to the creation of a memory region with a theoretical size of zero. When a new memory region is created, the conform_size method is used to determine its size. The method will not return 0 but the nearest multiple of the properties.nearest_multiple value starting from the actual_alignment value passed to the method. This happens do to this check.

This creates the problem that if all regions in a block are free and the coalesce_block_regions method is called, the size of the resulting region exceeds the size of the block.

image This also leads to the following error: image

The fix I made is as follows. I have not tested it yet. Original:

    BlockRegion *block_region = find_block_region(user_context, request);
    if (block_region == nullptr) {
#ifdef DEBUG_RUNTIME_INTERNAL
        debug(user_context) << "RegionAllocator: Failed to locate region for requested size ("
                            << (int32_t)(request.size) << " bytes)!\n";
#endif
        return nullptr;
    }
    if (can_split(block_region, request->size)) {
#ifdef DEBUG_RUNTIME_INTERNAL
        debug(user_context) << "RegionAllocator: Splitting region of size ( " << (int32_t)(block_region->memory.size) << ") "
                            << "to accomodate requested size (" << (int32_t)(request.size) << " bytes)!\n";
#endif
        split_block_region(user_context, block_region, request.size, request.alignment);
    }

Fixed:

    BlockRegion *block_region = find_block_region(user_context, request);
    if (block_region == nullptr) {
#ifdef DEBUG_RUNTIME_INTERNAL
        debug(user_context) << "RegionAllocator: Failed to locate region for requested size ("
                            << (int32_t)(request.size) << " bytes)!\n";
#endif
        return nullptr;
    }

    // Need to check if empty region is not 0
    actual_size = conform_size(block_region->memory.offset, request.size, actual_alignment, block->memory.properties.nearest_multiple);

    if (can_split(block_region, actual_size)) {
#ifdef DEBUG_RUNTIME_INTERNAL
        debug(user_context) << "RegionAllocator: Splitting region of size ( " << (int32_t)(block_region->memory.size) << ") "
                            << "to accomodate requested size (" << (int32_t)(request.size) << " bytes)!\n";
#endif
        split_block_region(user_context, block_region, request.size, request.alignment);
    }
derek-gerstmann commented 8 months ago

Thanks very much for the bug report! I believe you've found the root cause for this lingering issue that triggers this validation error that I hadn't been able to reproduce very easily. I'll take a look and investigate!

immortalsalomon commented 8 months ago

Truly happy to help. You have done an extraordinary job :).

I found another trigger that generates the same validation error in Vulkan. The cause is the value of actual_alignment and the value of properties.nearest_multiple. I am testing my code on two graphics cards. One is an NVidia where the actual_alignment is 16 Bytes and the other is an intel integrated graphics card where the actual_alignment is 64 Bytes. With the former the error is not thrown but with the latter it is. This is because vulkan's vkGetBufferMemoryRequirements() method returns the size value which is a near multiple of 64 and not 32.

image

1° => 63797472 = conform_size(0, 63797448, 64, 32); 2° => 63797504 = conform_size(0, 63797448, 64, 64);

I think there are two possible solutions. The quickest is to change the value of nearest_multiple from 32 to 64 Bytes. The other would be to obtain the actual_alignment during initialisation and adjust the value of properties.nearest_multiple.

abadams commented 8 months ago

Was closing this intentional? Seems like it's still active from the comments so far.

derek-gerstmann commented 8 months ago

Ahh, thanks so much for the detailed report! I've spent some time working on a fix and adding more test coverage. For the device specific alignment and vkGetBufferMemoryRequirements, I agree we should be querying the value during initialization. I'll add that as well.

derek-gerstmann commented 8 months ago

@immortalsalomon I created a new pull request #8087 which should resolve these issues. Please give this a try!

immortalsalomon commented 8 months ago

@immortalsalomon I created a new pull request https://github.com/halide/Halide/pull/8087 which should resolve these issues. Please give this a try!

I tried the fixes and they work for the NVidia graphics card (16bytes alignment). For the intel graphics card (64bytes alignment) the fixes created another problem. below is the log:

image image
derek-gerstmann commented 8 months ago

Hmm, strange! I'll try and set thing up on my laptop with an integrated Intel GPU and see if I can diagnose what's going on. Which graphics card are you using?

immortalsalomon commented 8 months ago

Hmm, strange! I'll try and set thing up on my laptop with an integrated Intel GPU and see if I can diagnose what's going on. Which graphics card are you using?

Sorry I was in a bit of a hurry yesterday. Unfortunately I don't have time at the moment to check the problem myself. I found a temporary solution by setting the value of the nearest_multiple parameter to 64Bytes.

The specifications of my computer: CPU: 12th Gen Intel(R) Core(TM) i7-12850HX RAM: 32GB Integrated GPU: Intel(R) UHD Graphics Dedicated GPU: NVIDIA RTX A2000 8GB Laptop GPU

Thank you for investigating. :)

derek-gerstmann commented 7 months ago

@immortalsalomon Thanks for your patience. Please give https://github.com/halide/Halide/pull/8130 a try when you have a chance!