This is the fourth chunk of #12318. We need the allocator to cache allocations on the device because allocating device memory on the fly in every collective operation is too expensive.
This allocator is based on the existing bucket allocator but stores the bucket metadata separately, in a hash table, instead of in front of the allocated memory chunk. It also caches allocations. By default, chunks between 4K and 1G are cached; MCA variables can be used to configure these thresholds.
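To illustrate the idea for reviewers, here is a minimal, self-contained sketch of a caching allocator that keeps its metadata in a separate hash table keyed by the chunk address. It is not the code in this PR: all names (`cache_alloc_t`, `cache_alloc`, `cache_free`, `cache_release`) are hypothetical, `malloc`/`free` stand in for the device allocation calls, the power-of-two size classes are an assumption, and the 4K and 1G thresholds are hard-coded here rather than read from MCA variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN_CACHE_SIZE ((size_t)4 << 10) /* 4K lower caching threshold */
#define MAX_CACHE_SIZE ((size_t)1 << 30) /* 1G upper caching threshold */
#define NUM_BUCKETS 19                   /* size classes 2^12 .. 2^30 */
#define TABLE_SLOTS 256                  /* hash table slots (assumed) */

/* Metadata lives here, never in front of the chunk itself. */
typedef struct chunk {
    void *ptr;          /* address of the allocated chunk */
    int bucket;         /* size class, or -1 if the chunk is not cached */
    struct chunk *next; /* link in a hash slot (in use) or free list (cached) */
} chunk_t;

typedef struct {
    chunk_t *table[TABLE_SLOTS];     /* chunks currently handed out */
    chunk_t *free_list[NUM_BUCKETS]; /* cached chunks, one list per bucket */
    size_t backend_allocs;           /* calls that reached the backend */
} cache_alloc_t;

/* Smallest power-of-two bucket that fits size, or -1 outside the thresholds. */
static int bucket_for(size_t size)
{
    if (size < MIN_CACHE_SIZE || size > MAX_CACHE_SIZE) return -1;
    for (int b = 0; b < NUM_BUCKETS; b++) {
        if ((MIN_CACHE_SIZE << b) >= size) return b;
    }
    return -1;
}

static size_t slot_for(const void *ptr)
{
    return (size_t)(((uintptr_t)ptr >> 4) % TABLE_SLOTS);
}

void *cache_alloc(cache_alloc_t *a, size_t size)
{
    assert(a != NULL);
    if (size == 0) return NULL;
    int b = bucket_for(size);
    chunk_t *c = (b >= 0) ? a->free_list[b] : NULL;
    if (c != NULL) {
        a->free_list[b] = c->next; /* cache hit: reuse a cached chunk */
    } else {
        c = malloc(sizeof(*c));
        if (c == NULL) return NULL;
        /* malloc stands in for the (expensive) device allocation */
        c->ptr = malloc(b >= 0 ? (MIN_CACHE_SIZE << b) : size);
        if (c->ptr == NULL) { free(c); return NULL; }
        c->bucket = b;
        a->backend_allocs++;
    }
    size_t s = slot_for(c->ptr); /* record the chunk as in use */
    c->next = a->table[s];
    a->table[s] = c;
    return c->ptr;
}

/* Returns 0 on success, -1 if ptr was not handed out by this allocator. */
int cache_free(cache_alloc_t *a, void *ptr)
{
    assert(a != NULL);
    chunk_t **link = &a->table[slot_for(ptr)];
    while (*link != NULL && (*link)->ptr != ptr) link = &(*link)->next;
    if (*link == NULL) return -1;
    chunk_t *c = *link;
    *link = c->next;
    if (c->bucket >= 0) { /* cacheable: keep the memory for reuse */
        c->next = a->free_list[c->bucket];
        a->free_list[c->bucket] = c;
    } else {              /* outside the thresholds: release immediately */
        free(c->ptr);
        free(c);
    }
    return 0;
}

/* Return all cached chunks to the backend (chunks in use are untouched). */
void cache_release(cache_alloc_t *a)
{
    assert(a != NULL);
    for (int b = 0; b < NUM_BUCKETS; b++) {
        while (a->free_list[b] != NULL) {
            chunk_t *c = a->free_list[b];
            a->free_list[b] = c->next;
            free(c->ptr);
            free(c);
        }
    }
}
```

In this sketch, a zero-initialized `cache_alloc_t` is ready to use. Freeing a 5000-byte chunk and then requesting 8000 bytes returns the same pointer, because both sizes fall into the 8K bucket, while a 100-byte request is below the lower threshold and always goes to the backend.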