Pre-allocate and then deallocate device memory to trigger BFC allocator chunking before the real HKV table is created.
This tries to prevent billions of HKV buckets from each allocating a small piece of memory, which may make the BFC allocator re-chunk frequently.
Without it, the BFC allocator produces a massive amount of log output like this:
2024-06-04 00:47:52.356200: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d827a00 of size 2304 next 24266
2024-06-04 00:47:52.356203: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828300 of size 2304 next 24267
2024-06-04 00:47:52.356207: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828c00 of size 2304 next 24268
2024-06-04 00:47:52.356210: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d829500 of size 2304 next 24269
2024-06-04 00:47:52.356214: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d829e00 of size 2304 next 24270
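To illustrate the idea, here is a minimal sketch in Python. It is a toy model, not TensorFlow's BFC allocator and not the code in this PR: the class ToyGrowingAllocator, its doubling growth policy, and the bucket count are assumptions made for illustration. Only the 2304-byte bucket size is taken from the log lines above. The sketch counts how often the pool has to grow while buckets are being allocated, with and without a warm-up allocation.

```python
class ToyGrowingAllocator:
    """Toy pool that grows on demand (an assumption, not TensorFlow's BFC)."""

    def __init__(self, first_region_bytes=1 << 20):
        self.next_region_bytes = first_region_bytes
        self.free_bytes = 0
        self.growth_events = 0

    def _grow(self, shortfall):
        # Pick a region at least as large as the shortfall, doubling as needed.
        region = self.next_region_bytes
        while region < shortfall:
            region *= 2
        self.free_bytes += region
        self.next_region_bytes = region * 2
        self.growth_events += 1

    def alloc(self, nbytes):
        if self.free_bytes < nbytes:
            self._grow(nbytes - self.free_bytes)
        self.free_bytes -= nbytes

    def free(self, nbytes):
        # Freed bytes go back to the pool; the pool itself never shrinks.
        self.free_bytes += nbytes


def growth_during_bucket_allocation(num_buckets, bucket_bytes, warm_up):
    """Return how many times the pool grew while the buckets were allocated."""
    allocator = ToyGrowingAllocator()
    if warm_up:
        # Warm-up: reserve the whole footprint once, then hand it back.
        total = num_buckets * bucket_bytes
        allocator.alloc(total)
        allocator.free(total)
    before = allocator.growth_events
    for _ in range(num_buckets):
        allocator.alloc(bucket_bytes)
    return allocator.growth_events - before


cold = growth_during_bucket_allocation(100_000, 2304, warm_up=False)
warm = growth_during_bucket_allocation(100_000, 2304, warm_up=True)
print("growth events without warm-up:", cold)
print("growth events with warm-up:", warm)
```

In this model the warm-up moves all pool growth in front of the bucket allocations, so the per-bucket requests are served from memory the pool already holds. The toy ignores fragmentation and chunk splitting, so it only shows the direction of the effect.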
Type of change
Checklist:
How Has This Been Tested?
Set an appropriate max_hbm_for_vectors parameter when using HKV.
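As a rough aid for picking that value, the sketch below computes the byte footprint of the embedding vectors for a given table size. The helper name vector_footprint_bytes, the example capacity and dimension, and the use of this footprint as the reference point for max_hbm_for_vectors are all assumptions for illustration; none of them come from this PR.

```python
def vector_footprint_bytes(capacity, dim, bytes_per_element=4):
    """Bytes needed to hold `capacity` vectors of `dim` elements each.

    bytes_per_element defaults to 4, i.e. float32 values (an assumption).
    """
    return capacity * dim * bytes_per_element


# Hypothetical table: 1 million keys, 64-dimensional float32 embeddings.
footprint = vector_footprint_bytes(capacity=1_000_000, dim=64)
print("vector footprint in bytes:", footprint)
print("vector footprint in GiB:", footprint / (1 << 30))
```

Comparing max_hbm_for_vectors against this footprint gives a starting point for the test setup; the right value still depends on the device memory actually available.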