tensorflow / recommenders-addons

Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
Apache License 2.0
596 stars 136 forks

[Feat] Add new setting num_of_buckets_per_alloc from HKV beta 12. #433

Closed MoFHeka closed 5 months ago

MoFHeka commented 5 months ago

Description

What's new

Add the new setting num_of_buckets_per_alloc from HKV beta 12. It may improve memory access performance. It also reduces the amount of unnecessary BFC reallocation information printed to the user on CUDA OOM, by preventing billions of HKV buckets from each allocating a small piece of memory, which can make the BFC allocator re-chunk frequently. With beta 11, an OOM might print more than 10,000 info lines; now it prints only about 1,000.

Why choose 512

According to https://developer.nvidia.com/blog/improving-gpu-memory-oversubscription-performance/, "In our experiments, a memory page is set to be 2 MB, which is the largest page size at which GPU MMU can operate." and "128-byte aligned access ensures that the CPU-GPU link and system DRAM are used efficiently." One bucket in HKV is 2048+128 bytes, so a simple calculation gives 2MB/(2048+128)B = 963.76, and rounding down to a power of two gives 512. We therefore choose 512 as the default num_of_buckets_per_alloc.
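The arithmetic above can be checked with a short Python sketch. The constant and function names are illustrative only and do not come from the HKV or TFRA sources, and rounding down to a power of two is an assumption about how 963.76 becomes 512.

```python
# Illustrative constants, taken from the figures quoted in the description.
PAGE_BYTES = 2 * 1024 * 1024   # 2 MB GPU memory page
BUCKET_BYTES = 2048 + 128      # size of one HKV bucket in bytes


def buckets_per_page(page_bytes=PAGE_BYTES, bucket_bytes=BUCKET_BYTES):
    """Return how many buckets fit in one page (fractional)."""
    return page_bytes / bucket_bytes


def floor_power_of_two(n):
    """Return the largest power of two that does not exceed n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (int(n).bit_length() - 1)


ratio = buckets_per_page()
default_buckets_per_alloc = floor_power_of_two(ratio)
print(f"{ratio:.2f} buckets per page, power-of-two default: {default_buckets_per_alloc}")
```

A power-of-two batch keeps each allocation within a single 2 MB page, since 512 buckets of 2176 bytes occupy a little over 1 MB.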

The BFC allocator would otherwise print a large number of log lines like these:

2024-06-04 00:47:52.356200: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d827a00 of size 2304 next 24266
2024-06-04 00:47:52.356203: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828300 of size 2304 next 24267
2024-06-04 00:47:52.356207: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828c00 of size 2304 next 24268
2024-06-04 00:47:52.356210: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d829500 of size 2304 next 24269
2024-06-04 00:47:52.356214: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d829e00 of size 2304 next 24270
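Each chunk in this log is 2304 bytes, which is consistent with a 2176-byte bucket rounded up to a 256-byte boundary. The Python sketch below assumes that 256-byte rounding and uses a hypothetical bucket count to compare how many allocator requests are made with and without batching. None of the names come from HKV or TensorFlow.

```python
import math

BUCKET_BYTES = 2048 + 128   # one HKV bucket, per the description
ASSUMED_ALIGNMENT = 256     # assumption: allocator rounds requests up to 256 bytes


def round_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return -(-size // alignment) * alignment


def allocation_requests(num_buckets, buckets_per_alloc):
    """Number of allocator requests needed to back num_buckets buckets."""
    return math.ceil(num_buckets / buckets_per_alloc)


chunk_bytes = round_up(BUCKET_BYTES, ASSUMED_ALIGNMENT)

num_buckets = 1_000_000     # hypothetical table size, for illustration only
per_bucket = allocation_requests(num_buckets, 1)
batched = allocation_requests(num_buckets, 512)

print(f"chunk size per bucket: {chunk_bytes} bytes")
print(f"requests: {per_bucket} unbatched vs {batched} with 512 buckets per alloc")
```

Fewer, larger requests leave the allocator with fewer chunks to track, and so fewer lines to list when it dumps its state on OOM.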

Also fixes the missing Bucketize class in the DE Keras Horovod demo.

Type of change

Checklist:

How Has This Been Tested?

Tested by running a large model with a large batch size using HKV.