ucbrise / actnn

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
MIT License
196 stars 30 forks

how to avoid memory fragmentation in ActNN? #36

Closed Jack47 closed 2 years ago

Jack47 commented 2 years ago

May I know how you implemented this defragmentation in ActNN?

In my model training experience: a smaller MAX_SPLIT_SIZE gives worse performance, while a bigger MAX_SPLIT_SIZE eventually results in OOM.
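For context, the MAX_SPLIT_SIZE mentioned above presumably refers to the `max_split_size_mb` knob of PyTorch's CUDA caching allocator, which caps how large a cached block may be split and is one common way to fight fragmentation. A minimal sketch of setting it (the value `128` is an arbitrary example, not a recommendation):

```python
import os

# PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the CUDA caching allocator
# is initialized, so this must be set before the first CUDA allocation.
# max_split_size_mb limits how large a cached block the allocator will
# split, trading reuse efficiency against fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import (and any CUDA work) only after setting the env var
```

The tradeoff described in the comment follows from this: a small cap forces more fresh cudaMalloc calls (slower), while a large cap lets big cached blocks get carved into fragments until no contiguous region is left (OOM).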

merrymercy commented 2 years ago

The corresponding code is here https://github.com/ucbrise/actnn/blob/370bc9fe5cd8a64d817d5cc3924c6d3a2051db92/actnn/actnn/conf.py#L26-L31

Jack47 commented 2 years ago

Thanks a lot, got it: use malloc directly instead of the caching allocator for large allocations.
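The idea summarized above can be sketched as a size-based dispatch: allocations above a threshold bypass the caching allocator and go straight to the system allocator, so huge activation buffers are returned immediately on free and cannot pin fragmented cached memory. This is an illustrative toy in plain Python, not ActNN's actual code; the threshold and class names are made up for the example:

```python
# Toy sketch (hypothetical names): route allocations above a size
# threshold to a direct "malloc" path instead of a caching allocator.

LARGE_ALLOC_THRESHOLD = 64 * 1024 * 1024  # hypothetical 64 MB cutoff


class CachingAllocator:
    """Toy caching allocator: reuses freed blocks of the same size."""

    def __init__(self):
        self.free_blocks = {}  # size -> number of cached freed blocks
        self.cache_hits = 0

    def alloc(self, size):
        # Reuse a cached block of this exact size if one is available.
        if self.free_blocks.get(size, 0) > 0:
            self.free_blocks[size] -= 1
            self.cache_hits += 1
        return bytearray(size)  # stand-in for a device pointer

    def free(self, size):
        # Keep the block cached for reuse instead of returning it
        # to the system; this is what can cause fragmentation.
        self.free_blocks[size] = self.free_blocks.get(size, 0) + 1


cache = CachingAllocator()


def smart_alloc(size):
    # Large buffers bypass the cache entirely (the "malloc" path),
    # so freeing them returns memory to the system right away.
    if size >= LARGE_ALLOC_THRESHOLD:
        return bytearray(size)
    return cache.alloc(size)
```

Small, frequently recycled buffers still benefit from caching, while the rare giant activation tensors no longer fragment the cached pool.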