pythonspeed / filprofiler

A Python memory profiler for data processing and scientific computing applications
https://pythonspeed.com/products/filmemoryprofiler/
Apache License 2.0

Try to reduce memory usage by using nested data structures #50

Open · itamarst opened this issue 4 years ago

itamarst commented 4 years ago

The theory: memory-tracking overhead mostly matters when you have lots of small allocations, and lots of small allocations tend to end up in nearby parts of the address space, sharing their high-order address bits. So you could use a nested data structure, HashMap<MostSignificant48bitOfAddressSpace, HashMap<LeastSignificant16bitOfAddress, Allocation>>, which would reduce memory overhead by effectively compressing the addresses: the shared 48-bit prefix is stored once per 64 KiB region rather than once per tracked allocation. A sketch of the idea follows.
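Hypothetically, the nested map could look something like this. This is a minimal sketch using std's HashMap rather than the im crate's persistent maps that Fil actually uses, and the Allocation fields are made up for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical allocation record; the real fields in Fil differ.
struct Allocation {
    size: usize,
}

/// Outer key: high 48 bits of the address. Inner key: low 16 bits.
/// Nearby small allocations share the outer key, so the 48-bit prefix
/// is stored once per 64 KiB region instead of once per allocation.
#[derive(Default)]
struct NestedAllocations {
    outer: HashMap<u64, HashMap<u16, Allocation>>,
}

impl NestedAllocations {
    fn insert(&mut self, address: usize, alloc: Allocation) {
        let (high, low) = ((address >> 16) as u64, (address & 0xFFFF) as u16);
        self.outer.entry(high).or_default().insert(low, alloc);
    }

    fn remove(&mut self, address: usize) -> Option<Allocation> {
        let (high, low) = ((address >> 16) as u64, (address & 0xFFFF) as u16);
        let inner = self.outer.get_mut(&high)?;
        let alloc = inner.remove(&low);
        if inner.is_empty() {
            // Drop empty inner maps so regions that free everything
            // don't keep paying the per-map overhead.
            self.outer.remove(&high);
        }
        alloc
    }
}
```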

Potential problem: im's HashMap has ~600 bytes of overhead per map, so if you have a bunch of larger allocations scattered across the address space, each one may end up alone in its own inner map, increasing its overhead dramatically.

So maybe it should be a 32-bit/32-bit split rather than 48-bit/16-bit; then each inner map covers a 4 GiB region, and the amortized saving is ~32 bits per tracked allocation (a u32 inner key instead of a full 64-bit address)... unless you're doing pretty massive allocations, in which case im's per-map overhead doesn't matter anyway.
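To make the trade-off concrete, here is a rough back-of-the-envelope calculation; the ~600-byte figure is the one quoted above, while the 4-byte per-entry saving is an assumption based on the key shrinking from u64 to u32:

```rust
// Rough break-even arithmetic for a 32-bit/32-bit split. The ~600-byte
// per-map overhead is the im::HashMap figure quoted above; the 4-byte
// per-entry saving assumes a u32 inner key replaces a full u64 address.
fn main() {
    const PER_MAP_OVERHEAD: usize = 600;
    const PER_ENTRY_SAVING: usize = 4;

    // The nested scheme only wins once a single 4 GiB region holds more
    // live allocations than this:
    println!(
        "break-even: {} allocations per region",
        PER_MAP_OVERHEAD / PER_ENTRY_SAVING // 150
    );
}
```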

There is also a potential CPU cost, since every tracked allocation now involves two hashmap lookups instead of one. If we switch to BTrees, perhaps this overhead will matter less? A rough way to measure the extra indirection is sketched below.
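As a hypothetical way to get a feel for the cost of the second lookup (this microbenchmark is not from the repo and makes no attempt at rigor, no warmup or black_box):

```rust
use std::collections::HashMap;
use std::time::Instant;

// Hypothetical microbenchmark: one flat lookup versus the two lookups
// the nested scheme needs, over a million synthetic "allocations".
fn main() {
    let n: u64 = 1_000_000;

    let mut flat: HashMap<u64, usize> = HashMap::new();
    let mut nested: HashMap<u64, HashMap<u16, usize>> = HashMap::new();
    for i in 0..n {
        let addr = i * 64; // pretend allocations are 64 bytes apart
        flat.insert(addr, 64);
        nested
            .entry(addr >> 16)
            .or_default()
            .insert((addr & 0xFFFF) as u16, 64);
    }

    let start = Instant::now();
    let mut total = 0;
    for i in 0..n {
        total += flat[&(i * 64)];
    }
    println!("flat:   {:?} (sum {})", start.elapsed(), total);

    let start = Instant::now();
    let mut total = 0;
    for i in 0..n {
        let addr = i * 64;
        total += nested[&(addr >> 16)][&((addr & 0xFFFF) as u16)];
    }
    println!("nested: {:?} (sum {})", start.elapsed(), total);
}
```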

itamarst commented 4 years ago

Prototype is in branch 50-nested-data-structures. It doesn't actually save memory when used with im's HashMap, but might with OrdMap.