tf-encrypted / moose

Secure distributed dataflow framework for encrypted machine learning and data processing
Apache License 2.0

Memory optimization of networking pass #979

Closed mortendahl closed 2 years ago

mortendahl commented 2 years ago

Closes https://github.com/tf-encrypted/runtime/issues/964.

We see runtime improvements (~17s vs ~20s wall clock) and, more importantly, memory improvements (5.3GB vs 6.2GB peak heap).

mortendahl commented 2 years ago

Benchmarks were run on s3://model-experiments/logical_textual_XGBRegressor_100feature_1output_100estimators.moose using the following command:

elk compile logical.moose networking.moose -p typing,lowering,prune,networking

Before (1ec90c6f8aba440ed136ed1025825213389a90d4):

# time

18.64s user 3.30s system 107% cpu 20.328 total
19.23s user 3.13s system 107% cpu 20.750 total
19.18s user 3.26s system 107% cpu 20.821 total

# heaptrack

bytes allocated in total (ignoring deallocations): 22.25GB (127.02MB/s)
calls to allocation functions: 279273575 (1594327/s)
temporary memory allocations: 58070536 (331515/s)
peak heap memory consumption: 6.19GB
peak RSS (including heaptrack overhead): 83.79GB

bytes allocated in total (ignoring deallocations): 22.25GB (124.99MB/s)
calls to allocation functions: 279273575 (1568864/s)
temporary memory allocations: 58070305 (326219/s)
peak heap memory consumption: 6.19GB
peak RSS (including heaptrack overhead): 83.64GB

bytes allocated in total (ignoring deallocations): 22.25GB (126.38MB/s)
calls to allocation functions: 279273575 (1586394/s)
temporary memory allocations: 58069429 (329859/s)
peak heap memory consumption: 6.19GB
peak RSS (including heaptrack overhead): 83.59GB

After (04cc4a6fd636d8e3e8f1c7806fa341b1e9db1b5f):

# time

16.05s user 2.79s system 109% cpu 17.224 total
15.95s user 2.77s system 109% cpu 17.087 total
16.56s user 2.99s system 108% cpu 17.938 total

# heaptrack

bytes allocated in total (ignoring deallocations): 18.99GB (114.38MB/s)
calls to allocation functions: 258900698 (1559248/s)
temporary memory allocations: 58070731 (349735/s)
peak heap memory consumption: 5.31GB
peak RSS (including heaptrack overhead): 79.02GB

bytes allocated in total (ignoring deallocations): 18.99GB (114.66MB/s)
calls to allocation functions: 258900698 (1563023/s)
temporary memory allocations: 58068539 (350568/s)
peak heap memory consumption: 5.31GB
peak RSS (including heaptrack overhead): 79.01GB

bytes allocated in total (ignoring deallocations): 18.99GB (112.51MB/s)
calls to allocation functions: 258900698 (1533790/s)
temporary memory allocations: 58069882 (344019/s)
peak heap memory consumption: 5.31GB
peak RSS (including heaptrack overhead): 79.02GB
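
For reference, the relative improvements implied by the numbers above can be worked out with a small script. This is only a sketch for summarizing the results, not part of the change itself: the variable and function names are made up for illustration, wall-clock time is averaged over the three `time` runs, and the heaptrack figures are taken as reported (they are identical across runs at the precision shown).

```python
from statistics import mean

# Wall-clock totals in seconds, copied from the `time` output above.
before_wall = [20.328, 20.750, 20.821]
after_wall = [17.224, 17.087, 17.938]


def reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


# Heaptrack figures copied from the runs above (GB, GB, and call counts).
results = {
    "wall time": reduction(mean(before_wall), mean(after_wall)),
    "peak heap": reduction(6.19, 5.31),
    "bytes allocated": reduction(22.25, 18.99),
    "allocation calls": reduction(279_273_575, 258_900_698),
}

for name, frac in results.items():
    print(f"{name}: {frac:.1%} lower")
```

This works out to roughly 16% less wall-clock time, 14% lower peak heap, 15% fewer bytes allocated, and 7% fewer allocation calls. The number of temporary allocations is essentially unchanged (about 58.07M before and after).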