openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

[xla:ffi] Add an API to update CallFrame in place #14263

copybara-service[bot] closed this 3 days ago

copybara-service[bot] commented 3 days ago

Instead of creating a copy of the call frame for each concurrent execute request, it might be more efficient to keep a pool of call frames guarded with a mutex and update them in place using a round-robin strategy.
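A minimal C++ sketch of the pooling idea, under stated assumptions: each frame in the pool has its own mutex (one reading of "guarded with a mutex"), and the frame is reduced to a vector of argument pointers. `CallFrame`, `UpdateInPlace`, `CallFramePool`, and `WithFrame` are hypothetical names for illustration only; they are not the XLA FFI API added in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an FFI call frame: it only stores argument buffer
// pointers. This is NOT xla::ffi::CallFrame.
struct CallFrame {
  std::vector<void*> args;

  // Overwrites the argument pointers in place, without reallocating.
  // Returns false if the argument count does not match the frame's layout.
  bool UpdateInPlace(const std::vector<void*>& new_args) {
    if (new_args.size() != args.size()) return false;
    for (std::size_t i = 0; i < args.size(); ++i) args[i] = new_args[i];
    return true;
  }
};

// Hypothetical fixed-size pool of call frames. Each slot has its own mutex,
// and requests are assigned to slots in round-robin order.
class CallFramePool {
 public:
  CallFramePool(std::size_t size, const CallFrame& prototype) {
    assert(size > 0 && "pool must contain at least one frame");
    slots_.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
      auto slot = std::make_unique<Slot>();
      slot->frame = prototype;  // The only copies happen here, at setup time.
      slots_.push_back(std::move(slot));
    }
  }

  // Picks the next slot round-robin, locks it, updates its frame in place,
  // and calls `fn` with the frame while the lock is held. Returns false if
  // the in-place update was rejected (argument count mismatch).
  template <typename Fn>
  bool WithFrame(const std::vector<void*>& args, Fn&& fn) {
    std::size_t idx =
        next_.fetch_add(1, std::memory_order_relaxed) % slots_.size();
    Slot& slot = *slots_[idx];
    std::lock_guard<std::mutex> lock(slot.mu);
    if (!slot.frame.UpdateInPlace(args)) return false;
    fn(slot.frame);
    return true;
  }

 private:
  struct Slot {
    std::mutex mu;
    CallFrame frame;
  };
  std::vector<std::unique_ptr<Slot>> slots_;  // unique_ptr: mutex is not movable
  std::atomic<std::size_t> next_{0};
};
```

Holding the slot's lock while `fn` runs prevents two requests from overwriting the same frame. If there are more concurrent requests than slots, two of them can land on the same slot and the second simply waits; sizing the pool to the expected concurrency keeps that rare.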


Benchmark                          Time         CPU   Iterations
BM_UpdateCallFrame/1            86.5 ns     86.5 ns      7826900
BM_UpdateCallFrame/2            93.2 ns     93.2 ns      7728892
BM_UpdateCallFrame/4             102 ns      102 ns      6898289
BM_UpdateCallFrame/8             119 ns      119 ns      6066828
BM_UpdateCallFrame/16            164 ns      164 ns      4245659
BM_UpdateCallFrame/32            233 ns      233 ns      2977063
BM_UpdateCallFrameInPlace/1     4.28 ns     4.28 ns    163073438
BM_UpdateCallFrameInPlace/2     4.69 ns     4.69 ns    149033865
BM_UpdateCallFrameInPlace/4     5.09 ns     5.09 ns    137857455
BM_UpdateCallFrameInPlace/8     7.28 ns     7.28 ns     96355198
BM_UpdateCallFrameInPlace/16    11.3 ns     11.3 ns     62005774
BM_UpdateCallFrameInPlace/32    20.6 ns     20.6 ns     33960530
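To put the benchmark numbers in perspective, this small snippet computes the speedup of the in-place update over the copying update for each benchmark argument, using only the timings reported above. It assumes the `/N` suffix is the benchmark's size parameter (for example, the number of call frame arguments), which the PR text does not state explicitly.

```cpp
#include <cstddef>

// Timings (ns) copied from the benchmark results above. Row i corresponds to
// the benchmark argument kArgs[i].
constexpr std::size_t kNumRows = 6;
constexpr int kArgs[kNumRows] = {1, 2, 4, 8, 16, 32};
constexpr double kCopyNs[kNumRows] = {86.5, 93.2, 102, 119, 164, 233};
constexpr double kInPlaceNs[kNumRows] = {4.28, 4.69, 5.09, 7.28, 11.3, 20.6};

// Ratio of BM_UpdateCallFrame time to BM_UpdateCallFrameInPlace time.
constexpr double Speedup(std::size_t row) {
  return kCopyNs[row] / kInPlaceNs[row];
}
```

The ratio is about 20x for arguments 1 through 4 and shrinks to about 11x at 32, since the in-place cost grows with the argument count while the copy cost is dominated by a larger fixed overhead.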