openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

[xla:ffi] Optimize CallFrame construction #14245

Closed: copybara-service[bot] closed this 3 months ago

copybara-service[bot] commented 3 months ago

[xla:ffi] Optimize CallFrame construction


Benchmark                  Time         CPU   Iterations
--------------------------------------------------------
BM_AddBufferArg/1        130 ns      130 ns      5371866
BM_AddBufferArg/2        193 ns      193 ns      4185683
BM_AddBufferArg/4        244 ns      244 ns      2814702
BM_AddBufferArg/8        394 ns      394 ns      1777035
BM_AddBufferArg/16       758 ns      758 ns       923284
BM_AddAttributes/1       236 ns      236 ns      2968503
BM_AddAttributes/2       329 ns      329 ns      2121776
BM_AddAttributes/4       549 ns      549 ns      1274770
BM_AddAttributes/8      1029 ns     1029 ns       682219
BM_AddAttributes/16     2315 ns     2315 ns       302339

FUTURE_COPYBARA_INTEGRATE_REVIEW=https://github.com/openxla/xla/pull/14103 from shraiysh:enable_send_recv_validation 3a9628713c49d0966fae4fb15484762e19133435