Open Jonas-Heinrich opened 1 year ago
After further investigation, I noticed that the issue is also present in other locations. The second force push now includes those other locations (found by searching for `core_sw::dispatcher::kernels_dispatcher::get_instance()`).
Hi @Jonas-Heinrich, thank you for the contribution! The team will review it and test it thoroughly on our side in the upcoming weeks.
Hi,
while comparing QPL to TUM Umbra's related functionality, I noticed that the former has a large overhead for small numbers of tuples. After investigating with VTune and the following microbenchmark, I think I found a typo that leads to an accidental copy of the function pointer table in `input_stream_t::initialize_sw_kernels`. Here's the benchmark:

The benchmark was run for 15s on an i9 13900K. Screenshot of VTune summary before the PR:
after the PR:
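
For readers without the diff at hand, here is a minimal sketch of how an accidental copy of a function pointer table can arise in C++. It assumes the typo is of the common "plain `auto` instead of a reference" kind; the description above does not spell out the exact change. All names (`dispatcher`, `kernel_table`, `get_table`, `identity_kernel`, and the two lookup helpers) are hypothetical stand-ins, not QPL's actual types or API.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not QPL's real types.
using kernel_fn = std::uint32_t (*)(std::uint32_t);
using kernel_table = std::array<kernel_fn, 256>;

inline std::uint32_t identity_kernel(std::uint32_t x) { return x; }

// A singleton that owns a table of function pointers and exposes it by
// const reference, similar in spirit to a kernel dispatcher.
class dispatcher {
public:
    static dispatcher& get_instance() {
        static dispatcher instance;
        return instance;
    }

    const kernel_table& get_table() const { return table_; }

private:
    dispatcher() { table_.fill(&identity_kernel); }

    kernel_table table_{};
};

// Accidental copy: plain `auto` drops the reference, so `table` is a
// fresh local array and all 256 pointers are copied on every call.
inline bool lookup_copies_table() {
    auto table = dispatcher::get_instance().get_table();
    // Compare the local object's address with the singleton's table.
    return &table == &dispatcher::get_instance().get_table();
}

// Intended behaviour: `const auto&` binds to the singleton's table,
// so nothing is copied.
inline bool lookup_binds_reference() {
    const auto& table = dispatcher::get_instance().get_table();
    return &table == &dispatcher::get_instance().get_table();
}
```

In this sketch, `lookup_copies_table()` returns `false` because the local `table` is a distinct object, while `lookup_binds_reference()` returns `true`. A per-call copy of a table like this is a fixed cost, which would be consistent with the overhead showing up mainly for small inputs.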