Closed by bjacob.
@benvanik would it make sense, maybe as part of the ongoing work on memory allocation, to make these small allocations go into a single arena so that on shutdown we only need to deallocate that arena?
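A minimal sketch of the arena idea, assuming a simple bump-pointer design. The `Arena` class and its methods are hypothetical and are not IREE's API. Small allocations are carved out of one block, so teardown costs a single `free()` instead of one `free()` (and possibly one `madvise`) per allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical bump-pointer arena (illustration only, not IREE code).
// Many small allocations share one malloc'd block, so releasing them all
// costs a single free() in the destructor.
class Arena {
 public:
  explicit Arena(std::size_t capacity)
      : base_(static_cast<std::uint8_t*>(std::malloc(capacity))),
        capacity_(capacity) {
    if (base_ == nullptr) throw std::bad_alloc();
  }

  // One free() releases every allocation made from this arena.
  ~Arena() { std::free(base_); }

  Arena(const Arena&) = delete;
  Arena& operator=(const Arena&) = delete;

  // Bumps the offset and returns a pointer into the block, or nullptr when
  // the arena is exhausted. `alignment` must be a power of two no larger
  // than alignof(std::max_align_t), since only the offset is aligned and
  // malloc guarantees that alignment for the base pointer.
  void* Allocate(std::size_t size,
                 std::size_t alignment = alignof(std::max_align_t)) {
    std::size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
    if (aligned > capacity_ || size > capacity_ - aligned) return nullptr;
    offset_ = aligned + size;
    return base_ + aligned;
  }

  // Allocations are never freed individually; Reset() reuses the block.
  void Reset() { offset_ = 0; }

  std::size_t used() const { return offset_; }

 private:
  std::uint8_t* base_;
  std::size_t capacity_;
  std::size_t offset_ = 0;
};
```

The trade-off is that individual allocations cannot be returned early; memory is reclaimed only on `Reset()` or destruction. That fits objects whose lifetimes all end together, such as buffers released at shutdown.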
It looks like this is already taken care of by @asaadaldien's branch https://github.com/google/iree/tree/ataei-pre_allocation:
- Baseline: 920 ms
- With my patch above: 870 ms
- With Ahmed's pre_allocation: 820 ms
- With both my patch above and Ahmed's pre_allocation: still 820 ms
Can you confirm there's indeed nothing more to this, and close this issue when pre_allocation gets merged?
Makes sense! If free is releasing host buffers here, it will have zero cost in the pre_allocation branch.
Indeed, that's what free was doing here. Please mark your own branch as fixing this issue.
This is expected: it's the other half of the allocations (gotta free your memory sometime) and what I'll be fixing next week by not allocating the memory in the first place :)
Dupe of #1888.
Found this accidentally while staring at Tracy traces (so count this as possibly the first use of Tracy on Android!)
In `hal_module.cc`, in `ExSubmitAndWait`, the zone labelled `HALModuleState::DeferredReleases` is taking 50 ms, adding 50 ms to the latency reported by `iree-benchmark-module` on the perf burndown workload. This diff improves the reported latency (MobileBert on Pixel 4, core 7) from 920 ms down to 870 ms.
The Tracy trace shows what's happening: each iteration of the `for` loop above performs some `free()` calls, which you would think should be cheap, but there are many of them, and on Android, `free` calls `madvise` to hint the OS to release memory immediately.

What would be an appropriate way to avoid performing so many `free` calls during shutdown?