Open Aeranthes opened 1 month ago
From what I understand, the recorded timings in the XNNPACK events are ticks, and the above can be interpreted as nanoseconds. @Olivia-liu and @GregoryComer can say more about this, and about any plans to unify the timing units.
You can write a custom time converter like this:
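from typing import Union

def convert_xnnpack_delegate_time(
    event_name: Union[str, int], input_time: Union[int, float]
) -> Union[int, float]:
    # Treat the XNNPACK delegate timing as nanoseconds and convert it to milliseconds.
    return input_time / (1000 * 1000)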
Then pass the converter to the Inspector like this:
inspector = Inspector(
    etdump_path=args.etdump_path,
    etrecord=args.etrecord_path,
    debug_buffer_path=args.debug_buffer_path,
    source_time_scale=TimeScale(args.source_time_scale),
    target_time_scale=TimeScale(args.target_time_scale),
    delegate_time_scale_converter=convert_xnnpack_delegate_time,
)
This should work.
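As a quick sanity check of the converter above, the sketch below calls it on a made-up raw value. It assumes, as discussed in this thread, that the XNNPACK delegate timings are nanoseconds and that the target unit is milliseconds. The event name and the raw value are hypothetical.

```python
from typing import Union


def convert_xnnpack_delegate_time(
    event_name: Union[str, int], input_time: Union[int, float]
) -> Union[int, float]:
    # Assumption: input_time is in nanoseconds; the result is in milliseconds.
    return input_time / (1000 * 1000)


# Hypothetical delegate event: 1,500,000 ns converted to milliseconds.
raw_ns = 1_500_000
converted_ms = convert_xnnpack_delegate_time("hypothetical_xnnpack_op", raw_ns)
print(f"{raw_ns} ns -> {converted_ms} ms")
```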
Thanks for the prompt response.
From the structure of the profiling output, I generally see that a block of delegate calls is followed by a "DELEGATE_CALL" event. Maybe I'm mistaken in thinking this represents the time spent in the delegate from ExecuTorch's side.
If I add up the delegate profiling times for the inference and interpret them as nanoseconds, the preceding block of XNNPack operators' overall time is quite different to the time given by DELEGATE_CALL (20.25ms vs 28.1ms).
This ~40% missing inference time from the overall method::execute time was one of the reasons I thought that I couldn't interpret the results as nanoseconds.
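To make the size of the gap concrete, here is a small sketch that uses only the two figures quoted above (20.25 ms for the summed delegate operators, 28.1 ms for DELEGATE_CALL). The variable names are illustrative and are not part of any ExecuTorch API.

```python
# Figures quoted in this thread, both in milliseconds.
delegate_ops_sum_ms = 20.25  # summed XNNPACK operator times, read as nanoseconds
delegate_call_ms = 28.1      # time reported for the DELEGATE_CALL event

gap_ms = delegate_call_ms - delegate_ops_sum_ms
gap_vs_call = gap_ms / delegate_call_ms    # share of DELEGATE_CALL not accounted for
gap_vs_ops = gap_ms / delegate_ops_sum_ms  # gap relative to the operator sum

print(f"gap: {gap_ms:.2f} ms")
print(f"share of DELEGATE_CALL not accounted for: {gap_vs_call:.1%}")
print(f"gap relative to the operator sum: {gap_vs_ops:.1%}")
```

Depending on which figure is used as the denominator, the same 7.85 ms gap reads as roughly 28% or roughly 39%.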
Hey @Aeranthes, thanks again for your question! Always glad to see someone trying out the tool :). I think there's some overhead in making the delegate call, which is why the total time is greater than the sum of the operator times. @cccclai do you have any knowledge on this?
Method::execute is the total time for all DELEGATE_CALL and OPERATOR_CALL events. I don't think the delegate call itself will have 8 ms of overhead.
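One way to check this relationship on a given run is to sum the DELEGATE_CALL and OPERATOR_CALL times and compare the result with Method::execute. The sketch below does that on hypothetical (name, milliseconds) pairs. The values are made up, the helper name is illustrative, and extracting such pairs from the Inspector output is left out.

```python
from typing import List, Tuple

# Hypothetical (event name, time in ms) pairs for one inference run.
events: List[Tuple[str, float]] = [
    ("Method::execute", 30.0),
    ("DELEGATE_CALL", 28.0),
    ("OPERATOR_CALL", 1.5),
]


def summarize(events: List[Tuple[str, float]]) -> Tuple[float, float, float]:
    """Return (Method::execute total, summed call time, unexplained residual)."""
    total = sum(ms for name, ms in events if name == "Method::execute")
    calls = sum(
        ms for name, ms in events if name in ("DELEGATE_CALL", "OPERATOR_CALL")
    )
    return total, calls, total - calls


total_ms, calls_ms, residual_ms = summarize(events)
print(f"Method::execute: {total_ms} ms, calls: {calls_ms} ms, residual: {residual_ms} ms")
```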
It should be correct to interpret those values of the delegated operators as nanoseconds for xnnpack. We're working on making the time units consistent for delegates in the Inspector output. The 20.25ms vs 28.1ms gap is indeed weird. There might be a bug unrelated to the time scale that's causing it. I don't think you did anything wrong, since I can reproduce this on my side as well.
Hi again - I think this is probably setup-agnostic, but as a quick rundown, I'm running on an Android device with the SDK enabled. I've got a binary application that can generate an etdump file while registering the XNNPack delegate, and I'm running a version of mobilenet_v2 that has been partitioned for the same delegate.
When I print out the resulting event blocks to find the operator timings, I'm seeing the following pattern:
The subject of the issue is specifically that the time format seems inconsistent: the ExecuTorch calls and column headers are recorded in milliseconds, while from what I gather from XNNProfiler.cpp, the delegate calls are recorded in "PAL ticks" / system ticks?
Is this a bug, and/or is there a way to get the output in a more comparable time unit? Thanks for any assistance.
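If the delegate timings really are raw ticks rather than nanoseconds, converting them to milliseconds needs the platform's tick-to-nanosecond ratio. A minimal sketch, assuming that ratio is available as a numerator/denominator pair. The function name and the ratios used here are hypothetical and are not taken from XNNProfiler.cpp.

```python
def ticks_to_ms(ticks: int, ns_numerator: int = 1, ns_denominator: int = 1) -> float:
    # Assumption: ticks * numerator / denominator gives nanoseconds.
    nanoseconds = ticks * ns_numerator / ns_denominator
    return nanoseconds / 1_000_000


# With a 1:1 ratio, ticks are already nanoseconds.
one_to_one_ms = ticks_to_ms(2_000_000)

# Hypothetical 24 MHz tick source: one tick is 125/3 ns.
scaled_ms = ticks_to_ms(48_000, ns_numerator=125, ns_denominator=3)

print(one_to_one_ms, scaled_ms)
```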