Open guotuofeng opened 3 years ago
I feel the design is quite comprehensive and it looks very good to me. I have a few questions and suggestions:
(1) The "Diff View" section gives a diagram of the timeline views of two example runs. Is there a plan to highlight the modules/operators on the diagram that have significant differences between the baseline run and the experimental run?
(2) I didn't find details of how the difference in kernel execution is shown in the "Diff View" section, especially for the timeline view. It would be great to have a way to locate and highlight the differences in kernel execution between two runs, since in many cases ML training and inference performance is mainly determined by how kernels are executed on GPUs.
(3) On the "Execution Comparison" chart, what do "Baseline Trend" and "Experimental Trend" mean? How do you expect users to use this feature?
(4) For the "Operator/Kernel View" table, can we sort the operators/kernels by their differences between the baseline and experimental runs, and put the ones with big deltas at the top of the table?
I feel the design is quite comprehensive and it looks very good to me. I have a few questions and suggestions: (1) The "Diff View" section gives a diagram of the timeline views of two example runs. Is there a plan to highlight the modules/operators on the diagram that have significant differences between the baseline run and the experimental run?
In the diff view, we only show the timeline view at the module/submodule level. We don't show differences at the operator level because there is a huge number of operators to compare, which would make the diff view much less useful as a big-picture view.
(2) I didn't find details of how the difference in kernel execution is shown in the "Diff View" section, especially for the timeline view. It would be great to have a way to locate and highlight the differences in kernel execution between two runs, since in many cases ML training and inference performance is mainly determined by how kernels are executed on GPUs.
For the kernel view, the comparison is shown in table format, like the operator view. There is no plan to show kernel differences in the timeline view, since that is not supported for the operator view either.
(3) On the "Execution Comparison" chart, what do "Baseline Trend" and "Experimental Trend" mean? How do you expect users to use this feature?
The two trend lines show the accumulated execution time at each step. Using these lines, users can easily find the most time-consuming part (the segment with the maximum slope in the trend line).
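For illustration, here is a minimal sketch of how such trend lines might be built from per-step times; the step-time values and variable names are made up for the example:

```python
# Build accumulated-time trend lines from per-step execution times.
# The step times below are hypothetical example data (ms per step).
from itertools import accumulate

baseline_step_times = [12.1, 11.9, 12.3, 12.2, 12.0]
experimental_step_times = [11.8, 11.7, 12.0, 25.4, 11.9]

baseline_trend = list(accumulate(baseline_step_times))
experimental_trend = list(accumulate(experimental_step_times))

# The segment with the maximum slope in the trend line is simply the
# most expensive step, so it can be located directly:
worst_step = max(range(len(experimental_step_times)),
                 key=experimental_step_times.__getitem__)
print(baseline_trend)
print(experimental_trend)
print(f"most time-consuming step: {worst_step}")
```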
(4) For the "Operator/Kernel View" table, can we sort the operators/kernels by their differences between the baseline and experimental runs, and put the ones with big deltas at the top of the table?
Yes, each column in the operator/kernel view is sortable.
Thank you for your explanations!
Sorry, I didn't make myself clear about item (4): my suggestion is to have the "Operator/Kernel View" put the ones with big deltas at the top of the table by default.
Do you mean that we pin the maximum deltas on top of each column?
Yes.
So we can sort the delta column in descending order by default, which puts the largest deltas on top.
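For illustration, a minimal sketch of that default ordering; the row schema and field names are hypothetical, not the plugin's actual data model:

```python
# Sort operator rows so the largest absolute deltas between the
# experimental and baseline totals come first. Field names and
# values are hypothetical example data.
rows = [
    {"name": "aten::mm",   "baseline_us": 5200, "experimental_us": 7100},
    {"name": "aten::add",  "baseline_us": 900,  "experimental_us": 910},
    {"name": "aten::relu", "baseline_us": 300,  "experimental_us": 290},
]

for row in rows:
    row["delta_us"] = row["experimental_us"] - row["baseline_us"]

# Descending by absolute delta: biggest regressions/improvements on top.
rows.sort(key=lambda r: abs(r["delta_us"]), reverse=True)
print([(r["name"], r["delta_us"]) for r in rows])
```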
In the diff view, we need to split each run into comparable pieces, where each piece is aligned on a logical timeline. For example,
I am curious how exactly you will extract the comparable pieces. My suggestion is to use markers (something we don't have yet): you log a marker start/end in your workload, and all the events in between are analyzed. os_signpost is an example of this on iOS: https://developer.apple.com/documentation/os/3019241-os_signpost.
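For reference, PyTorch does already ship a user-annotation API that behaves much like such start/end markers: torch.profiler.record_function groups every event recorded inside its scope under a label. A minimal example:

```python
# record_function is an existing PyTorch API: all events that occur
# inside the context are attributed to the given label, much like a
# start/end marker pair around a region of the workload.
import torch
from torch.profiler import profile, record_function

model = torch.nn.Linear(128, 128)
x = torch.randn(32, 128)

with profile() as prof:
    with record_function("my_workload_region"):
        model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```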
We will add module-level tracing and use the module name for comparison. We plan to support comparison of modules only. Operator-level comparison is not supported because the huge number of operators would make users lose focus.
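One way such module-level spans could be captured is with standard nn.Module forward hooks wrapping each submodule in a record_function labeled with the module name; this is only a sketch of the idea, not necessarily how the plugin records module traces:

```python
# Sketch: emit one profiler span per submodule, labeled with its name,
# via forward pre/post hooks. Illustrative only; the actual module-level
# trace may be implemented differently.
import torch
from torch.profiler import profile, record_function

def instrument(model: torch.nn.Module):
    for name, module in model.named_modules():
        if not name:
            continue  # skip the unnamed root module

        def pre_hook(mod, inputs, _name=name):
            mod._prof_span = record_function(_name)
            mod._prof_span.__enter__()

        def post_hook(mod, inputs, output):
            mod._prof_span.__exit__(None, None, None)

        module.register_forward_pre_hook(pre_hook)
        module.register_forward_hook(post_hook)

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
instrument(model)
with profile() as prof:
    model(torch.randn(4, 8))
```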
A sneak peek of this feature is implemented in #369
@skyline75489 Hey, what does "execution diff" mean? Also, what units is it in? I really think we should have axis labels and/or a tooltip.
Profiler: differentiate runs/traces feature request
Scenarios
Goals
Here are a couple of typical scenarios in which data scientists, after tweaking a baseline model, would like to compare the new run against the baseline to see whether there are obvious changes in the new model.
Non-Goals
Design
The design covers the six major scenarios listed in the section above. The plugin UI will be split into two modes: normal mode and diff (comparison) mode. In diff mode, the UI will look like the following, allowing the user to select the runs to compare.
After the user selects both the baseline and experimental runs and clicks the diff button, the diff UI is loaded.
Overview
The overview UI will show summary information such as device/memory details, GPU utilization, memory usage, step times, etc.
Diff View
In the diff view, we need to split each run into comparable pieces, where each piece is aligned on a logical timeline. For example,
We can split the two example runs in the diagrams above. Parts that are missing from one run are left out of the comparison. Note: functional.relu is for illustration purposes only; it is possible that all functional calls will be missing. Once the run execution timelines are logically aligned, we can compare the absolute execution time of each logical part, which yields the following chart. The execution-time comparison uses the critical-path time, which in most cases means using CPU time for CPU tasks and GPU time for GPU tasks.
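A sketch of this alignment step, assuming each run has already been reduced to an ordered list of (piece name, critical-path time) pairs; the data format and timings are hypothetical:

```python
# Align two runs piece-by-piece by name and compare the critical-path
# time of each matched piece; unmatched pieces (e.g. functional.relu
# missing from one run) are left out of the comparison.
from difflib import SequenceMatcher

baseline     = [("Conv2d", 4.1), ("functional.relu", 0.3), ("Linear", 2.0)]
experimental = [("Conv2d", 4.0), ("Linear", 2.6)]

matcher = SequenceMatcher(a=[n for n, _ in baseline],
                          b=[n for n, _ in experimental],
                          autojunk=False)

for block in matcher.get_matching_blocks():
    for k in range(block.size):
        name, t_base = baseline[block.a + k]
        t_exp = experimental[block.b + k][1]
        print(f"{name}: baseline={t_base}ms, experimental={t_exp}ms, "
              f"delta={t_exp - t_base:+.1f}ms")
```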
For each part, we can then plot the following difference line, in execution order.
Users can zoom into a specific aligned part by clicking it (exit the zoom by clicking a blank region?). For example, the top module's forward can be zoomed into at the submodule level, recursively. When the user selects a block, e.g. the top module's forward, the detailed comparison view for the selected block is shown. If there are gaps between the aligned blocks (e.g. unknown code such as functional calls, or pure CPU code such as time.sleep), a blank block named something like "unknown" should be inserted, meaning that this time does not belong to any module. Note: for simplicity, we only show the diff view at the nn.Module level rather than for the underlying operators, because the enormous number of operators would divert users' attention. The diff view covers scenarios 1, 3, 4, and 6.
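A sketch of the gap-filling step described above, assuming each aligned block carries start/end timestamps (the format is hypothetical):

```python
# Insert "unknown" filler blocks into gaps between aligned module blocks
# so that unattributed time (functional calls, time.sleep, etc.) remains
# visible on the timeline instead of silently disappearing.
def fill_gaps(blocks):
    """blocks: list of (name, start_us, end_us), sorted by start time."""
    filled = []
    prev_end = None
    for name, start, end in blocks:
        if prev_end is not None and start > prev_end:
            filled.append(("unknown", prev_end, start))
        filled.append((name, start, end))
        prev_end = end
    return filled

blocks = [("Conv2d", 0, 410), ("Linear", 450, 650)]
print(fill_gaps(blocks))
# [('Conv2d', 0, 410), ('unknown', 410, 450), ('Linear', 450, 650)]
```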
Operator/Kernel view
The operator/kernel view will show a summary of the operators/kernels for the baseline and experimental runs. Each column is sortable and filterable. If the user selects specific blocks, only the related stats are shown.
We can extend the following columns in the operator view; each column will have four sub-columns:
The kernel view follows the same pattern. Scenarios 2 and 5 are covered by the operator/kernel view.
Work Items
The following changes or requirements are needed for the diff view feature, in order to align the logical timeline.
Open Issues