Closed: dyastremsky closed this pull request 1 month ago
Are there also potential documentation changes needed?
I'm also concerned that we are still reporting latency numbers. We may need a way to indicate that the latency is not stable.
Code looks correct. It would be nice to have some sort of testing to confirm this and protect against regressions.
Correct in what way? You can see the screenshots above. I'll also be adding testing to the main feature branch, as a separate PR, to ensure PA stabilizes.
> I'm also concerned that we are still reporting latency numbers. We may need a way to indicate that the latency is not stable.

Do you think this is something users care about? Will defer to you and @nnshah1 here. It seemed like even our team wasn't super familiar with what stabilization means, so I'm not sure that it's important enough to call out that latency may not have stabilized.
As a counterargument to the point about users not caring, we know that uncapping the request rate in this way can lead to unbounded growth, so average latency won't be the most accurate measure. But we do show the latency per run, so users can see that.

> Are there also potential documentation changes needed?

Thanks for this callout. I'll add these changes.
> I'm also concerned that we are still reporting latency numbers. We may need a way to indicate that the latency is not stable.
>
> Do you think this is something users care about? Will defer to you and @nnshah1 here. It seemed like even our team wasn't super familiar with what stabilization means, so I'm not sure that it's important enough to call out that latency may not have stabilized.
> As a counterargument to the point about users not caring, we know that uncapping the request rate in this way can lead to unbounded growth, so average latency won't be the most accurate measure. But we do show the latency per run, so users can see that.

My preference would be to handle this in documentation and not tool output.
If I understand correctly - we are saying that for cases where request rate >> throughput - latency is effectively unbounded - I think that's reasonable to document. Probably we want to add (if not present) the ability to sweep - i.e. start at a low request rate and increase until throughput no longer increases - then stabilize latency at that request rate.
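As a rough illustration of that sweep idea, something like the loop below could work. This is only a sketch: the `run_at_request_rate` helper and all parameter values are assumptions for illustration, not existing perf analyzer options.

```python
# Hypothetical sketch of sweeping the request rate until throughput saturates.
# `run_at_request_rate` is an assumed helper that runs one measurement pass and
# returns the observed throughput in inferences per second.

def sweep_request_rate(run_at_request_rate, start_rate=10.0, step=10.0,
                       max_rate=1000.0, min_gain=0.02):
    """Increase the request rate until throughput stops improving, then return
    the last rate at which throughput still grew (a rough saturation point)."""
    best_rate = start_rate
    best_throughput = run_at_request_rate(start_rate)
    rate = start_rate + step
    while rate <= max_rate:
        throughput = run_at_request_rate(rate)
        if throughput <= best_throughput * (1.0 + min_gain):
            break  # throughput is no longer increasing meaningfully
        best_rate, best_throughput = rate, throughput
        rate += step
    return best_rate, best_throughput
```

Latency could then be measured and stabilized at the returned rate, as suggested above.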
> Code looks correct. It would be nice to have some sort of testing to confirm this and protect against regressions.
>
> Correct in what way? You can see the screenshots above. I'll also be adding testing to the main feature branch, as a separate PR, to ensure PA stabilizes.

I'm just commenting that the code changes look fine/correct to me.
> I'm also concerned that we are still reporting latency numbers. We may need a way to indicate that the latency is not stable.
>
> Do you think this is something users care about? Will defer to you and @nnshah1 here. It seemed like even our team wasn't super familiar with what stabilization means, so I'm not sure that it's important enough to call out that latency may not have stabilized. As a counterargument to the point about users not caring, we know that uncapping the request rate in this way can lead to unbounded growth, so average latency won't be the most accurate measure. But we do show the latency per run, so users can see that.
>
> My preference would be to handle this in documentation and not tool output.
>
> If I understand correctly - we are saying that for cases where request rate >> throughput - latency is effectively unbounded - I think that's reasonable to document. Probably we want to add (if not present) the ability to sweep - i.e. start at a low request rate and increase until throughput no longer increases - then stabilize latency at that request rate.

I'll add a message.
> Code looks correct. It would be nice to have some sort of testing to confirm this and protect against regressions.
>
> Correct in what way? You can see the screenshots above. I'll also be adding testing to the main feature branch, as a separate PR, to ensure PA stabilizes.
>
> I'm just commenting that the code changes look fine/correct to me.

Ah, gotcha. Apologies, I misread that earlier. Thanks! 😄
> I'm also concerned that we are still reporting latency numbers. We may need a way to indicate that the latency is not stable.
>
> Do you think this is something users care about? Will defer to you and @nnshah1 here. It seemed like even our team wasn't super familiar with what stabilization means, so I'm not sure that it's important enough to call out that latency may not have stabilized. As a counterargument to the point about users not caring, we know that uncapping the request rate in this way can lead to unbounded growth, so average latency won't be the most accurate measure. But we do show the latency per run, so users can see that.
>
> My preference would be to handle this in documentation and not tool output.
>
> If I understand correctly - we are saying that for cases where request rate >> throughput - latency is effectively unbounded - I think that's reasonable to document. Probably we want to add (if not present) the ability to sweep - i.e. start at a low request rate and increase until throughput no longer increases - then stabilize latency at that request rate.

Created a ticket to look into sweeping the request rate (TPA-140).
I added a warning message for when the latency is not stabilizing, as discussed (see the PR description). I also added a unit test. I can look at adding more extensive testing and fixing the 0 sec avg HTTP time bug as separate PRs into the feature branch.
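Purely for illustration, the kind of warning and unit test described here could look roughly like the sketch below. The function name, message text, and test are assumptions, not the actual perf analyzer code.

```python
# Illustrative sketch: warn when latency never stabilized, plus a small unit test.
import warnings


def maybe_warn_unstable_latency(latency_stabilized: bool) -> None:
    """Warn that reported latency values may not be meaningful when the
    measurement never reached a stable latency."""
    if not latency_stabilized:
        warnings.warn(
            "Latency did not stabilize; reported latency values may be "
            "unbounded when the request rate exceeds sustainable throughput."
        )


def test_warns_when_latency_not_stabilized():
    # A warning is emitted only for the unstable case.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        maybe_warn_unstable_latency(latency_stabilized=False)
        assert len(caught) == 1
        maybe_warn_unstable_latency(latency_stabilized=True)
        assert len(caught) == 1
```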
This pull request is ready for review.
After fixing the async request rate case, perf analyzer no longer stabilizes most of the time for smaller models, because latency is more variable in the async case. As a result, only throughput (inferences per second) is now used as the stabilization metric. PA stabilizes quite quickly with these updates.
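For illustration, a throughput-only stability check could look roughly like this sketch. The window size and deviation threshold here are assumptions, not the values perf analyzer actually uses.

```python
# Illustrative stability check that looks only at throughput, ignoring latency.

def throughput_is_stable(throughput_history, window=3, max_deviation=0.10):
    """Treat the run as stable once the last `window` throughput measurements
    all fall within `max_deviation` of their average."""
    if len(throughput_history) < window:
        return False
    recent = throughput_history[-window:]
    avg = sum(recent) / window
    return all(abs(value - avg) <= max_deviation * avg for value in recent)
```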
Before changes (screenshot): DenseNet
After changes (screenshot): DenseNet