Closed dyastremsky closed 5 months ago
Great work David! Only some minor tweaks and clarifications.
Just a thought: should we be setting default values for the required arguments? For example, the `profile` subcommand requires `concurrency` as a parameter. During the arg sanitization process, would it make sense to add a default for it if the user didn't provide it? I think this would abstract some complexity away from the user.
CC: @rmccorm4 for thoughts.
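If we did go that route, a minimal sketch of what the sanitization step could look like (the `--concurrency` flag name and its default of 1 are assumptions for illustration, not taken from this PR):

```python
import argparse

# Hypothetical sketch: supply a default for an otherwise-required argument
# during arg sanitization, so the user isn't forced to pass it explicitly.
parser = argparse.ArgumentParser(prog="triton profile")
parser.add_argument("--concurrency", type=int, default=1)  # default value assumed

args = parser.parse_args([])  # user passed nothing; falls back to the default
print(args.concurrency)
```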
I think we're 100% going for a passthrough approach here. This makes the Triton CLI extensible and maintains all logic unique to the tools in their own repositories. If we start moving over logic, then responsibilities are starting to overlap and we could be duplicating code. If there is a required arg for profile and we think it should set a default, Model Analyzer is the right place to fix that.
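As a rough illustration of the passthrough idea (the function names and command invocation below are hypothetical, not the actual Triton CLI code):

```python
import subprocess
from typing import List

def build_passthrough_cmd(argv: List[str]) -> List[str]:
    # All profile args are forwarded to genai-perf untouched; the CLI adds
    # no defaults or validation, so that logic stays in the GenAI-Perf repo.
    return ["genai-perf", *argv]

def profile(argv: List[str]) -> int:
    # Hypothetical entry point: shell out and surface the tool's exit code.
    return subprocess.run(build_passthrough_cmd(argv)).returncode
```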
This work is currently on hold. I will comment the ticket once that status changes.
@rmccorm4 @fpetrini15 This should now be ready, but it is failing the Python 3.8 test because built-in types weren't subscriptable until 3.9. While I could go into GenAI-Perf and make it Python 3.8-friendly, given that 3.8 support isn't a requirement of GenAI-Perf, something will likely break again in the future. If we want this to pin to the 24.04 version, it will also delay release until there is a release branch with the changes in GenAI-Perf.
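For context, the 3.8 failure comes from annotations like `list[str]`: subscripting built-in types only works on Python 3.9+, while the `typing` aliases work on 3.8 as well (illustrative example, not code from GenAI-Perf):

```python
from typing import List

def strip_all_legacy(lines: List[str]) -> List[str]:  # OK on Python 3.8
    return [line.strip() for line in lines]

def strip_all_modern(lines: list[str]) -> list[str]:  # TypeError at import time on 3.8
    return [line.strip() for line in lines]
```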
I tried creating a branch with types that work with Python 3.8 (here), but then it ran into issues with `itertools` functionality that is only available in 3.10. I'd need to go back and retrofit code. Is Python 3.8 support a requirement for Triton CLI? For this integration, we may need to either make it a requirement for GenAI-Perf or drop it as a requirement for both.
CC: @debermudez @nnshah1
I don't think it's a strong requirement at this time, probably more of a "nice to have". I added it to the CLI because it was easy when starting from scratch, and I believe PyTriton supports 3.8+.
I don't mind removing the support for now if it's not a simple fix or is just unwanted.
If you're okay with it, I pushed a commit dropping it. I understand the desire for greater support though, so if we wanted to support 3.8-3.9, it would just require updating GenAI-Perf to remove the 3.10+ features and then move testing to 3.8. I'll start a conversation about it in the morning.
As per the offline discussion, I have removed the 3.8 test for now.
@rmccorm4 @fpetrini15 This is ready for another round of review.
This pull request makes it so that Triton CLI calls GenAI-Perf for its profile subcommand. All arguments get passed through to GenAI-Perf.
As part of these changes, the previous profiler functionality has been removed from Triton CLI to avoid maintaining this behavior in both places.
Unit tests have successfully passed with these changes in place in an environment with PA and GenAI-Perf.
GenAI-Perf demo below. Apologies for the extra errors beforehand from some arg types. Note: this was a previous iteration; `--task-llm` is no longer used. The output token counts are off because the mock Python model doesn't actually use the max token inputs; it just returns the input as output. https://github.com/triton-inference-server/triton_cli/assets/58150256/97056e68-afb8-49e9-9e07-8f6601952c3a