Closed dyastremsky closed 5 months ago
Great work David! Only some minor tweaks and clarifications.
Just a thought: should we be setting default values for the required arguments? For example, the `profile` subcommand requires `concurrency` as a parameter. During the arg sanitization process, would it make sense to add a default for it if the user didn't provide it? I think this would abstract some complexity away from the user.
CC: @rmccorm4 for thoughts.
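If we did go that route, a minimal sketch of what the sanitization step could look like (the `--concurrency` flag name and its default of 1 are assumptions for illustration, not taken from this PR):

```python
import argparse

# Hypothetical sketch: supply a default for an otherwise-required argument
# during arg sanitization, so the user isn't forced to pass it explicitly.
parser = argparse.ArgumentParser(prog="triton profile")
parser.add_argument("--concurrency", type=int, default=1)  # default value assumed

args = parser.parse_args([])  # user passed nothing; falls back to the default
print(args.concurrency)
```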
I think we're 100% going for a passthrough approach here. This makes the Triton CLI extensible and maintains all logic unique to the tools in their own repositories. If we start moving over logic, then responsibilities are starting to overlap and we could be duplicating code. If there is a required arg for profile and we think it should set a default, Model Analyzer is the right place to fix that.
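As a rough illustration of the passthrough idea (the function names and command invocation below are hypothetical, not the actual Triton CLI code):

```python
import subprocess
from typing import List

def build_passthrough_cmd(argv: List[str]) -> List[str]:
    # All profile args are forwarded to genai-perf untouched; the CLI adds
    # no defaults or validation, so that logic stays in the GenAI-Perf repo.
    return ["genai-perf", *argv]

def profile(argv: List[str]) -> int:
    # Hypothetical entry point: shell out and surface the tool's exit code.
    return subprocess.run(build_passthrough_cmd(argv)).returncode
```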
This work is currently on hold. I will comment the ticket once that status changes.
@rmccorm4 @fpetrini15 This should now be ready, but it is failing the Python 3.8 test because built-in types weren't subscriptable until 3.9. While I could go into GenAI-Perf and make it Python 3.8-friendly, given that 3.8 support isn't a requirement of GenAI-Perf, something will likely break again in the future. If we want this to pin to the 24.04 version, it will also delay release until there is a release branch with the changes in GenAI-Perf.
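For context, the 3.8 failure comes from annotations like `list[str]`: subscripting built-in types only works on Python 3.9+, while the `typing` aliases work on 3.8 as well (illustrative example, not code from GenAI-Perf):

```python
from typing import List

def strip_all_legacy(lines: List[str]) -> List[str]:  # OK on Python 3.8
    return [line.strip() for line in lines]

def strip_all_modern(lines: list[str]) -> list[str]:  # TypeError at import time on 3.8
    return [line.strip() for line in lines]
```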
I tried creating a branch with types that work with Python 3.8 (here), but then it ran into issues with `itertools` functionality that is only available in 3.10. I'd need to go back and retrofit code. Is Python 3.8 support a requirement for Triton CLI? For this integration, we may need to either make it a requirement for GenAI-Perf or drop it as a requirement for both.
CC: @debermudez @nnshah1
I don't think it's a strong requirement at this time, probably more of a "nice to have". I added it to the CLI because it was easy when starting from scratch, and I believe PyTriton supports 3.8+.
I don't mind removing the support for now if it's not a simple fix or is just unwanted.
If you're okay with it, I pushed a commit dropping it. I understand the desire for greater support though, so if we wanted to support 3.8-3.9, it would just require updating GenAI-Perf to remove the 3.10+ features and then move testing to 3.8. I'll start a conversation about it in the morning.
As per the offline discussion, I have removed the 3.8 test for now.
@rmccorm4 @fpetrini15 This is ready for another round of review.
This pull request makes it so that Triton CLI calls GenAI-Perf for its profile subcommand. All arguments get passed through to GenAI-Perf.
As part of these changes, the previous profiler functionality has been removed from Triton CLI to avoid maintaining this behavior in both places.
Unit tests have successfully passed with these changes in place in an environment with PA and GenAI-Perf.
GenAI-Perf demo below. Apologies for the extra errors beforehand from some arg types. Note: this was a previous iteration; `--task-llm` is no longer used. The output token counts are off because the mock Python model doesn't actually use the max token inputs; it just returns the input as output. https://github.com/triton-inference-server/triton_cli/assets/58150256/97056e68-afb8-49e9-9e07-8f6601952c3a