Design for serving models with different backends

alimaredia commented 5 months ago

This commit introduces a new design document for the [ilab model serve] command, detailing its functionality to serve models using different backends, specifically llama-cpp and vllm. It outlines the command structure, including backend-specific flags and arguments, and proposes a testing strategy for new engine integrations.

Co-authored-by: Sébastien Han seb@redhat.com Signed-off-by: Ali Maredia amaredia@redhat.com

alimaredia commented 4 months ago

@leseb @russellb @n1hility @mairin Thanks for moving this forward., I think --backend-args is very simple and a way forward.

One question I have is do we need to be able to provide the ability to turn all of the knobs in the CLI that we could in a config file? AKA does every parameter need the ability to be set in the config and on the CLI?

I'm guessing the answer is yes, but if that's not that case why don't we just have:

Backend agnostic flags: ex --model
A flag to specify the backend --backend
A flag to specify the config file --config. We have docs for all of the config file options for each backend, and populate all values in the serve section via the config we generate with ilab config init.

At least with this approach there would be one source of truth, the implementation would be clean and simple, and we would not be passing and parsing flags within flags. I've never seen another project with flags similar to my original --vllm-args flag, or as the all encompassing --backend-args flag. I know it's simple, but my initial reaction is that it seems strange to pass flags within flags.

mairin commented 4 months ago

I would say yes for ease of use if you want to override one parameter or whatnot in the config file.

alimaredia commented 4 months ago

The other thing I would say, is just passing the config file might make the world simpler if we we do default config setting with profiles like @cdoern has proposed. The user would only have to focus their attention on the config file. Charlie if you have any thoughts, please chime in.

leseb commented 4 months ago

@alimaredia @mairin @n1hility @russellb, I'd like to hold off on discussing --backend-args until we have more people present. However, it seems we all agree on introducing a new --backend option. Can I get a confirmation on this? All my progress is stuck based on this single decision, and I have several PRs to submit. I'd like to make headway before I go away next week.

Much appreciated.

n1hility commented 4 months ago

@alimaredia @mairin @n1hility @russellb, I'd like to hold off on discussing --backend-args until we have more people present. However, it seems we all agree on introducing a new --backend option. Can I get a confirmation on this? All my progress is stuck based on this single decision, and I have several PRs to submit. I'd like to make headway before I go away next week.

Much appreciated.

Sure but what we merge needs to reflect what we have consensus on. So to defer the backend args we need to revise the text from being a requirement to being informative.

mairin commented 4 months ago

I won't hold anything up but I don't agree with making --backend a flag. I have read back through comments here several times and do not see a rationale for this change. I think it is important to understand why we are going from subcommand to flag, as the subcommand approach was discussed extensively in brainstorming this devdoc and reviewed verbally with multiple folks - even rhtdan felt it was the way to go. I would feel better if I understood why we are going to a flag from a subcommand.

My core concern is from the user experience end. The user experience for this project is incredibly important and if we can have a delightful one it will be a differentiator. However, I am not considering other things like what limitations click might have.

I think having --backendargs as a catch all and starting with that for the implemetation is a great idea

Again, I will not hold this one up bc we need to complete this and have smtg working and that is the most important.

leseb commented 4 months ago

Since some of the comments are not visible anymore after a push, I'd like to summarize why using a ilab model serve --backend flag is more appropriate than a positional argument like ilab model serve <backend>:

Flexibility: Adding new backends or other options becomes more intuitive and consistent when using flags. It avoids the rigidity of positional arguments, making future expansions or modifications seamless. The flag --backend could be ported to other commands later (chat? generate) if we determine it makes sense.
Consistency: Following a common pattern in CLI design where options are specified with flags ensures uniformity. This approach is clearer and more extendable, aligning with widely accepted CLI design principles.
Clarity: Using --backend makes it immediately apparent that this is an option, enhancing the self-explanatory nature of the command. Users can quickly understand the command's structure and purpose. Also "Flags" signify optional parameters, aligning with the backend's nature as an auto-detected feature. Positional arguments typically indicate required inputs, which could confuse users about the backend's necessity. So the --backend flag help delineate optional from required parameters more effectively.
Behavioral Context: Optional parameters, such as --backend, alter the program's behavior without being mandatory. This distinction is better communicated through flags, enhancing user comprehension.

As a side note: I'd argue that the only parameter that should be positional is the model path, as it's essential.

I'd to reference the guide from CLIG.dev:

Prefer flags to args. It’s a bit more typing, but it makes it much clearer what is going on. It also makes it easier to make changes to how you accept input in the future. Sometimes when using args, it’s impossible to add new input without breaking existing behavior or creating ambiguity.

In conclusion, using --backend for the backend ensures flexibility, consistency, clarity, better integration, and aligns with best practices for CLI design.

Thanks!

@mairin I hope this will change your mind :). Are you keeping your request change to express your disagreement with the latest proposal? I'm seeking clarification because your last comment said you won't hold this up. I'm fine if you want to keep it and have someone else merge if this gets other approvals.

mairin commented 4 months ago

Sébastien and I had a call this morning and I am on board with this design now. The rationale had been squashed and he posted it above, and I also misunderstood part of how the flag was proposed to work. I am on board with this now. My main concern was making sure users do not try to use on set of arguments they learned to use with llamacpp and try to xfer them to a vllm or other backend and have a poor experience. Some things we discussed is capturing when that happens and erroring out gracefully, or letting the user know it won't work if we can detect that's going on (with an out to override and try it anyway), and also giving suggesting based on what the argument for backend A is with the equivalent argument for backend B.

instructlab / dev-docs

Design for serving models with different backends #81