nvms / wingman

Your pair programming wingman. Supports OpenAI, Anthropic, or any LLM on your local inference server.
https://marketplace.visualstudio.com/items?itemName=nvms.ai-wingman
ISC License

Discussion: modelfusion #30

Open · capdevc opened this issue 11 months ago

capdevc commented 11 months ago

Opening this issue to discuss potentially migrating all the provider API handling over to modelfusion.

https://github.com/lgrammel/modelfusion
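For a concrete sense of the appeal: modelfusion exposes one text-generation entry point where the provider is just the model object you pass in. Something roughly like the sketch below; modelfusion's API has moved quickly between releases, so the exact names and signatures here (`generateText`, `openai.CompletionTextGenerator`, `ollama.CompletionTextGenerator`) are approximate, not authoritative:

```typescript
import { generateText, ollama, openai } from "modelfusion";

// Same generateText call; swapping providers means swapping the model
// object, which is the main appeal for Wingman's provider handling.
const viaOpenAI = await generateText({
  model: openai.CompletionTextGenerator({
    model: "gpt-3.5-turbo-instruct",
    temperature: 0.7,
  }),
  prompt: "Explain what this function does.",
});

const viaOllama = await generateText({
  model: ollama.CompletionTextGenerator({
    model: "llama2",
    temperature: 0.7,
  }),
  prompt: "Explain what this function does.",
});
```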

@nvms We discussed a little over email, but maybe here is better.

Potential benefits as I see them

Potential issues:

- Adding a not totally lightweight dependency
- Pace of modelfusion development is really fast, so things are a little unstable

I think a POC implementation using modelfusion would be a good starting point, so I've begun working on that. I'll try to get a draft PR up with at least some of the work needed to reach feature parity with where Wingman is now. I'm still working through the current Wingman codebase to figure out how everything fits together.

One thing I think we'll need to change is some of the Preset parameter handling. Right now there seems to be an assumption that every API config is a url and an api_key. The Azure OpenAI API config, for example, has no url parameter. Instead, it takes a few other parameters (resourceName, deploymentId, apiVersion) that are used to construct different endpoints for different models. A cleaner separation of "API config" vs. "completion/chat parameters" might make sense.
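For illustration, a minimal sketch of that separation in TypeScript. The type and field names here are hypothetical, not Wingman's actual config shapes:

```typescript
// Connection config varies by provider; a discriminated union captures
// that instead of assuming every provider is { url, apiKey }.
type ApiConfig =
  | { provider: "openai"; url: string; apiKey: string }
  | {
      provider: "azure-openai";
      apiKey: string;
      resourceName: string;
      deploymentId: string;
      apiVersion: string;
    }
  | { provider: "ollama"; url: string };

// Generation parameters are a separate concern from how to reach the API.
interface CompletionParams {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  stop?: string[];
}

interface Preset {
  name: string;
  api: ApiConfig;
  completion: CompletionParams;
}
```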

nvms commented 11 months ago

Thanks for putting this together.

I'd be more than happy to offload this responsibility to a more focused tool like the one proposed.

The benefits here outweigh the issues; in fact, I'm not too worried about either of the issues mentioned.

> Adding a not totally lightweight dependency

Even though the unminified distribution of modelfusion (as of right now) is about 1.8 MB, it's served locally once the extension is installed. And since Wingman retains its context when hidden, this load only happens once.
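If that initial load ever did become a concern, one option would be deferring the import until the first completion request. Just a sketch, not something Wingman does today:

```typescript
// Load modelfusion lazily so extension activation doesn't pay the
// ~1.8 MB parse cost until the first completion is requested.
let mf: typeof import("modelfusion") | undefined;

async function loadModelfusion() {
  if (!mf) {
    mf = await import("modelfusion");
  }
  return mf;
}
```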

> Pace of modelfusion development is really fast, so things are a little unstable

This one is slightly more of a concern, but only because newer releases of modelfusion will need to be smoke-tested. We can always pin to a known-good version while that happens.

> One thing I think we'll need to change is some of the Preset parameter handling. Right now there seems to be an assumption that every API config is a url and an api_key. The Azure OpenAI API config, for example, has no url parameter. Instead, it takes a few other parameters (resourceName, deploymentId, apiVersion) that are used to construct different endpoints for different models. A cleaner separation of "API config" vs. "completion/chat parameters" might make sense.

I'd love to see some mockups of possible solutions here, because admittedly I don't think I fully grasp the issue with the preset UI. I feel like the current design makes sense, but I'm also biased. I'm happy to review any mockups and implement the changes they describe, but I'm lacking any real vision for improvements here. It sounds like maybe @synw has some ideas for this?

capdevc commented 11 months ago

I think maybe you're right. I'd been going on the assumption that the set of recognized parameters would depend on both the API provider and the selected model, but I don't think that's actually the case. After looking around a bit, it seems that each API provider accepts a fixed set of parameters, even when it offers multiple models. It's not as dynamic as I thought.

Ollama: https://github.com/jmorganca/ollama/blob/00d06619a11356a155362013b8fc0bc9d0d8a146/docs/modelfile.md#valid-parameters-and-values
llama.cpp: https://github.com/ggerganov/llama.cpp/blob/5aa365d88fdb8fdd430ef3fc141c7a5fd37c3502/examples/server/README.md#api-endpoints
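Since each provider's parameter set is fixed, the preset UI could in principle be driven by a static map rather than any per-model discovery. A sketch, with parameter names taken from the Ollama and llama.cpp docs linked above:

```typescript
// Fixed, per-provider parameter sets (abbreviated; see the linked docs
// for the full lists). Note the naming differences between providers,
// e.g. Ollama's num_predict vs. llama.cpp's n_predict.
const providerParams = {
  ollama: ["temperature", "top_k", "top_p", "num_predict", "stop"],
  "llama.cpp": ["temperature", "top_k", "top_p", "n_predict", "stop"],
} as const;

type Provider = keyof typeof providerParams;
type ParamName<P extends Provider> = (typeof providerParams)[P][number];

// e.g. ParamName<"ollama"> is "temperature" | "top_k" | "top_p"
//      | "num_predict" | "stop"
```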

Maybe it makes sense to have "Azure GPT-4 (creative)" and "Azure GPT-4 (precise)" presets that combine the API selection with all of the generation parameters. I'm not sure... flattening it like that does mean you might end up with a large number of presets, though.
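The alternative to flattening would be letting a preset reference an API config and a named parameter profile separately, so the combinations don't multiply. A hypothetical sketch, reusing the ApiConfig/CompletionParams/Preset shapes from the earlier comment:

```typescript
// Hypothetical: presets compose by reference instead of being flattened
// into "Azure GPT-4 (creative)", "Azure GPT-4 (precise)", etc.
const profiles: Record<string, CompletionParams> = {
  creative: { temperature: 1.0, topP: 0.95 },
  precise: { temperature: 0.2, topP: 0.9 },
};

function makePreset(
  name: string,
  api: ApiConfig,
  profile: keyof typeof profiles
): Preset {
  return { name, api, completion: profiles[profile] };
}

// Two presets, one API config, no duplicated connection details.
// Field values below are placeholders.
const azureGpt4: ApiConfig = {
  provider: "azure-openai",
  apiKey: "<api key>",
  resourceName: "my-resource",
  deploymentId: "gpt-4",
  apiVersion: "2023-05-15",
};
const creative = makePreset("Azure GPT-4 (creative)", azureGpt4, "creative");
const precise = makePreset("Azure GPT-4 (precise)", azureGpt4, "precise");
```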

Would love a third opinion!