Discussion: modelfusion

capdevc commented 12 months ago

Opening this issue to discuss potentially migrating all the provider API handling over to modelfusion.

https://github.com/lgrammel/modelfusion

@nvms We discussed a little over email, but maybe here is better.

Potential benefits as I see them

Wingman gets instant support for all the providers that modelfusion supports.
- OpenAI
- Azure OpenAI
- Anthropic
- Cohere
- Huggingface
- Ollama
- Llama.cpp
- OpenAI Compatible APIs (vLLM, oobabooga, LM Studio for example)
- Wingman doesn't have to handle changes to those APIs etc.
- Potential to use other tools provided by modelfusion at some point in the future.
- Token counter/API usage cost tracker
- Model functions for tool usage (web browsing, google searching, etc)
- Easier agents (ReAct etc)
- VectorDB/RAG tools (embeddding and search of codebase, or documentation websites, for example)
- Pace of modelfusion development is currently pretty fast. Lots of new features and fast support for new models.

Potential issues:

Adding a not totally lightweight dependency
Pace of modelfusion development is really fast, so things are a little unstable.

I think a POC implementation using modelfusion would be a good starting point, so I've begun to work on that. I'll try and get a draft PR up with at least some of the work done to get to feature parity (at least) with where Wingman is now. I'm still working through the current Wingman codebase to figure out how everything is put together.

One thing I think we'll need to change up is some of the Preset parameter handling . Right now I think there's an assumption that all API config is a url and an api_key. The Azure OpenAI API config, for example, has no url parameter. Instead, it takes a few other parameters (resourceName, deploymentId, apiVersion) that are used to construct different end points for different models. I think a better separation of "API config" vs "completion/chat parameters" would maybe make sense.

nvms commented 12 months ago

Thanks for putting this together.

I'd be more than happy to offload this responsibility to a more focused tool like the one proposed.

The benefits here outweigh the issues. Actually, I'm not too worried about either of the mentioned issues.

Adding a not totally lightweight dependency

Even though the unminified distribution of modelfusion (as of right now) is about 1.8mb, it's served locally once the extension is installed. Also, since Wingman retains context when it is hidden, this load only happens once.

Pace of modelfusion development is really fast, so things are a little unstable.

This is slightly more of an issue only because newer releases of modelfusion will need to be smoke tested, but we can always pin to a known-good version while this happens.

One thing I think we'll need to change up is some of the Preset parameter handling . Right now I think there's an assumption that all API config is a url and an api_key. The Azure OpenAI API config, for example, has no url parameter. Instead, it takes a few other parameters (resourceName, deploymentId, apiVersion) that are used to construct different end points for different models. I think a better separation of "API config" vs "completion/chat parameters" would maybe make sense.

I'd love to see some mockups of solutions to this, because admittedly I don't think I fully grasp the issue with the preset UI. I feel like it makes sense, but I'm also biased. I'm happy to review any mockups and potentially implement the changes therein, but I'm lacking any sort of vision for improvements here. It sounds like maybe @synw has some ideas for this?

capdevc commented 12 months ago

I think maybe you're right. I had been going on the assumption that the set of recognized parameters would depend on both the API provider and the model selected. I don't think that's actually the case. After looking around a bit it seems that each API provider has a fixed set of parameters it can accept, even if it offers multiple different models. It's not as dynamic as I thought.

Ollama: https://github.com/jmorganca/ollama/blob/00d06619a11356a155362013b8fc0bc9d0d8a146/docs/modelfile.md#valid-parameters-and-values llama.cpp: https://github.com/ggerganov/llama.cpp/blob/5aa365d88fdb8fdd430ef3fc141c7a5fd37c3502/examples/server/README.md#api-endpoints

Maybe it makes sense to have "Azure GPT-4 (creative)" and "Azure GPT-4 (precise)" presets that are a total combination of the API selection and all the generation parameters. I'm not sure... flattening it like that does mean that you might end up with a large number of presets though.

Would love a third opinion!

nvms / wingman

Discussion: modelfusion #30