jupyterlab / jupyter-ai

A generative AI extension for JupyterLab
https://jupyter-ai.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Separate providers for inline completion #669

Closed: michaelchia closed this issue 4 months ago

michaelchia commented 6 months ago

Problem

I would like to propose having a separate set of providers for inline completion models, similar to the existing separation between embedding and LLM models. Beyond simply letting users choose different models for chat and inline completion, inline completion models are typically specialized models such as starcoder, code-llama, code-gecko, or any of the models on https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard. They also typically have a different interface, accepting an optional suffix either through a separate parameter or a specific prompt template. It is also unsafe to assume that any LLM can reliably produce code suitable for inline completion with standard prompt templates and pre/post-processing.

Proposed Solution

Create a new base completion provider class in which the handling of the InlineCompletionRequest to produce suggestions can be implemented per model/provider, since the prompt templates, pre/post-processing, and suffix handling can differ for each provider.

LangChain doesn't seem to provide explicit support for these code completion models (unless I am just unaware of it), so it might not be possible to rely on LangChain in the same way as for general LLMs and embeddings. For example, a model like Google's code-gecko takes a separate input for the suffix, while LangChain LLMs can only take a single input.
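For illustration, here is a minimal sketch of what such a base class might look like. All class and method names below are hypothetical, not part of the existing jupyter-ai API; the only assumption is that the request object exposes prefix and suffix fields.

```python
# Rough sketch of the proposed base class; every name here is illustrative.
from abc import ABC, abstractmethod
from typing import List


class BaseInlineCompletionProvider(ABC):
    """Base class each completion provider would subclass to supply its own
    prompt template, suffix handling, and pre/post-processing."""

    id: str
    model_id: str

    @abstractmethod
    async def generate_completions(self, request) -> List[str]:
        """Turn an InlineCompletionRequest into a list of suggestion strings."""
        ...


class CodeGeckoCompletionProvider(BaseInlineCompletionProvider):
    """Hypothetical provider for a model that accepts prefix and suffix as
    separate parameters instead of a single prompt string."""

    id = "vertexai-code"
    model_id = "code-gecko"

    def __init__(self, client):
        # `client` stands in for the provider's SDK handle; injected here
        # only to keep the sketch self-contained.
        self.client = client

    async def generate_completions(self, request) -> List[str]:
        # A fill-in-the-middle model receives the suffix directly, rather
        # than having it spliced into a single prompt template.
        response = self.client.predict(prefix=request.prefix, suffix=request.suffix)
        return [response.text]
```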

Additional context

I'd be willing to work on a PR for this if you'd like me to.

krassowski commented 6 months ago

Short-term I would suggest two steps:

1. make providers for chat and for inline completion separately configurable
2. allow tagging a provider as suitable only for completion but not for chat (so that it does not show up in the chat model selection list), and vice versa

This is because many chat providers, including SOTA models, work reasonably well as completion providers too.

As for prompt templates, the completion and chat prompt templates are separate and configurable on a per-provider basis, see:

https://github.com/jupyterlab/jupyter-ai/blob/e3cd019e5703e981001fe240686953fb47e04b7d/packages/jupyter-ai-magics/jupyter_ai_magics/providers.py#L317-L321

https://github.com/jupyterlab/jupyter-ai/blob/e3cd019e5703e981001fe240686953fb47e04b7d/packages/jupyter-ai-magics/jupyter_ai_magics/providers.py#L343-L347

Note that largely arbitrary suffix handling can be applied using the jinja-based prompt template.
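For concreteness, a provider can ship its own completion template by overriding the hook linked above. A rough sketch, assuming the `get_completion_prompt_template` method and the `prefix`/`suffix` template variables from the linked lines; the provider class, model ids, and fill-in-the-middle markers below are made up, and a real provider would also inherit from a LangChain LLM class:

```python
from langchain.prompts import PromptTemplate
from jupyter_ai_magics.providers import BaseProvider

# A Jinja2 template that places the suffix after a fill-in-the-middle marker.
# The exact markers depend on the model; these are placeholders.
FIM_TEMPLATE = "<fim_prefix>{{prefix}}<fim_suffix>{{suffix}}<fim_middle>"


class MyCompletionProvider(BaseProvider):
    # A real provider would also inherit from a LangChain LLM class.
    id = "my-provider"
    name = "My Provider"
    models = ["my-fim-model"]
    model_id_key = "model"

    def get_completion_prompt_template(self) -> PromptTemplate:
        # Override the default completion template with a model-specific,
        # Jinja2-formatted one that handles the suffix explicitly.
        return PromptTemplate(
            input_variables=["prefix", "suffix"],
            template=FIM_TEMPLATE,
            template_format="jinja2",
        )
```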

A larger refactor will likely be desirable at some point, but I would suggest presenting a good rationale for performing such a refactor first (e.g. what cannot be achieved, or is problematic, with the existing approaches) and agreeing on a detailed plan before starting the work.

michaelchia commented 6 months ago

Thanks for the reply. Could you suggest how I could use the code-gecko model from Google Vertex AI with the current implementation? The SDK has a separate parameter for suffix (see https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/code-completion#code-completion-prompt-python_vertex_ai_sdk). I am not sure how to access that parameter via LangChain.

krassowski commented 6 months ago

Well, a hacky but simple idea is to create a dummy template which looks like:

{prefix}@@@{suffix}

where @@@ is some clever separator which has no chance of occurring in real code (probably not @@@, but for the sake of simplicity let's use it), and then in the _call method of the custom LLM you do:

prefix, suffix = prompt.split('@@@')

and call the API/SDK using these two arguments.
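Putting both pieces together, a sketch of such a custom LLM might look like the following. It assumes the Vertex AI SDK's CodeGenerationModel interface from the docs linked above and that vertexai.init() has already been called; the separator and the model version string are placeholders.

```python
from typing import Any, List, Optional

from langchain.llms.base import LLM
from vertexai.language_models import CodeGenerationModel

SEPARATOR = "@@@"  # placeholder; pick something that cannot occur in real code


class CodeGeckoLLM(LLM):
    """Custom LangChain LLM that splits the templated prompt back into
    prefix/suffix and forwards them to code-gecko's separate parameters."""

    # Version suffix may differ; check the Vertex AI docs for current ids.
    model_name: str = "code-gecko@001"

    @property
    def _llm_type(self) -> str:
        return "code-gecko"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> str:
        # The prompt was rendered from the dummy "{prefix}@@@{suffix}"
        # template, so split it back into the two arguments the SDK expects.
        prefix, suffix = prompt.split(SEPARATOR, 1)
        model = CodeGenerationModel.from_pretrained(self.model_name)
        response = model.predict(prefix=prefix, suffix=suffix)
        return response.text
```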

michaelchia commented 6 months ago

Yeah, I was thinking of something like that but was hoping I wouldn't need to. But sure, that's fine in the meantime. Thanks!

I'm looking forward to the two enhancements you suggested; they would help a lot in the short run.