twinnydotdev / twinny

The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.
https://twinny.dev
MIT License

Enhance Twinny with LiteLLM (and indirectly OpenRouter) Support #190

Closed bvelker closed 6 months ago

bvelker commented 6 months ago

Feature Request: Integrate LiteLLM with the Twinny Visual Studio Code plugin to enrich AI code completion capabilities. This integration aims to leverage the wide variety of models and APIs available through LiteLLM, enhancing the plugin's functionality and user experience.

Current Issue: Attempting to use LiteLLM Proxy results in an APIConnectionError caused by an unexpected 'prompt' keyword argument, indicating a mismatch between the input LiteLLM expects and the request data Twinny sends (see the sketch below).

Solution: Modify Twinny to accommodate the input structure required by LiteLLM Proxy, specifically by addressing how the 'prompt' key is handled.

Objective: The primary goal is to augment Twinny’s AI code completion service by enabling efficient access to an expanded set of models.

Action: Request implementation of this compatibility feature and encourage contributions towards integrating LiteLLM (and indirectly OpenRouter) support into Twinny. This initiative could significantly amplify the plugin's utility and adoption.
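For illustration, a minimal sketch of the mismatch described above; the request bodies are assumptions based on the error message, not Twinny's actual code:

```typescript
// Completion-style body with a top-level "prompt" key. Forwarding this to
// LiteLLM's chat handler is what surfaces the unexpected 'prompt' keyword
// argument mentioned above.
const completionStyleBody = {
  model: "gpt-3.5-turbo",
  prompt: "def fibonacci(n):",
  stream: true,
};

// OpenAI chat-style body that LiteLLM Proxy's /chat/completions expects:
// a "messages" array instead of a "prompt" string.
const chatStyleBody = {
  model: "gpt-3.5-turbo",
  stream: true,
  messages: [{ role: "user", content: "def fibonacci(n):" }],
};
```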

rjmacarthy commented 6 months ago

Hey, thanks for the report. I added another provider, "other", and removed prompt from the payload. Please report back.

Many thanks,

bvelker commented 6 months ago

It's closer. We appear to have another request formatting issue for follow-up chat responses:

```
12:58:53 - LiteLLM Router:INFO: router.py:472 - litellm.acompletion(model=openrouter/openai/gpt-3.5-turbo-0125) 200 OK
12:58:53 - LiteLLM Router:DEBUG: router.py:1144 - Async Response: <litellm.utils.CustomStreamWrapper object at 0x119e19150>
INFO: 127.0.0.1:51575 - "POST /chat/completions HTTP/1.1" 200 OK
12:58:53 - LiteLLM Proxy:DEBUG: proxy_server.py:2592 - inside generator
{'error': {'message': '{\n "error": {\n "message": "Additional properties are not allowed (\'type\' was unexpected) - \'messages.1\'",\n "type": "invalid_request_error",\n "param": null,\n "code": null\n }\n}\n', 'code': 400}}
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 2595, in async_data_generator
    async for chunk in response:
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 9816, in __anext__
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 9700, in __anext__
    async for chunk in self.completion_stream:
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_streaming.py", line 117, in __aiter__
    async for item in self._iterator:
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_streaming.py", line 138, in __stream__
    raise APIError(
openai.APIError: An error occurred during streaming
```

The first chat response always works fine; this is only a problem with follow-up chats.
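For context, the 400 returned upstream says that messages.1 (the second message in the messages array) carries a type property, which the OpenAI chat schema rejects. A minimal sketch of the two payload shapes, with field values that are purely illustrative and not taken from Twinny's source:

```typescript
// Rejected shape: the second message ("messages.1") carries an extra
// "type" property, which the OpenAI chat schema does not allow.
const rejectedFollowUp = {
  model: "gpt-3.5-turbo",
  stream: true,
  messages: [
    { role: "user", content: "Explain this function." },
    { role: "assistant", content: "It reads the config file.", type: "chat" }, // hypothetical value
  ],
};

// Accepted shape: messages contain only the fields the schema defines
// (role, content, and the other documented optional fields).
const acceptedFollowUp = {
  model: "gpt-3.5-turbo",
  stream: true,
  messages: [
    { role: "user", content: "Explain this function." },
    { role: "assistant", content: "It reads the config file." },
  ],
};
```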

To replicate:

```
litellm --port 12121 --detailed_debug --debug -c ./proxy_server_config.yaml
```

proxy_server_config.yaml:

```yaml
model_list:

litellm_settings:
  drop_params: True
  max_budget: 100
  budget_duration: 30d
  num_retries: 0
  request_timeout: 600
  telemetry: False

general_settings:
  proxy_budget_rescheduler_min_time: 60
  proxy_budget_rescheduler_max_time: 64
  proxy_batch_write_at: 1
```
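The model_list contents were not captured above. For reference only, a hypothetical entry that maps the gpt-3.5-turbo name used in the Twinny settings to the OpenRouter model seen in the router logs could look like this (key layout follows LiteLLM's config format; the values are examples, not the reporter's actual config):

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openrouter/openai/gpt-3.5-turbo-0125
      api_key: os.environ/OPENROUTER_API_KEY
```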

VS Code twinny settings:

```json
"twinny.apiProvider": "other",
"twinny.chatApiPath": "/chat/completions",
"twinny.chatApiPort": 12121,
"twinny.chatModelName": "gpt-3.5-turbo"
```

Should we update the Twinny: Chat Model Name description text ("Model identifier for chat completions. Applicable only for Ollama and Oobabooga API.")?

For Twinny: Fim Template Format, add "other" as an option. Actually, I suggest using "LiteLLM" instead, as that will guide users better.

LiteLLM is a prompt-structure translation layer that exposes arbitrary API services through the OpenAI prompting format, so in theory, if Twinny can structure LLM calls in OpenAI format, it gets every LiteLLM-supported API for free.
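To make that concrete, a hedged sketch of an OpenAI-format chat call against the local LiteLLM proxy configured earlier in this thread (the port and path mirror the settings above; everything else is illustrative):

```typescript
// Hypothetical sketch: an OpenAI-format chat completion sent to the local
// LiteLLM proxy (port 12121 and /chat/completions from the settings above).
async function chatThroughLiteLLM(): Promise<void> {
  const response = await fetch("http://localhost:12121/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-3.5-turbo", // resolved by the proxy's model_list
      stream: false,          // non-streaming keeps the sketch short
      messages: [
        { role: "system", content: "You are a coding assistant." },
        { role: "user", content: "Write a binary search in TypeScript." },
      ],
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}
```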

rjmacarthy commented 6 months ago

I think I fixed it in 3.9.3, please check. By the way, please give details of how to make a FIM request through LiteLLM when using OpenAI and Ollama codellama. Thanks.

Edit: Hey, I added support for LiteLLM in 3.10.0, please check and report back. I am still unsure about the FIM template with LiteLLM.

Thanks.
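For the open FIM question above, one possible shape of such a request, offered only as a sketch: it assumes the proxy exposes an OpenAI-style /completions endpoint and that the configured model is a codellama variant using the <PRE>/<SUF>/<MID> infill template, neither of which is confirmed in this thread.

```typescript
// Hypothetical FIM request through a LiteLLM proxy. The model name and the
// codellama infill markers are assumptions for illustration.
async function fimThroughLiteLLM(prefix: string, suffix: string): Promise<string> {
  const response = await fetch("http://localhost:12121/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "codellama",                             // example model_list alias
      prompt: `<PRE> ${prefix} <SUF>${suffix} <MID>`, // codellama infill format
      max_tokens: 128,
      stop: ["<EOT>"],                                // codellama end-of-infill marker
    }),
  });
  const data = await response.json();
  return data.choices[0].text;
}
```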

rjmacarthy commented 6 months ago

LiteLLM is now supported.