kardolus / chatgpt-cli

ChatGPT CLI is an advanced command-line interface for ChatGPT models via OpenAI and Azure, offering streaming, query mode, and history tracking for seamless, context-aware conversations. Ideal for both users and developers, it provides advanced configuration and easy setup options to ensure a tailored conversational experience with GPT models.

Feature request: OPENAI_API_ENDPOINT or equivalent CLI parameter to enable FOSS local / self-hosted OpenAI API servers like ollama #29

Closed · PieBru closed this issue 4 months ago

PieBru commented 4 months ago

Looking at their GitHub star progression, FOSS servers are growing fast. Supporting them would broaden the usage of this tool. Thank you, Piero

kardolus commented 4 months ago

Hi Piero!

We currently support setting the endpoint through the URL plus completions path; requests go to `url` joined with `completions_path`:

```yaml
name: openai
api_key: ""
model: gpt-4-turbo-preview
max_tokens: 8192
role: You are a helpful assistant.
temperature: 1
top_p: 1
frequency_penalty: 0
presence_penalty: 0
thread: personal
omit_history: false
url: https://api.openai.com
completions_path: /v1/chat/completions
models_path: /v1/models
auth_header: Authorization
auth_token_prefix: 'Bearer '
```

These can be set either through ~/.chatgpt-cli/config.yaml or through environment variables (e.g. OPENAI_URL).
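For illustration, here is a minimal sketch of pointing the CLI at a local server per invocation via environment variables. Only OPENAI_URL is confirmed above; OPENAI_MODEL, OPENAI_COMPLETIONS_PATH, and the `chatgpt` binary name are assumed to follow the same naming pattern:

```sh
# A minimal sketch, assuming each config key maps to an OPENAI_-prefixed
# environment variable the way OPENAI_URL does. OPENAI_MODEL and
# OPENAI_COMPLETIONS_PATH are assumed names, not confirmed above.
export OPENAI_URL="http://localhost:11434"
export OPENAI_COMPLETIONS_PATH="/v1/chat/completions"
export OPENAI_MODEL="mistral"

# Run a one-off query against the locally served model.
chatgpt "Hello from a local model"
```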

Please let me know if the FOSS servers work for you! I'm curious.

PieBru commented 4 months ago

Good news: it works on Arch Linux with ollama-cuda and these very basic parameters in ~/.chatgpt-cli/config.yaml:

```yaml
name: ollama
api_key: "sk-..."
model: mistral
max_tokens: 2048
role: You are a helpful assistant.
temperature: 1
top_p: 1
frequency_penalty: 0
presence_penalty: 0
thread: personal
omit_history: false
url: "http://localhost:11434"
completions_path: /v1/chat/completions
models_path: /v1/models
auth_header: Authorization
auth_token_prefix: 'Bearer '
```

I prudentially lowered max_tokens, though there are open-source models supporting 128K context and more. BTW, ollama now also serves an OpenAI API endpoint and can self-host any GGUF model in RAM+GPU, so there are plenty of choices out there.
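As a quick sanity check, ollama's OpenAI-compatible endpoint can also be hit directly. A hedged sketch, assuming ollama is listening on its default port 11434 and the mistral model has already been pulled (`ollama pull mistral`):

```sh
# Query ollama's OpenAI-compatible chat completions endpoint directly.
# No real API key is needed; ollama does not validate credentials here.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```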

Thank you, Piero

kardolus commented 4 months ago

That's great! Thanks for circling back. Happy to hear that it's working. I will look into ollama. Cheers.