[Bug] Chat with ollama/mistral-7b behind litellm returns strange answer #1629

Closed: francesco086 closed this issue 4 months ago

francesco086 commented 5 months ago

💻 Operating System

Other

📦 Environment

Docker

🌐 Browser

Other

🐛 Bug Description

I serve the mistral-7b model via Ollama, with a LiteLLM proxy in front of it. For example, I can run:

curl --location 'https://.../chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer MYTOKEN' \
    --data ' {
    "model": "mistral-7b",
    "messages": [
        {
        "role": "user",
        "content": "what is the capital of Italy"
        }
    ]
    }'

and get the expected response:

{"id":"chatcmpl-783d8aca-0113-42d1-ac9d-0a287706b49f","choices":[{"finish_reason":"stop","index":0,"message":{"content":" The capital city of Italy is Rome (Roma in Italian). Rome has been the political and cultural heart of Italy for over two thousand years, dating back to ancient Roman times. It is located in the central part of the country and is known for its rich history, beautiful architecture, art and vibrant culture.","role":"assistant"}}],"created":1710779810,"model":"ollama/mistral","object":"chat.completion","system_fingerprint":null,"usage":{"prompt_tokens":16,"completion_tokens":64,"total_tokens":80}}

I set up LobeChat to use several OpenAI models via LiteLLM (GPT-3.5, GPT-4, and DALL·E 3), and everything works fine. However, with ollama/mistral-7b I get the following behaviour (I pressed the "Stop" button after a while because it was too slow):

[Screenshot 2024-03-18 17:41: the strange answer returned in the LobeChat UI]

🚦 Expected Behavior

No response

📷 Recurrence Steps

No response

📝 Additional Information

Services are running on Kubernetes, deployed via ArgoCD.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-playground-deployment
spec:
  selector:
    matchLabels:
      app: genai-playground
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: genai-playground
    spec:
      containers:
        - name: genai-playground-container
          image: lobehub/lobe-chat:v0.139.1
          resources:
            requests:
              memory: "1024Mi"
            limits:
              memory: "2048Mi"
          ports:
            - containerPort: 3210
          env:
            - name: OPENAI_PROXY_URL
              value: https://genai.mlops.eon.de/v1
            - name: CUSTOM_MODELS
              value: -all,+gpt-3.5-turbo,+gpt-4-32k,+gpt-4-vision-preview,+gpt-4,+mistral-7b
---
apiVersion: v1
kind: Service
metadata:
  name: genai-playground-service
spec:
  selector:
    app: genai-playground
  ports:
    - protocol: TCP
      port: 3210  # Adjust the port if your application uses a different one
      targetPort: 3210  # Adjust the target port if your application uses a different one
  type: ClusterIP  # Change to NodePort or ClusterIP based on your needs
lobehubbot commented 5 months ago

👀 @francesco086

Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible. Please make sure you have given us as much context as possible.

arvinxx commented 4 months ago

@francesco086 I think it might be a parameter issue. Try setting a higher frequency_penalty.

Refs: https://lobehub.com/docs/usage/agents/model
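
For reference, the effect of this parameter can also be checked against the proxy directly by adding frequency_penalty to the earlier curl call (the value 1.0 is illustrative; the OpenAI-compatible range is -2.0 to 2.0):

curl --location 'https://.../chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer MYTOKEN' \
    --data '{
    "model": "mistral-7b",
    "frequency_penalty": 1.0,
    "messages": [
        {
        "role": "user",
        "content": "what is the capital of Italy"
        }
    ]
    }'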