lmstudio-ai / lmstudio-bug-tracker

Bug tracking for the LM Studio desktop application

Queued request always returns empty response #204

Open observerw opened 1 week ago

observerw commented 1 week ago

Example

import asyncio

from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class Schema(BaseModel):
    name: str

client = ChatOpenAI(
    base_url="http://localhost:7689/v1",
    temperature=0,
).with_structured_output(Schema, method="json_schema")

async def main():
    resp1, resp2 = client.invoke("My name is John"), client.invoke("My name is John")
    print("sync invoke success")

    resp1, resp2 = await asyncio.gather(
        client.ainvoke("My name is John"),
        client.ainvoke("My name is John"),
    )
    print("async invoke success")

if __name__ == "__main__":
    asyncio.run(main())

Causes the following error:

sync invoke success
pydantic_core._pydantic_core.ValidationError: 1 validation error for Schema
  Invalid JSON: EOF while parsing a value at line 1 column 0 [type=json_invalid, input_value='', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/json_invalid
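The error itself can be reproduced outside LM Studio: when the server returns an empty completion, the structured-output parser hands an empty string to Pydantic's JSON validator, which fails with "EOF while parsing a value" before any schema fields are checked. A minimal demonstration, assuming pydantic v2:

```python
from pydantic import BaseModel, ValidationError

class Schema(BaseModel):
    name: str

# An empty completion body reaches Pydantic's JSON parser as "",
# which fails at line 1 column 0 before any field validation runs.
try:
    Schema.model_validate_json("")
    error_type = None
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]

print(error_type)  # json_invalid
```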

Log

[LM STUDIO SERVER] [llama-3.1-8b-instruct@q4_k_m] Generated prediction: {
// ...
  "usage": {
    "prompt_tokens": 39,
    "completion_tokens": 0,
    "total_tokens": 39
  },
}

The completion_tokens is always 0 for the queued request.

Info

LM Studio Version: 3.5-beta
Model: llama-3.1-8b-instruct@q4_k_m
Hardware Info:

{
  "result": {
    "code": "Success",
    "message": ""
  },
  "cpuInfo": {
    "architecture": "ARM64",
    "supportedInstructionSetExtensions": [
      "AdvSIMD"
    ]
  }
}

GPU:

runtime: Metal llama.cpp v1.3.0

{
  "result": {
    "code": "Success",
    "message": "Successfully queried Apple Silicon for GPU Info."
  },
  "gpuInfo": [
    {
      "name": "Apple Silicon",
      "deviceId": -1,
      "totalMemoryCapacityBytes": 34359738368,
      "dedicatedMemoryCapacityBytes": 22906503168,
      "integrationType": "Integrated",
      "detectionPlatform": "Metal",
      "detectionPlatformVersion": "",
      "otherInfo": {}
    }
  ]
}

safetensors runtime: LM Studio MLX v0.0.14

{
  "result": {
    "code": "Success",
    "message": "Successfully queried Apple Silicon for GPU Info."
  },
  "gpuInfo": [
    {
      "name": "Apple Silicon",
      "deviceId": -1,
      "totalMemoryCapacityBytes": 34359738368,
      "dedicatedMemoryCapacityBytes": 22906503168,
      "integrationType": "Integrated",
      "detectionPlatform": "Metal",
      "detectionPlatformVersion": "",
      "otherInfo": {}
    }
  ]
}