[Bug]: OpenAI API emulation incompatible with shell_gpt OpenAI API calls

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04 LTS (x86_64)
GCC version: (Ubuntu 13.2.0-23ubuntu4) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.12.3 | packaged by Anaconda, Inc. | (main, May  6 2024, 19:46:43) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-35-generic-x86_64-with-glibc2.39
Is CUDA available: N/A
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: 
GPU 0: NVIDIA RTX 6000 Ada Generation
GPU 1: NVIDIA RTX 6000 Ada Generation

Nvidia driver version: 535.171.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               64
On-line CPU(s) list:                  0-63
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen Threadripper PRO 5975WX 32-Cores
CPU family:                           25
Model:                                8
Thread(s) per core:                   2
Core(s) per socket:                   32
Socket(s):                            1
Stepping:                             2
Frequency boost:                      enabled
CPU(s) scaling MHz:                   29%
CPU max MHz:                          7006.6401
CPU min MHz:                          1800.0000
BogoMIPS:                             7187.05
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization:                       AMD-V
L1d cache:                            1 MiB (32 instances)
L1i cache:                            1 MiB (32 instances)
L2 cache:                             16 MiB (32 instances)
L3 cache:                             128 MiB (4 instances)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-63
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] No relevant packages
[conda] No relevant packages
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
[4mGPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID[0m
GPU0     X  PHB 0-63    0       N/A
GPU1    PHB  X  0-63    0       N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

Requests from shell_gpt work fine with OpenAI's API, but they don't work with vllm's emulation of the OpenAI API.

https://github.com/TheR1D/shell_gpt

I would expect OpenAI API emulation to work with common tools like shell_gpt

Script started on 2024-06-24 16:15:33-04:00 [TERM="dumb" TTY="/dev/pts/7" COLUMNS="118" LINES="56"]
sedna:~$ OPENAI_BASE_URL=https://api.openai.com/v1 sgpt --model=gpt-4o 'Capital of France?'
The capital of France is Paris.
sedna:~$ OPENAI_BASE_URL=http://triton:8000/v1 sgpt --model=Qwen/Qwen2-beta-7B-Chat 'Capital of France?'

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/sgpt/app.py:229 in main │
│                                                                                                  │
│   226 │   │   │   functions=function_schemas,                                                    │
│   227 │   │   )                                                                                  │
│   228 │   else:                                                                                  │
│ ❱ 229 │   │   full_completion = DefaultHandler(role_class, md).handle(                           │
│   230 │   │   │   prompt=prompt,                                                                 │
│   231 │   │   │   model=model,                                                                   │
│   232 │   │   │   temperature=temperature,                                                       │
│                                                                                                  │
│ ╭─────────────────────────────── locals ────────────────────────────────╮                        │
│ │               cache = True                                            │                        │
│ │                chat = None                                            │                        │
│ │                code = False                                           │                        │
│ │         create_role = None                                            │                        │
│ │      describe_shell = False                                           │                        │
│ │              editor = False                                           │                        │
│ │    function_schemas = None                                            │                        │
│ │           functions = True                                            │                        │
│ │   install_functions = None                                            │                        │
│ │ install_integration = None                                            │                        │
│ │         interaction = True                                            │                        │
│ │          list_chats = None                                            │                        │
│ │          list_roles = None                                            │                        │
│ │                  md = True                                            │                        │
│ │               model = 'Qwen/Qwen2-beta-7B-Chat'                       │                        │
│ │              prompt = 'Capital of France?'                            │                        │
│ │                repl = None                                            │                        │
│ │                role = None                                            │                        │
│ │          role_class = <sgpt.role.SystemRole object at 0x7fead891ade0> │                        │
│ │               shell = False                                           │                        │
│ │           show_chat = None                                            │                        │
│ │           show_role = None                                            │                        │
│ │        stdin_passed = False                                           │                        │
│ │         temperature = 0.0                                             │                        │
│ │               top_p = 1.0                                             │                        │
│ │             version = None                                            │                        │
│ ╰───────────────────────────────────────────────────────────────────────╯                        │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/sgpt/handlers/handler.p │
│ y:158 in handle                                                                                  │
│                                                                                                  │
│   155 │   │   │   caching=caching,                                                               │
│   156 │   │   │   **kwargs,                                                                      │
│   157 │   │   )                                                                                  │
│ ❱ 158 │   │   return self.printer(generator, not disable_stream)                                 │
│   159                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │        caching = True                                                                        │ │
│ │ disable_stream = False                                                                       │ │
│ │      functions = None                                                                        │ │
│ │      generator = <generator object Cache.__call__.<locals>.wrapper at 0x7fead8b45b40>        │ │
│ │         kwargs = {}                                                                          │ │
│ │       messages = [                                                                           │ │
│ │                  │   {                                                                       │ │
│ │                  │   │   'role': 'system',                                                   │ │
│ │                  │   │   'content': 'You are ShellGPT\nYou are programming and system        │ │
│ │                  administration assistant.\nYou ar'+281                                      │ │
│ │                  │   },                                                                      │ │
│ │                  │   {'role': 'user', 'content': 'Capital of France?'}                       │ │
│ │                  ]                                                                           │ │
│ │          model = 'Qwen/Qwen2-beta-7B-Chat'                                                   │ │
│ │         prompt = 'Capital of France?'                                                        │ │
│ │           self = <sgpt.handlers.default_handler.DefaultHandler object at 0x7fead8c0fef0>     │ │
│ │    temperature = 0.0                                                                         │ │
│ │          top_p = 1.0                                                                         │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/sgpt/printer.py:23 in   │
│ __call__                                                                                         │
│                                                                                                  │
│   20 │                                                                                           │
│   21 │   def __call__(self, chunks: Generator[str, None, None], live: bool = True) -> str:       │
│   22 │   │   if live:                                                                            │
│ ❱ 23 │   │   │   return self.live_print(chunks)                                                  │
│   24 │   │   with self.console.status("[bold green]Loading..."):                                 │
│   25 │   │   │   full_completion = "".join(chunks)                                               │
│   26 │   │   self.static_print(full_completion)                                                  │
│                                                                                                  │
│ ╭─────────────────────────────────── locals ────────────────────────────────────╮                │
│ │ chunks = <generator object Cache.__call__.<locals>.wrapper at 0x7fead8b45b40> │                │
│ │   live = True                                                                 │                │
│ │   self = <sgpt.printer.MarkdownPrinter object at 0x7fead8919940>              │                │
│ ╰───────────────────────────────────────────────────────────────────────────────╯                │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/sgpt/printer.py:38 in   │
│ live_print                                                                                       │
│                                                                                                  │
│   35 │   def live_print(self, chunks: Generator[str, None, None]) -> str:                        │
│   36 │   │   full_completion = ""                                                                │
│   37 │   │   with Live(console=self.console) as live:                                            │
│ ❱ 38 │   │   │   for chunk in chunks:                                                            │
│   39 │   │   │   │   full_completion += chunk                                                    │
│   40 │   │   │   │   markdown = Markdown(markup=full_completion, code_theme=self.theme)          │
│   41 │   │   │   │   live.update(markdown, refresh=True)                                         │
│                                                                                                  │
│ ╭──────────────────────────────────────── locals ────────────────────────────────────────╮       │
│ │          chunks = <generator object Cache.__call__.<locals>.wrapper at 0x7fead8b45b40> │       │
│ │ full_completion = ''                                                                   │       │
│ │            live = <rich.live.Live object at 0x7fead891ac60>                            │       │
│ │            self = <sgpt.printer.MarkdownPrinter object at 0x7fead8919940>              │       │
│ ╰────────────────────────────────────────────────────────────────────────────────────────╯       │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/sgpt/cache.py:37 in     │
│ wrapper                                                                                          │
│                                                                                                  │
│   34 │   │   │   │   yield file.read_text()                                                      │
│   35 │   │   │   │   return                                                                      │
│   36 │   │   │   result = ""                                                                     │
│ ❱ 37 │   │   │   for i in func(*args, **kwargs):                                                 │
│   38 │   │   │   │   result += i                                                                 │
│   39 │   │   │   │   yield i                                                                     │
│   40 │   │   │   if "@FunctionCall" not in result:                                               │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │   args = (<sgpt.handlers.default_handler.DefaultHandler object at 0x7fead8c0fef0>,)          │ │
│ │   file = PosixPath('/tmp/cache/9b091832b44135f3a9c6d16f9555d2b7')                            │ │
│ │   func = <function Handler.get_completion at 0x7fead8902200>                                 │ │
│ │    key = '9b091832b44135f3a9c6d16f9555d2b7'                                                  │ │
│ │ kwargs = {                                                                                   │ │
│ │          │   'model': 'Qwen/Qwen2-beta-7B-Chat',                                             │ │
│ │          │   'temperature': 0.0,                                                             │ │
│ │          │   'top_p': 1.0,                                                                   │ │
│ │          │   'messages': [                                                                   │ │
│ │          │   │   {                                                                           │ │
│ │          │   │   │   'role': 'system',                                                       │ │
│ │          │   │   │   'content': 'You are ShellGPT\nYou are programming and system            │ │
│ │          administration assistant.\nYou ar'+281                                              │ │
│ │          │   │   },                                                                          │ │
│ │          │   │   {'role': 'user', 'content': 'Capital of France?'}                           │ │
│ │          │   ],                                                                              │ │
│ │          │   'functions': None                                                               │ │
│ │          }                                                                                   │ │
│ │ result = ''                                                                                  │ │
│ │   self = <sgpt.cache.Cache object at 0x7fead8918440>                                         │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/sgpt/handlers/handler.p │
│ y:99 in get_completion                                                                           │
│                                                                                                  │
│    96 │   │   if is_shell_role or is_code_role or is_dsc_shell_role:                             │
│    97 │   │   │   functions = None                                                               │
│    98 │   │                                                                                      │
│ ❱  99 │   │   response = completion(                                                             │
│   100 │   │   │   model=model,                                                                   │
│   101 │   │   │   temperature=temperature,                                                       │
│   102 │   │   │   top_p=top_p,                                                                   │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │         arguments = ''                                                                       │ │
│ │         functions = None                                                                     │ │
│ │      is_code_role = False                                                                    │ │
│ │ is_dsc_shell_role = False                                                                    │ │
│ │     is_shell_role = False                                                                    │ │
│ │          messages = [                                                                        │ │
│ │                     │   {                                                                    │ │
│ │                     │   │   'role': 'system',                                                │ │
│ │                     │   │   'content': 'You are ShellGPT\nYou are programming and system     │ │
│ │                     administration assistant.\nYou ar'+281                                   │ │
│ │                     │   },                                                                   │ │
│ │                     │   {'role': 'user', 'content': 'Capital of France?'}                    │ │
│ │                     ]                                                                        │ │
│ │             model = 'Qwen/Qwen2-beta-7B-Chat'                                                │ │
│ │              name = ''                                                                       │ │
│ │              self = <sgpt.handlers.default_handler.DefaultHandler object at 0x7fead8c0fef0>  │ │
│ │       temperature = 0.0                                                                      │ │
│ │             top_p = 1.0                                                                      │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/openai/_utils/_utils.py │
│ :277 in wrapper                                                                                  │
│                                                                                                  │
│   274 │   │   │   │   │   else:                                                                  │
│   275 │   │   │   │   │   │   msg = f"Missing required argument: {quote(missing[0])}"            │
│   276 │   │   │   │   raise TypeError(msg)                                                       │
│ ❱ 277 │   │   │   return func(*args, **kwargs)                                                   │
│   278 │   │                                                                                      │
│   279 │   │   return wrapper  # type: ignore                                                     │
│   280                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │            _ = <openai.resources.chat.completions.Completions object at 0x7fead8b94890>      │ │
│ │         args = (<openai.resources.chat.completions.Completions object at 0x7fead8b94890>,)   │ │
│ │         func = <function Completions.create at 0x7fead887d120>                               │ │
│ │ given_params = {'model', 'functions', 'self', 'top_p', 'messages', 'stream', 'temperature'}  │ │
│ │            i = 0                                                                             │ │
│ │          key = 'stream'                                                                      │ │
│ │       kwargs = {                                                                             │ │
│ │                │   'model': 'Qwen/Qwen2-beta-7B-Chat',                                       │ │
│ │                │   'temperature': 0.0,                                                       │ │
│ │                │   'top_p': 1.0,                                                             │ │
│ │                │   'messages': [                                                             │ │
│ │                │   │   {                                                                     │ │
│ │                │   │   │   'role': 'system',                                                 │ │
│ │                │   │   │   'content': 'You are ShellGPT\nYou are programming and system      │ │
│ │                administration assistant.\nYou ar'+281                                        │ │
│ │                │   │   },                                                                    │ │
│ │                │   │   {'role': 'user', 'content': 'Capital of France?'}                     │ │
│ │                │   ],                                                                        │ │
│ │                │   'functions': None,                                                        │ │
│ │                │   'stream': True                                                            │ │
│ │                }                                                                             │ │
│ │      matches = True                                                                          │ │
│ │   positional = ['self']                                                                      │ │
│ │      variant = ['messages', 'model']                                                         │ │
│ │     variants = (['messages', 'model'], ['messages', 'model', 'stream'])                      │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/openai/resources/chat/c │
│ ompletions.py:606 in create                                                                      │
│                                                                                                  │
│    603 │   │   extra_body: Body | None = None,                                                   │
│    604 │   │   timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,                     │
│    605 │   ) -> ChatCompletion | Stream[ChatCompletionChunk]:                                    │
│ ❱  606 │   │   return self._post(                                                                │
│    607 │   │   │   "/chat/completions",                                                          │
│    608 │   │   │   body=maybe_transform(                                                         │
│    609 │   │   │   │   {                                                                         │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │          extra_body = None                                                                   │ │
│ │       extra_headers = None                                                                   │ │
│ │         extra_query = None                                                                   │ │
│ │   frequency_penalty = NOT_GIVEN                                                              │ │
│ │       function_call = NOT_GIVEN                                                              │ │
│ │           functions = None                                                                   │ │
│ │          logit_bias = NOT_GIVEN                                                              │ │
│ │            logprobs = NOT_GIVEN                                                              │ │
│ │          max_tokens = NOT_GIVEN                                                              │ │
│ │            messages = [                                                                      │ │
│ │                       │   {                                                                  │ │
│ │                       │   │   'role': 'system',                                              │ │
│ │                       │   │   'content': 'You are ShellGPT\nYou are programming and system   │ │
│ │                       administration assistant.\nYou ar'+281                                 │ │
│ │                       │   },                                                                 │ │
│ │                       │   {'role': 'user', 'content': 'Capital of France?'}                  │ │
│ │                       ]                                                                      │ │
│ │               model = 'Qwen/Qwen2-beta-7B-Chat'                                              │ │
│ │                   n = NOT_GIVEN                                                              │ │
│ │ parallel_tool_calls = NOT_GIVEN                                                              │ │
│ │    presence_penalty = NOT_GIVEN                                                              │ │
│ │     response_format = NOT_GIVEN                                                              │ │
│ │                seed = NOT_GIVEN                                                              │ │
│ │                self = <openai.resources.chat.completions.Completions object at               │ │
│ │                       0x7fead8b94890>                                                        │ │
│ │                stop = NOT_GIVEN                                                              │ │
│ │              stream = True                                                                   │ │
│ │      stream_options = NOT_GIVEN                                                              │ │
│ │         temperature = 0.0                                                                    │ │
│ │             timeout = NOT_GIVEN                                                              │ │
│ │         tool_choice = NOT_GIVEN                                                              │ │
│ │               tools = NOT_GIVEN                                                              │ │
│ │        top_logprobs = NOT_GIVEN                                                              │ │
│ │               top_p = 1.0                                                                    │ │
│ │                user = NOT_GIVEN                                                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/openai/_base_client.py: │
│ 1240 in post                                                                                     │
│                                                                                                  │
│   1237 │   │   opts = FinalRequestOptions.construct(                                             │
│   1238 │   │   │   method="post", url=path, json_data=body, files=to_httpx_files(files), **opti  │
│   1239 │   │   )                                                                                 │
│ ❱ 1240 │   │   return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=str  │
│   1241 │                                                                                         │
│   1242 │   def patch(                                                                            │
│   1243 │   │   self,                                                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       body = {                                                                               │ │
│ │              │   'messages': [                                                               │ │
│ │              │   │   {                                                                       │ │
│ │              │   │   │   'role': 'system',                                                   │ │
│ │              │   │   │   'content': 'You are ShellGPT\nYou are programming and system        │ │
│ │              administration assistant.\nYou ar'+281                                          │ │
│ │              │   │   },                                                                      │ │
│ │              │   │   {'role': 'user', 'content': 'Capital of France?'}                       │ │
│ │              │   ],                                                                          │ │
│ │              │   'model': 'Qwen/Qwen2-beta-7B-Chat',                                         │ │
│ │              │   'frequency_penalty': NOT_GIVEN,                                             │ │
│ │              │   'function_call': NOT_GIVEN,                                                 │ │
│ │              │   'functions': None,                                                          │ │
│ │              │   'logit_bias': NOT_GIVEN,                                                    │ │
│ │              │   'logprobs': NOT_GIVEN,                                                      │ │
│ │              │   'max_tokens': NOT_GIVEN,                                                    │ │
│ │              │   'n': NOT_GIVEN,                                                             │ │
│ │              │   'parallel_tool_calls': NOT_GIVEN,                                           │ │
│ │              │   ... +12                                                                     │ │
│ │              }                                                                               │ │
│ │    cast_to = <class 'openai.types.chat.chat_completion.ChatCompletion'>                      │ │
│ │      files = None                                                                            │ │
│ │    options = {}                                                                              │ │
│ │       opts = FinalRequestOptions(                                                            │ │
│ │              │   method='post',                                                              │ │
│ │              │   url='/chat/completions',                                                    │ │
│ │              │   params={},                                                                  │ │
│ │              │   headers=NOT_GIVEN,                                                          │ │
│ │              │   max_retries=NOT_GIVEN,                                                      │ │
│ │              │   timeout=NOT_GIVEN,                                                          │ │
│ │              │   files=None,                                                                 │ │
│ │              │   idempotency_key=None,                                                       │ │
│ │              │   post_parser=NOT_GIVEN,                                                      │ │
│ │              │   json_data={                                                                 │ │
│ │              │   │   'messages': [                                                           │ │
│ │              │   │   │   {                                                                   │ │
│ │              │   │   │   │   'role': 'system',                                               │ │
│ │              │   │   │   │   'content': 'You are ShellGPT\nYou are programming and system    │ │
│ │              administration assistant.\nYou ar'+281                                          │ │
│ │              │   │   │   },                                                                  │ │
│ │              │   │   │   {'role': 'user', 'content': 'Capital of France?'}                   │ │
│ │              │   │   ],                                                                      │ │
│ │              │   │   'model': 'Qwen/Qwen2-beta-7B-Chat',                                     │ │
│ │              │   │   'functions': None,                                                      │ │
│ │              │   │   'stream': True,                                                         │ │
│ │              │   │   'temperature': 0.0,                                                     │ │
│ │              │   │   'top_p': 1.0                                                            │ │
│ │              │   },                                                                          │ │
│ │              │   extra_json=None                                                             │ │
│ │              )                                                                               │ │
│ │       path = '/chat/completions'                                                             │ │
│ │       self = <openai.OpenAI object at 0x7fead8a8a720>                                        │ │
│ │     stream = True                                                                            │ │
│ │ stream_cls = openai.Stream[openai.types.chat.chat_completion_chunk.ChatCompletionChunk]      │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/openai/_base_client.py: │
│ 921 in request                                                                                   │
│                                                                                                  │
│    918 │   │   stream: bool = False,                                                             │
│    919 │   │   stream_cls: type[_StreamT] | None = None,                                         │
│    920 │   ) -> ResponseT | _StreamT:                                                            │
│ ❱  921 │   │   return self._request(                                                             │
│    922 │   │   │   cast_to=cast_to,                                                              │
│    923 │   │   │   options=options,                                                              │
│    924 │   │   │   stream=stream,                                                                │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │           cast_to = <class 'openai.types.chat.chat_completion.ChatCompletion'>               │ │
│ │           options = FinalRequestOptions(                                                     │ │
│ │                     │   method='post',                                                       │ │
│ │                     │   url='/chat/completions',                                             │ │
│ │                     │   params={},                                                           │ │
│ │                     │   headers=NOT_GIVEN,                                                   │ │
│ │                     │   max_retries=NOT_GIVEN,                                               │ │
│ │                     │   timeout=NOT_GIVEN,                                                   │ │
│ │                     │   files=None,                                                          │ │
│ │                     │   idempotency_key=None,                                                │ │
│ │                     │   post_parser=NOT_GIVEN,                                               │ │
│ │                     │   json_data={                                                          │ │
│ │                     │   │   'messages': [                                                    │ │
│ │                     │   │   │   {                                                            │ │
│ │                     │   │   │   │   'role': 'system',                                        │ │
│ │                     │   │   │   │   'content': 'You are ShellGPT\nYou are programming and    │ │
│ │                     system administration assistant.\nYou ar'+281                            │ │
│ │                     │   │   │   },                                                           │ │
│ │                     │   │   │   {'role': 'user', 'content': 'Capital of France?'}            │ │
│ │                     │   │   ],                                                               │ │
│ │                     │   │   'model': 'Qwen/Qwen2-beta-7B-Chat',                              │ │
│ │                     │   │   'functions': None,                                               │ │
│ │                     │   │   'stream': True,                                                  │ │
│ │                     │   │   'temperature': 0.0,                                              │ │
│ │                     │   │   'top_p': 1.0                                                     │ │
│ │                     │   },                                                                   │ │
│ │                     │   extra_json=None                                                      │ │
│ │                     )                                                                        │ │
│ │ remaining_retries = None                                                                     │ │
│ │              self = <openai.OpenAI object at 0x7fead8a8a720>                                 │ │
│ │            stream = True                                                                     │ │
│ │        stream_cls = openai.Stream[openai.types.chat.chat_completion_chunk.ChatCompletionChu… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tmb/.local/share/pipx/venvs/shell-gpt/lib/python3.12/site-packages/openai/_base_client.py: │
│ 1020 in _request                                                                                 │
│                                                                                                  │
│   1017 │   │   │   │   err.response.read()                                                       │
│   1018 │   │   │                                                                                 │
│   1019 │   │   │   log.debug("Re-raising status error")                                          │
│ ❱ 1020 │   │   │   raise self._make_status_error_from_response(err.response) from None           │
│   1021 │   │                                                                                     │
│   1022 │   │   return self._process_response(                                                    │
│   1023 │   │   │   cast_to=cast_to,                                                              │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │           cast_to = <class 'openai.types.chat.chat_completion.ChatCompletion'>               │ │
│ │            kwargs = {}                                                                       │ │
│ │           options = FinalRequestOptions(                                                     │ │
│ │                     │   method='post',                                                       │ │
│ │                     │   url='/chat/completions',                                             │ │
│ │                     │   params={},                                                           │ │
│ │                     │   headers=NOT_GIVEN,                                                   │ │
│ │                     │   max_retries=NOT_GIVEN,                                               │ │
│ │                     │   timeout=NOT_GIVEN,                                                   │ │
│ │                     │   files=None,                                                          │ │
│ │                     │   idempotency_key=None,                                                │ │
│ │                     │   post_parser=NOT_GIVEN,                                               │ │
│ │                     │   json_data={                                                          │ │
│ │                     │   │   'messages': [                                                    │ │
│ │                     │   │   │   {                                                            │ │
│ │                     │   │   │   │   'role': 'system',                                        │ │
│ │                     │   │   │   │   'content': 'You are ShellGPT\nYou are programming and    │ │
│ │                     system administration assistant.\nYou ar'+281                            │ │
│ │                     │   │   │   },                                                           │ │
│ │                     │   │   │   {'role': 'user', 'content': 'Capital of France?'}            │ │
│ │                     │   │   ],                                                               │ │
│ │                     │   │   'model': 'Qwen/Qwen2-beta-7B-Chat',                              │ │
│ │                     │   │   'functions': None,                                               │ │
│ │                     │   │   'stream': True,                                                  │ │
│ │                     │   │   'temperature': 0.0,                                              │ │
│ │                     │   │   'top_p': 1.0                                                     │ │
│ │                     │   },                                                                   │ │
│ │                     │   extra_json=None                                                      │ │
│ │                     )                                                                        │ │
│ │ remaining_retries = None                                                                     │ │
│ │           request = <Request('POST', 'http://triton:8000/v1/chat/completions')>              │ │
│ │          response = <Response [400 Bad Request]>                                             │ │
│ │           retries = 2                                                                        │ │
│ │              self = <openai.OpenAI object at 0x7fead8a8a720>                                 │ │
│ │            stream = True                                                                     │ │
│ │        stream_cls = openai.Stream[openai.types.chat.chat_completion_chunk.ChatCompletionChu… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'extra_forbidden', 'loc': ('body', 
'functions'), 'msg': 'Extra inputs are not permitted', 'input': None}]", 'type': 'BadRequestError', 'param': None, 
'code': 400}
sedna:~$ exit

Script done on 2024-06-24 16:15:42-04:00 [COMMAND_EXIT_CODE="1"]

vllm-project / vllm

[Bug]: OpenAI API emulation incompatible with shell_gpt OpenAI API calls #5797

Your current environment

🐛 Describe the bug