jxnl / instructor

structured outputs for llms
https://python.useinstructor.com/
MIT License

Cannot stream from Groq model #818

Open architjambhule opened 1 month ago

architjambhule commented 1 month ago

What Model are you using? Groq (llama3-8b-8192)

Describe the bug: I cannot stream partial responses from the Groq model even when stream is set to true; it only returns the final output. Streaming works fine with GPT models.

To Reproduce: Run the following code:


text_block = """
In our recent online meeting, participants from various backgrounds joined to discuss the upcoming tech conference. The names and contact details of the participants were as follows:

- Name: John Doe, Email: johndoe@email.com, Twitter: @TechGuru44
- Name: Jane Smith, Email: janesmith@email.com, Twitter: @DigitalDiva88
- Name: Alex Johnson, Email: alexj@email.com, Twitter: @CodeMaster2023

During the meeting, we agreed on several key points. The conference will be held on March 15th, 2024, at the Grand Tech Arena located at 4521 Innovation Drive. Dr. Emily Johnson, a renowned AI researcher, will be our keynote speaker.

The budget for the event is set at $50,000, covering venue costs, speaker fees, and promotional activities. Each participant is expected to contribute an article to the conference blog by February 20th.

A follow-up meeting is scheduled for January 25th at 3 PM GMT to finalize the agenda and confirm the list of speakers.
"""

import instructor
from pydantic import BaseModel
from typing import List
from groq import Groq

# Wrap the Groq client with instructor; the repro uses TOOLS mode.
client = instructor.from_groq(Groq(api_key="api_key"), mode=instructor.Mode.TOOLS)

class User(BaseModel):
    name: str
    email: str
    twitter: str

class MeetingInfo(BaseModel):
    users: List[User]
    date: str
    location: str
    budget: int
    deadline: str

def extract_meeting_info(text_block):
    # create_partial should yield partially populated MeetingInfo objects
    # as tokens arrive, rather than a single final object.
    extraction_stream = client.chat.completions.create_partial(
        model="llama3-8b-8192",
        response_model=MeetingInfo,
        messages=[
            {
                "role": "user",
                "content": f"Get the information about the meeting and the users {text_block}",
            },
        ],
        stream=True
    )

    # Each iteration should print a progressively more complete dict.
    for extraction in extraction_stream:
        obj = extraction.model_dump()
        print(obj)

extract_meeting_info(text_block)

Expected behavior: Partial responses should be streamed as soon as they are generated, rather than waiting for the full response to complete.

vikyw89 commented 1 month ago

Can you try using instructor.Mode.MD_JSON?
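
For reference, a minimal sketch of that suggestion; only the mode argument changes from the repro above:

# Suggested variant: wrap the client in MD_JSON mode instead of TOOLS.
client = instructor.from_groq(Groq(api_key="api_key"), mode=instructor.Mode.MD_JSON)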

architjambhule commented 1 month ago

That gives this error:

[ERROR] Pydantic LLM call failed: Mode be one of {instructor.Mode.JSON, instructor.Mode.TOOLS}
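
The error indicates that from_groq restricts the mode to JSON or TOOLS, so a plausible next step (an untested sketch, not a confirmed fix) would be to retry the same repro with Mode.JSON:

# Untested sketch: Mode.JSON is the other mode the error message says is accepted.
client = instructor.from_groq(Groq(api_key="api_key"), mode=instructor.Mode.JSON)
extract_meeting_info(text_block)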