Content Extraction Issue

CYFARE commented 2 months ago

Hi Groq Team,

Issue

I'm trying to make API responses usable but so far have failed to extract the information inside "content=" part of responses properly. I'm using the following code:

I have tested the following for extraction but none of them sattify multi-line extraction:

    extracted_content = re.search(r'content="([\s\S]*?)" response_metadata', content).group(1)

    extracted_content = re.search(r'content=[\'"]([^\'"]*)[\'"]', content).group(1)
    clean_content = re.sub(r'\s+response_metadata=\{.*\}$', '', extracted_content, flags=re.DOTALL)

    extracted_content = re.search(r'content=[\'"]([^\'"]*)[\'"]', content).group(1)
    clean_content = re.sub(r'\s+response_metadata=\{.*\}$', '', extracted_content, flags=re.DOTALL)

    extracted_content = re.search(r'content="([^"]*)"', content).group(1)
    clean_content = re.sub(r'\s+response_metadata=\{.*\}$', '', extracted_content, flags=re.DOTALL)

I've also tried dumping this into json dict, but for different types of responses, edge-cases are making extraction inaccurate, for example, only half the content is extracted due to some "'" or "\" character inside answers and so on..

Sample Response

The following is my sample query & associated api response:

I want to only get the answer inside content= either directly through API response or if you have any python function/solution to extract this value under different edge-cases, for example, when the answer has multiple lines and multiple code blocks or when the answer has different encoding types.

Thanks in advance!

gradenr commented 2 months ago

Have you tried using with_structured_output to apply a schema to the content? https://python.langchain.com/docs/modules/model_io/chat/structured_output/#json-mode-2

This will allow you to specify a schema in which you want the model to respond.

CYFARE commented 2 months ago

Well, idk if something was changed from API backend, but now the answers are direct when using llama3 model :+1:

Thanks for the help!

groq / groq-python

Content Extraction Issue #29

Issue

Sample Response