promptslab / Promptify

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research
https://discord.gg/m88xfYMbK6
Apache License 2.0

runtime is extremely slow after new update #86

Open Marwen-Bhj opened 11 months ago

Marwen-Bhj commented 11 months ago

14/7/2023: successfully running

examples = [list of dictionaries here as examples]
data = list_of_product_names

result = nlp_prompter.fit('ner.jinja', domain='ecommerce', text_input=f'{data}', labels=["NAME", "WEIGHT", "VOLUME", "COUNT"], examples=examples)

After the update of 17/7/2023:

examples = [list of dictionaries here as examples]
data = list_of_product_names

model = OpenAI(api_key)           # or HubModel() for Huggingface-based inference, or 'Azure', etc.
prompter = Prompter('ner.jinja')  # select a template or provide a custom template
pipe = Pipeline(prompter, model)

result = pipe.fit(domain='ecommerce', text_input=new_input, labels=["NAME", "WEIGHT", "VOLUME", "COUNT"], examples=examples)

The code block keeps running forever with no results, even with a short sentence. When I take the examples out it runs faster, but it doesn't yield the desired output format.

monk1337 commented 11 months ago

Hi, can you share the exact full code with sample examples?

Hadjerkhd commented 11 months ago

I'm having the same issue here, any solutions? I'm not using examples, however. Here is the code:

from typing import List
from promptify import OpenAI, Prompter, Pipeline

class ChatGPTModel:
    def __init__(self, API_KEY, model="gpt-3.5-turbo") -> None:
        self.API_KEY = API_KEY
        self.model = OpenAI(self.API_KEY, model=model)
        self.prompter = None
        self.pipeline = None

    def do_ner(self, sentence,labels=None, domain=None, examples=[], description=None) -> List[dict]:

        self.prompter = Prompter(template="ner.jinja")
        self.pipeline = Pipeline(self.prompter, self.model, output_path="/tmp", max_completion_length=200,
                                 output_format='[{"T":"entity type", "E":"entity text", "start": "entity start index", "end":"entity end index"}, {"T":"entity type", "E":"entity text", "start": "entity start index", "end":"entity end index"},...]',
                                 cache_size=10)
        print("calling pipeline to extract entites")
        result       = self.pipeline.fit(
                                text_input  = sentence, 
                                domain      = domain,
                                labels      = labels,
                                examples = examples, 
                                description = description
                                )
        #list of format [{"E":"entity", "T":"type"},]
        result = eval(result[0]['text'])
        print("pipeline called; entities extracted ", len(result))
        formatted_list = []
        formatted_list = format_ner_result(result)

        return formatted_list
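
For reference, a minimal usage sketch of the class above (the API key, sentence, labels, and domain below are placeholders, and format_ner_result is assumed to be defined elsewhere):

# Hypothetical usage of the ChatGPTModel wrapper defined above
ner_model = ChatGPTModel(API_KEY="sk-...")                       # placeholder key
entities = ner_model.do_ner(
    "The patient has chronic right hip pain and depression.",    # placeholder sentence
    labels=["Disease", "Symptom"],                                # placeholder labels
    domain="medical",
)
print(entities)
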
monk1337 commented 11 months ago

@Hadjerkhd Hi, your max_completion_length is too high. Sorry for the confusion: the parameter max_completion_length doesn't reflect the completion length of the model, it's a parameter for the parser module.

https://github.com/promptslab/Promptify/blob/a121b88c87b7b712552287a6252b2103a60ff90b/promptify/parser/parser.py#L165

Use max_completion_length = 5 or 10. I'll change the parameter name, it's confusing.
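
For illustration, here is the Pipeline construction from the snippet above with the suggested smaller value (a sketch; the output_format string is abbreviated and the other arguments are unchanged):

# max_completion_length controls the parser module, not the model's completion
# length, so a small value such as 5 or 10 is sufficient
self.pipeline = Pipeline(self.prompter, self.model,
                         output_path="/tmp",
                         max_completion_length=10,
                         output_format='[{"T":"entity type", "E":"entity text", "start": "entity start index", "end":"entity end index"}, ...]',
                         cache_size=10)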

Peji-moghimi commented 4 months ago

I also have the same problem, even when just running the promptify_NER.ipynb example notebook!

For the sake of ease, here is the code snippet:

import openai
from promptify import Prompter, OpenAI, Pipeline

model = OpenAI(openai.api_key)
prompter = Prompter('ner.jinja')
pipe = Pipeline(prompter, model)

# Example sentence for demonstration
sent = "The patient is a 93-year-old female with a medical history of chronic right hip pain, osteoporosis, hypertension, depression, and chronic atrial fibrillation admitted for evaluation and management of severe nausea and vomiting and urinary tract infection"

result = pipe.fit(sent, domain = 'medical', labels = None)

print(result)

Even for this short example, it runs forever. The interesting observation is that if I shorten this example to:

sent = "The patient is a 93-year-old female with a medical history of chronic right hip pain"

it only takes a second to run, but if I leave in even a couple more words:

sent = "The patient is a 93-year-old female with a medical history of chronic right hip pain and depression."

again it runs forever!

Peji-moghimi commented 4 months ago

Weirdly, it turns out that if the input is wrapped in triple quotes, it runs just fine and in a very short span of time.
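
For example, a sketch of the workaround, reusing the pipe from the earlier snippet:

# Wrapping the input in triple quotes avoids the hang described above
sent = """The patient is a 93-year-old female with a medical history of chronic right hip pain, osteoporosis, hypertension, depression, and chronic atrial fibrillation admitted for evaluation and management of severe nausea and vomiting and urinary tract infection"""

result = pipe.fit(sent, domain='medical', labels=None)
print(result)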