promptslab / Promptify

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research
https://discord.gg/m88xfYMbK6
Apache License 2.0

Added a tabular data extractor from plain text example #11

Closed eren23 closed 1 year ago

eren23 commented 1 year ago

Below I share an example that takes several plain-text inputs and returns the output in tabular format;

Example pairs:

Input: John Doe, a 32-year-old engineer, can be reached at johndoe@email.com. Output: [{'Tabular': {'name': 'John Doe', 'age': 32, 'occupation': 'Engineer', 'email': 'johndoe@email.com'}}]

Input: The latest iPhone, the XS Max, is priced at $999 and comes in a 128GB version with a gold finish. Output: [{'Tabular': {'product': 'iPhone', 'model': 'XS Max', 'price': 999, 'storage': '128GB', 'color': 'Gold'}}]

. . .

Query sentence:

Input: Emily Davis, a 31-year-old lawyer, can be reached at emilyd@email.com. Output:

Result:

" [{'Tabular': '{'name': 'Emily Davis', 'age': 31, 'occupation': 'Lawyer', 'email': 'emilyd@email.com'}' }]"

eren23 commented 1 year ago

Tested with longer and relatively more complicated text, like:

Input: The latest Samsung Galaxy S21, with 5G capabilities, is a high-end smartphone that has received positive reviews from critics. It is priced at $799 and comes in a 128GB version in Phantom Black. The phone features a large display, fast processing speeds, and a long-lasting battery.

Expected Output: "Tabular Data": { "product": "Samsung Galaxy", "model": "S21", "price": 799, "storage": "128GB", "color": "Phantom Black" }

Output: 'Tabular': {'product': 'Samsung Galaxy S21', 'price': 799, 'storage': '128GB', 'color': 'Phantom Black'}

I'm not sure how to generalize the input prompt while keeping it specific enough; rather than tweaking the prompt, adding more examples to the data might also help here. Any ideas or direct changes to this PR are welcome if you think they're needed. @monk1337

monk1337 commented 1 year ago

Thank you for your contribution; it's interesting. A few suggestions before we merge this PR:

1. Please change the location of the file to https://github.com/promptslab/Promptify/tree/main/promptify/prompts/tabular/
2. Add a Colab notebook and a readme.md reference link (use eval() to get JSON output)

eren23 commented 1 year ago

Thank you for your contribution; it's interesting. A few suggestions before we merge this PR:

  1. Please change the location of the file to https://github.com/promptslab/Promptify/tree/main/promptify/prompts/tabular/
  2. Add a Colab notebook and a readme.md reference link (use eval() to get JSON output)

Thanks a lot for the review. Let me clarify before doing anything: I think we want to move the jinja file to the tabular directory, but in that case wouldn't it require additional changes in the directory itself?

nlp_prompter.generate_prompt('tabular_extractor.jinja', ... — this part, for example, would fail because it references the nlp directory's generate_prompt method.

Of course that can be implemented too, but what I understood from your initial tabular task was the creation of a pipeline that can extract information from a tabular source. Since the task I worked on goes from text --> tabular, I considered it an NLP task and placed it there.

So I'm a bit lost about what to do; can you expand on that?

About the second suggestion: I can add the eval() call to the notebook for sure, but I can't really say I understand the first part.

monk1337 commented 1 year ago

I think you are right. It makes more sense to keep this in the NLP module because it's text --> tabular.

1. What will the output look like if examples are not given? Can you add a default output format so that it will work without examples? For example, we could phrase the prompt something like this:

You are a highly intelligent and accurate tabular data extractor from plain text input, your inputs can be text of arbitrary size, but the output should be in [{'tabular': {'entity_type': 'entity'} }] JSON format

You can make it better; it's just an example.

2. Sure, use eval() to parse it easily.
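For reference, a minimal sketch of that parsing step (the raw_output string here is illustrative, not taken from the PR notebook; ast.literal_eval() is shown as a safer stand-in for eval(), since the model returns a Python-style literal with single quotes rather than strict JSON):

```python
import ast

# Illustrative model output (Python-style literal, not strict JSON, so
# json.loads() would fail on the single quotes).
raw_output = "[{'Tabular': {'name': 'Emily Davis', 'age': 31, 'occupation': 'Lawyer', 'email': 'emilyd@email.com'}}]"

# ast.literal_eval() parses Python literals without executing arbitrary code,
# which makes it a safer choice than eval() for untrusted model output.
parsed = ast.literal_eval(raw_output.strip())
print(parsed[0]["Tabular"]["occupation"])  # -> Lawyer
```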

eren23 commented 1 year ago

I think you are right. It makes more sense to keep this in the NLP module because it's text --> tabular

  1. What will the output look like if examples are not given? Can you add a default output format so that it will work without examples? For example, we can add the prompt something like this:

You are a highly intelligent and accurate tabular data extractor from plain text input, your inputs can be text of arbitrary size, but the output should be in [{'tabular': {'entity_type': 'entity'} }] JSON format

You can make it better; it's just an example.

  2. Sure, use eval() to parse it easily.

Added both of them and pushed a new commit. Also see the last cell for how I avoided the error with eval(); it may be related to your issue here: https://github.com/promptslab/Promptify/issues/4.

eren23 commented 1 year ago

@monk1337 Wanted to ping you about my other commit from yesterday; if it's good enough, maybe we can merge it before master moves further with PRs tagged as enhancement.

monk1337 commented 1 year ago

Thank you @eren23, for your great contribution; merging it now.