Closed eren23 closed 1 year ago
Tested with longer and relatively more complicated text, for example:
Input: The latest Samsung Galaxy S21, with 5G capabilities, is a high-end smartphone that has received positive reviews from critics. It is priced at $799 and comes in a 128GB version in Phantom Black. The phone features a large display, fast processing speeds, and a long-lasting battery.
Expected Output: "Tabular Data": { "product": "Samsung Galaxy", "model": "S21", "price": 799, "storage": "128GB", "color": "Phantom Black" }
Output: 'Tabular': {'product': 'Samsung Galaxy S21', 'price': 799, 'storage': '128GB', 'color': 'Phantom Black'}
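To make the gap between the expected and the actual output concrete, here is a minimal sketch that diffs the two records. The `diff_records` helper is hypothetical and not part of Promptify; the two dictionaries are copied from the example above.

```python
def diff_records(expected, actual):
    """Return keys missing from `actual` and keys whose values differ."""
    missing = sorted(k for k in expected if k not in actual)
    changed = {
        k: (expected[k], actual[k])
        for k in expected
        if k in actual and expected[k] != actual[k]
    }
    return missing, changed


expected = {
    "product": "Samsung Galaxy",
    "model": "S21",
    "price": 799,
    "storage": "128GB",
    "color": "Phantom Black",
}
actual = {
    "product": "Samsung Galaxy S21",
    "price": 799,
    "storage": "128GB",
    "color": "Phantom Black",
}

missing, changed = diff_records(expected, actual)
print(missing)  # ['model']
print(changed)  # {'product': ('Samsung Galaxy', 'Samsung Galaxy S21')}
```

The model merged the model name into `product` instead of emitting a separate `model` field; the other fields match.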
I'm not sure the prompt can be reworded to be more specific while still generalizing. Rather than tweaking the prompt further, adding more examples to the data might also help here. Any ideas or direct changes to this PR are welcome if you think they are needed. @monk1337
Thank you for your contribution; it's interesting. A few suggestions before we merge this PR:
1) Please change the location of the file to https://github.com/promptslab/Promptify/tree/main/promptify/prompts/tabular/
2) Add a Colab notebook and a readme.md reference link (use eval() to get JSON output)
Thanks a lot for the review. Let me clarify before doing anything: I think you want the jinja file moved to the tabular directory, but wouldn't that require additional changes in that directory itself?
nlp_prompter.generate_prompt('tabular_extractor.jinja', ... would fail, for example, because it refers to the NLP directory's generate_prompt method.
Of course that can be implemented too, but my understanding of your initial tabular task was the creation of a pipeline that extracts information from a tabular source. Since the task I worked on goes from text --> tabular, I considered it an NLP task and placed it there.
So I'm a bit lost about what to do; can you elaborate?
About the second suggestion, I can certainly add the eval() call to the notebook, but I can't say that I understand the first part.
I think you are right. It makes more sense to keep this in the NLP module because it's text --> tabular.
1) What will the output look like if examples are not given? Can you add a default output format so that it works without examples? For example, we could add something like this to the prompt:
You are a highly intelligent and accurate tabular data extractor from plain text input, your inputs can be text of arbitrary size, but the output should be in [{'tabular': {'entity_type': 'entity'} }] JSON format
You can make it better; it's just an example.
2) Sure, use eval() to parse it easily.
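A minimal sketch of the "default output format" idea, assuming plain Python string assembly rather than the actual jinja template. `DEFAULT_INSTRUCTION` and `build_prompt` are hypothetical names used only for illustration and are not Promptify's API; the instruction text is the one suggested above.

```python
DEFAULT_INSTRUCTION = (
    "You are a highly intelligent and accurate tabular data extractor from "
    "plain text input, your inputs can be text of arbitrary size, but the "
    "output should be in [{'tabular': {'entity_type': 'entity'} }] JSON format"
)


def build_prompt(text_input, examples=None):
    """Assemble a prompt; `examples` is an optional list of (input, output) pairs."""
    lines = [DEFAULT_INSTRUCTION, ""]
    # With no examples the loop is skipped and the default format still applies.
    for example_input, example_output in examples or []:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {text_input}")
    lines.append("Output:")
    return "\n".join(lines)


zero_shot = build_prompt(
    "Emily Davis, a 31-year-old lawyer, can be reached at emilyd@email.com."
)
few_shot = build_prompt("x", examples=[("a", "b")])
print(zero_shot)
```

Because the format instruction is always emitted first, the zero-shot prompt still tells the model which structure to return.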
Added both and pushed a new commit. Also see the last cell for how I avoided the error with eval(); it may be related to your issue at https://github.com/promptslab/Promptify/issues/4.
@monk1337 I wanted to ping you about my other commit from yesterday. If it's good enough, maybe we can merge it before master moves further ahead with the PRs tagged as enhanced.
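For readers without the notebook at hand, here is one way an eval() failure on model output might be avoided. This is a sketch under assumptions and not necessarily what the notebook's last cell does: it uses the standard library's `ast.literal_eval` instead of `eval()`, and it assumes the inner dict may arrive wrapped in quotes, as in the sample outputs in this thread. `parse_model_output` is a hypothetical helper name.

```python
import ast


def parse_model_output(raw):
    """Parse model output as a Python literal; return None if it is malformed."""
    # Assumption: the inner dict may be wrapped in quotes, e.g. '{...}',
    # which is not a valid literal. Drop those wrapping quotes first.
    cleaned = raw.strip().replace("'{", "{").replace("}'", "}")
    try:
        # literal_eval only accepts literals, so it does not execute code.
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        return None


raw = " [{'Tabular': '{'name': 'Emily Davis', 'age': 31, 'occupation': 'Lawyer', 'email': 'emilyd@email.com'}' }]"
records = parse_model_output(raw)
print(records[0]["Tabular"]["name"])  # Emily Davis

truncated = "[{'Tabular': {'name': 'Emily"
print(parse_model_output(truncated))  # None
```

The quote stripping is deliberately naive: a field value that itself begins with `{` or ends with `}` would be altered, so a stricter parser would be needed for such data.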
Thank you @eren23, for your great contribution; merging it now.
Below is an example that passes several plain-text example pairs to get output in tabular format:
Example pairs:
Input: John Doe, a 32-year-old engineer, can be reached at johndoe@email.com. Output: [{'Tabular': '{'name': 'John Doe', 'age': 32, 'occupation': 'Engineer', 'email': 'johndoe@email.com'}' }]
Input: The latest iPhone, the XS Max, is priced at $999 and comes in a 128GB version with a gold finish. Output: [{'Tabular': '{'product': 'iPhone', 'model': 'XS Max', 'price': 999, 'storage': '128GB', 'color': 'Gold'}' }]
. . .
Query sentence:
Input: Emily Davis, a 31-year-old lawyer, can be reached at emilyd@email.com. Output:
Results:
" [{'Tabular': '{'name': 'Emily Davis', 'age': 31, 'occupation': 'Lawyer', 'email': 'emilyd@email.com'}' }]"