A library to synthesize text datasets using Large Language Models (LLMs). Mutate reads through the examples in a dataset and generates similar examples using automatically generated few-shot prompts.
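To make the few-shot idea concrete, here is a minimal sketch of how such a prompt could be assembled from labelled examples. It is not Mutate's actual prompt template: the function `build_few_shot_prompt`, the label-before-text layout, and random sampling of shots are assumptions for illustration. The `text_alias` and `label_alias` arguments play the role of Mutate's `text_column_alias` and `label_column_alias` options.

```python
import random


def build_few_shot_prompt(examples, task_desc, text_alias, label_alias,
                          target_label, shot_count=5, seed=None):
    """Hypothetical sketch: assemble a few-shot prompt from (text, label) pairs.

    Only examples of `target_label` are used as shots, and the prompt ends
    with an open slot that the LLM is expected to complete with a new text
    of that class. The layout is an assumption, not Mutate's template.
    """
    rng = random.Random(seed)
    same_class = [ex for ex in examples if ex[1] == target_label]
    # Sample up to `shot_count` shots (fewer if the class is small).
    shots = rng.sample(same_class, min(shot_count, len(same_class)))
    lines = [task_desc, ""]
    for text, label in shots:
        lines.append(f"{label_alias}: {label}")
        lines.append(f"{text_alias}: {text}")
        lines.append("")
    # Open-ended slot: the model continues generating from here.
    lines.append(f"{label_alias}: {target_label}")
    lines.append(f"{text_alias}:")
    return "\n".join(lines)


examples = [
    ("A delight from start to finish.", "pos"),
    ("Dull and far too long.", "neg"),
    ("Great cast, great script.", "pos"),
]
print(build_few_shot_prompt(examples, "Movie reviews and sentiments.",
                            text_alias="Comment", label_alias="sentiment",
                            target_label="pos", shot_count=5, seed=0))
```

Restricting the shots to one class keeps the completion conditioned on that label, which is one plausible way to get class-specific synthetic examples.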
pip install mutate-nlp
or
pip install git+https://github.com/infinitylogesh/mutate
from mutate import pipeline
pipe = pipeline("text-classification-synthesis",
                model="EleutherAI/gpt-neo-2.7B",
                device=1)
task_desc = "Each item in the following contains movie reviews and corresponding sentiments. Possible sentiments are neg and pos"
# returns a Python generator
text_synth_gen = pipe("csv",
                      data_files=["local/path/sentiment_classification.csv"],
                      task_desc=task_desc,
                      text_column="text",
                      label_column="label",
                      text_column_alias="Comment",
                      label_column_alias="sentiment",
                      shot_count=5,
                      class_names=["pos", "neg"])
# Loop through the generator to synthesize examples by class
for synthesized_examples in text_synth_gen:
    print(synthesized_examples)
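The per-class loop above can be pictured with the following hypothetical sketch, which groups rows by label and walks each class in consecutive batches of `shot_count` texts, one batch per prompt. The helper `iter_shot_batches` and the in-memory `rows` list are assumptions for illustration; Mutate's internal batching may differ.

```python
from collections import defaultdict


def iter_shot_batches(rows, class_names, shot_count=5):
    """Hypothetical sketch: yield (class_name, texts) batches, class by class.

    Each batch holds up to `shot_count` texts of one class and would be
    turned into one few-shot prompt.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[row["label"]].append(row["text"])
    for name in class_names:
        texts = by_class.get(name, [])
        # Walk the class in consecutive, non-overlapping chunks.
        for start in range(0, len(texts), shot_count):
            yield name, texts[start:start + shot_count]


rows = [{"text": f"t{i}", "label": "pos" if i % 2 == 0 else "neg"} for i in range(7)]
for label, shots in iter_shot_batches(rows, ["pos", "neg"], shot_count=3):
    print(label, shots)
```

Because each batch is consumed once, a single pass over the dataset yields a bounded number of prompts.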
Under the hood, Mutate uses the 🤗 datasets library for dataset processing, so it supports 🤗 datasets out of the box: pass a dataset name such as `banking77` instead of a local file loader.
from mutate import pipeline
pipe = pipeline("text-classification-synthesis",
                model="EleutherAI/gpt-neo-2.7B",
                device=1)
task_desc = "Each item in the following contains customer service queries expressing the mentioned intent"
synthesizerGen = pipe("banking77",
                      task_desc=task_desc,
                      text_column="text",
                      label_column="label",
                      text_column_alias="Queries",  # use when the `text_column` name is not meaningful
                      label_column_alias="Intent",  # use when the `label_column` name is not meaningful
                      shot_count=5,
                      dataset_args=["en"])
for exp in synthesizerGen:
    print(exp)
To generate examples indefinitely, set `infinite_loop=True`. Caution: looping through the dataset indefinitely increases the chance of generating duplicate examples.
from mutate import pipeline
pipe = pipeline("text-classification-synthesis",
                model="EleutherAI/gpt-neo-2.7B",
                device=1)
task_desc = "Each item in the following contains movie reviews and corresponding sentiments. Possible sentiments are neg and pos"
# returns a Python generator
text_synth_gen = pipe("csv",
                      data_files=["local/path/sentiment_classification.csv"],
                      task_desc=task_desc,
                      text_column="text",
                      label_column="label",
                      text_column_alias="Comment",
                      label_column_alias="sentiment",
                      class_names=["pos", "neg"],
                      # Flag to generate examples indefinitely
                      infinite_loop=True)
# Infinite loop: runs until interrupted
for exp in text_synth_gen:
    print(exp)
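Since an infinite generator can repeat itself, one option is to filter exact duplicates on the consumer side. The sketch below assumes each generated item is a hashable string (flatten batches first if the pipeline yields lists or dicts); `unique_examples` and its `max_draws` cap are hypothetical helpers, not part of Mutate, and `itertools.cycle` stands in for the real generator.

```python
import itertools


def unique_examples(generator, limit, max_draws=10_000):
    """Yield at most `limit` distinct items from a (possibly infinite) generator.

    Exact duplicates are skipped. `max_draws` bounds the total number of items
    pulled, so the loop still ends if the generator only repeats itself.
    """
    seen = set()
    for item in itertools.islice(generator, max_draws):
        if item in seen:
            continue
        seen.add(item)
        yield item
        if len(seen) >= limit:
            return


# Stand-in for a generator that keeps repeating earlier outputs.
fake = itertools.cycle(["good film", "bad film", "good film", "great film"])
print(list(unique_examples(fake, limit=3)))
```

With the pipeline above, this would look like `for exp in unique_examples(text_synth_gen, limit=1000): print(exp)`. Only exact matches are caught; near-duplicates (e.g. differing by punctuation) would need normalization or a similarity check.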
The idea of generating examples with Large Language Models is inspired by the works below: