aaronbriel opened this issue 1 week ago
Hi @aaronbriel,
The optimized_program currently includes few-shot examples from only 8 of the classifiers because the BootstrapFewShotWithRandomSearch configuration is set to select:
"max_bootstrapped_demos": 8, "max_labeled_demos": 8
To get few-shot examples that can cover all 41 classifiers, you can increase both of these parameters to 41.
However, note that the way BootstrapFewShot selects few-shot examples doesn't guarantee that the 41 demos come from 41 distinct classifiers; the optimizer just selects a set of 41 few-shot examples that pass the metric.
Some potential solutions:
1. Adjust the metric to include a global check for each unique classifier, i.e. modify the validate_answer function so that each classifier contributes its own examples and none are repeated (e.g. return example.intent.lower() == pred.intent.lower() and global_class_check(example)).
2. Filter train_dataset by the 41 classifier types, then run the optimizer on each of the 41 resulting train sets (bootstrapping 41x!):
bootstrap_program_0 = teleprompter.compile(IntentClassifierModule(), trainset=train_dataset_0)
bootstrap_program_1 = teleprompter.compile(bootstrap_program_0, trainset=train_dataset_1)
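Option 1 could look roughly like the sketch below. global_class_check is not a DSPy function; here it is folded into a hypothetical metric factory that remembers which intents have already produced an accepted demo. It assumes DSPy's metric convention of (example, pred, trace=None), where trace is None during evaluation and set while bootstrapping, so plain evaluation still measures ordinary accuracy:

```python
def make_unique_class_metric():
    """Hypothetical metric factory: at most one accepted demo per intent.

    Stands in for the `global_class_check` idea; this is not a DSPy API.
    """
    seen_intents = set()  # intents that already contributed an accepted demo

    def validate_answer(example, pred, trace=None):
        gold = example.intent.lower()
        correct = gold == pred.intent.lower()
        if trace is None or not correct:
            # Evaluation call (assumed: no trace) or wrong answer: plain accuracy.
            return correct
        if gold in seen_intents:
            # Correct, but this intent already has a demo: reject it as a demo.
            return False
        seen_intents.add(gold)
        return True

    return validate_answer
```

Since the set of seen intents persists between calls, you would create a fresh metric for each compile call.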
The 2nd solution is likely more expensive, but it may add diversity by providing multiple sets of few-shot examples for each classifier, which could potentially raise performance.
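For option 2, the per-classifier train sets can be built in plain Python before any compile call. A minimal sketch, assuming each example exposes its label as .intent (as validate_answer does); split_by_intent is a hypothetical helper:

```python
from collections import defaultdict


def split_by_intent(dataset, label_field="intent"):
    """Hypothetical helper: group examples into one trainset per intent label.

    Assumes each example exposes its label as an attribute (e.g. example.intent).
    Intents keep the order in which they first appear in the dataset.
    """
    per_intent = defaultdict(list)
    for example in dataset:
        per_intent[getattr(example, label_field)].append(example)
    return dict(per_intent)
```

Each value in the returned dict would then be passed as trainset= to its own compile call.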
Let me know if this helps!
@arnavsinghvi11 Thanks for the quick response! I will try this and let you know the results.
@arnavsinghvi11 I keep running into the error below. I thought I had resolved it by adding format=str to each of the signature InputFields. It progressed a bit further but failed again several intent iterations later. I'm not seeing anything in the data for that specific intent that stands out; the text data across all intents contains special characters.
Do you know of any other tricks people have used to resolve this?
Traceback (most recent call last):
  File "/home/ubuntu/repos/project/experiments/dspy/build_intent_classifier_prompt.py", line 260, in <module>
    optimize_intent_classifier()
  File "/home/ubuntu/repos/project/experiments/dspy/build_intent_classifier_prompt.py", line 237, in optimize_intent_classifier
    bootstrap_program = teleprompter.compile(bootstrap_program, trainset=training_data_intent)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/anaconda3/envs/project-venv/lib/python3.12/site-packages/dspy/teleprompt/random_search.py", line 95, in compile
    program2 = program.compile(student, teacher=teacher, trainset=trainset2)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/anaconda3/envs/project-venv/lib/python3.12/site-packages/dspy/teleprompt/bootstrap.py", line 82, in compile
    self._prepare_student_and_teacher(student, teacher)
  File "/home/ubuntu/anaconda3/envs/project-venv/lib/python3.12/site-packages/dspy/teleprompt/bootstrap.py", line 99, in _prepare_student_and_teacher
    assert getattr(self.student, "_compiled", False) is False, "Student must be uncompiled."
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Student must be uncompiled.
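For what it's worth, the assertion in this traceback checks the student's _compiled flag, and the failing line passes bootstrap_program, the output of an earlier compile, back in as the student. That chaining seems a more likely trigger than the intent data, whenever the previous round returned a program marked as compiled. A minimal sketch of a loop that starts every round from a fresh student instead; compile_per_intent is hypothetical, and the DSPy wiring in its docstring is an assumption:

```python
def compile_per_intent(compile_fn, make_student, trainsets_by_intent):
    """Hypothetical helper: one optimization run per intent, each on a fresh student.

    compile_fn(student, trainset) -> compiled program
    make_student() -> new, uncompiled program
    A compiled program is never fed back in as the student.

    Assumed DSPy wiring (not executed here):
        compile_fn = lambda student, ts: teleprompter.compile(student, trainset=ts)
        make_student = IntentClassifierModule
    """
    compiled = {}
    for intent, trainset in trainsets_by_intent.items():
        student = make_student()  # brand-new program, so it is not marked compiled
        compiled[intent] = compile_fn(student, trainset)
    return compiled
```

Each compiled program then only carries demos for its own intent, so they still have to be combined into a single program afterwards.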
Using solution (1) above, the resulting prompt was still missing 20 intents, so that is not feasible for a production release. The "Student must be uncompiled" error may not have occurred with that approach simply because data from the missed intents was never encountered.
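Since the metric-based approach in (1) still left intents uncovered, another option is to gather demos per intent (for example, from one optimization run per intent) and then pick them round-robin, so every intent gets one demo before any intent gets a second. A minimal sketch; round_robin_demos is hypothetical, and how the selected demos are attached to the program's predictors is left as an assumption:

```python
from itertools import zip_longest

_MISSING = object()  # filler for intents that have run out of demos


def round_robin_demos(demos_by_intent, max_demos):
    """Hypothetical helper: take demos one intent at a time, cycling through intents.

    Round k contains the k-th demo of every intent that has one, so each intent
    gets its first demo before any intent gets a second.
    """
    selected = []
    for round_ in zip_longest(*demos_by_intent.values(), fillvalue=_MISSING):
        for demo in round_:
            if demo is _MISSING:
                continue  # this intent has fewer demos than the others
            if len(selected) >= max_demos:
                return selected
            selected.append(demo)
    return selected
```

With 41 intents, any max_demos of at least 41 covers every intent that has at least one demo; larger values add second and third demos evenly, at the cost of a longer prompt.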
I'm going to have to hold off on using this tool until someone finds a solution to this issue.
@aaronbriel this may be helpful https://github.com/KarelDO/xmc.dspy
I followed the tutorials for optimizing a DSPy program for multi-class classification, and the "optimized" prompt included only a small subset of the available classifiers, making it unsuitable for a production environment.
I'll provide the relevant chunks of notebook code, but I can't show the prompt itself because it contains production data. Hopefully this is enough to identify the issue.
ISSUE 1: The main issue is that the final "optimized" prompt contains a single few-shot sample for each of only 8 of the 41 classifiers (one of those classifiers has 2 samples). I expected it to contain multiple few-shot samples for each of the 41 classifiers.
ISSUE 2: The secondary issue is that the evaluation metric showed a rather low score of 64.34. I expected this to be much higher, since I trained on a decent-sized, manually curated ground-truth dataset of 50 samples per classifier.
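A single score of 64.34 doesn't show which intents are being missed. Assuming it is the percentage of dev examples on which validate_answer returned True, a per-intent breakdown over the same gold and predicted labels can point to the classes dragging it down. A minimal sketch with a hypothetical accuracy_report helper, using the same case-insensitive comparison as validate_answer:

```python
from collections import Counter


def accuracy_report(gold_labels, predicted_labels):
    """Hypothetical helper: overall accuracy (percent) plus accuracy per gold intent.

    Labels are compared case-insensitively, like validate_answer.
    """
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted label lists must have the same length")
    totals, hits = Counter(), Counter()
    for gold, pred in zip(gold_labels, predicted_labels):
        key = gold.lower()
        totals[key] += 1
        if pred.lower() == key:
            hits[key] += 1
    overall = 100 * sum(hits.values()) / len(gold_labels) if gold_labels else 0.0
    per_intent = {intent: hits[intent] / totals[intent] for intent in totals}
    return round(overall, 2), per_intent
```

Sorting per_intent by value, lowest first, lists the weakest intents, which are natural candidates for extra demos.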
I'm guessing this is related to my optimizer configuration, but I'm not sure what to adjust. Please advise. Thank you!
This resulted in successful "training", running in 8 sets. I then completed an evaluation:
I then checked the optimized prompt by doing:
ISSUE 1: The resulting optimized_intent_classifier.json had a single few-shot sample for only 8 intents, with one of those intents having 2 samples. There are 41 intents, so I expected multiple few-shot samples for each of the 41 intents.
ISSUE 2: The evaluation showed a final score of 64.34, which was far lower than expected, as I provided a ground-truth dataset of 50 samples per intent.
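To check intent coverage without reading the prompt by eye, the saved JSON can be scanned for demo labels. A sketch assuming the file maps each predictor name to a dict with a "demos" list whose entries carry an "intent" field; the exact layout may differ between DSPy versions, so adjust the keys if needed. load_state and demo_intent_counts are hypothetical helpers:

```python
import json
from collections import Counter


def load_state(path):
    """Read a program saved with program.save(path) (assumed to be plain JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def demo_intent_counts(state, label_field="intent"):
    """Count demos per intent across all predictor entries of a saved state.

    Assumed layout: {predictor_name: {"demos": [{..., "intent": ...}, ...], ...}, ...}
    """
    counts = Counter()
    for entry in state.values():
        if not isinstance(entry, dict):
            continue  # not a predictor entry
        for demo in entry.get("demos", []):
            if isinstance(demo, dict) and label_field in demo:
                counts[demo[label_field]] += 1
    return counts
```

Comparing set(demo_intent_counts(load_state("optimized_intent_classifier.json"))) against a list of all 41 intent labels would show which intents have no demo at all.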