CyrusOfEden opened 6 days ago
Some preliminary results on the hover_retrieve_discrete optimizer test:
config = {
    "max_bootstrapped_demos": 64,
    "max_labeled_demos": 16,
    "max_errors": 10,
    "num_candidate_programs": 16,
}
The best program had 8 static few-shot demos and 8 KNN-retrieved few-shot demos.
Average train score: 37.316
Scores so far: [30.0, 30.0, 37.5, 37.0, 39.5, 38.0, 38.5, 37.0, 39.0, 39.0, 39.5, 37.0, 38.0, 41.0, 38.0, 38.0, 36.5, 38.0, 37.5]
Best score so far: 41.0
19 candidate programs found.
Optimized train score...
Average Metric: 82.00 / 200 (41.0%): 100%|██████████| 200/200 [00:23<00:00, 8.38it/s]
2024/11/22 20:48:42 INFO dspy.evaluate.evaluate: Average Metric: 82 / 200 (41.0%)
Optimized dev score...
Average Metric: 46.00 / 100 (46.0%): 100%|██████████| 100/100 [02:05<00:00, 1.26s/it]
2024/11/22 20:50:48 INFO dspy.evaluate.evaluate: Average Metric: 46 / 100 (46.0%)
Optimized test score...
Average Metric: 94.00 / 200 (47.0%): 100%|██████████| 200/200 [03:59<00:00, 1.20s/it]
2024/11/22 20:54:47 INFO dspy.evaluate.evaluate: Average Metric: 94 / 200 (47.0%)
Average train score: 36.078947368421055
Scores so far: [30.0, 30.0, 34.5, 41.0, 33.0, 38.5, 37.5, 36.0, 38.0, 35.0, 36.0, 35.5, 38.0, 36.0, 36.5, 41.0, 35.5, 39.0, 34.5]
Best score so far: 41.0
19 candidate programs found.
Optimized train score...
Average Metric: 82.00 / 200 (41.0%): 100%|██████████| 200/200 [00:00<00:00, 677.59it/s]
2024/11/22 22:47:40 INFO dspy.evaluate.evaluate: Average Metric: 82 / 200 (41.0%)
Optimized dev score...
Average Metric: 45.00 / 100 (45.0%): 100%|██████████| 100/100 [02:14<00:00, 1.35s/it]
2024/11/22 22:49:55 INFO dspy.evaluate.evaluate: Average Metric: 45 / 100 (45.0%)
Optimized test score...
Average Metric: 86.00 / 200 (43.0%): 100%|██████████| 200/200 [04:29<00:00, 1.35s/it]
2024/11/22 22:54:24 INFO dspy.evaluate.evaluate: Average Metric: 86 / 200 (43.0%)
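As a quick sanity check on the logs above, the "Average train score" line in each run matches the mean of that run's "Scores so far" list, and "Best score so far" is its maximum. The lists below are copied verbatim from the two runs; `run_1` and `run_2` are just local names for this check.

```python
from statistics import mean

# "Scores so far" lists copied from the two runs above (one score per candidate program)
run_1 = [30.0, 30.0, 37.5, 37.0, 39.5, 38.0, 38.5, 37.0, 39.0, 39.0,
         39.5, 37.0, 38.0, 41.0, 38.0, 38.0, 36.5, 38.0, 37.5]
run_2 = [30.0, 30.0, 34.5, 41.0, 33.0, 38.5, 37.5, 36.0, 38.0, 35.0,
         36.0, 35.5, 38.0, 36.0, 36.5, 41.0, 35.5, 39.0, 34.5]

for name, scores in [("run 1", run_1), ("run 2", run_2)]:
    # Count of candidates, average train score, best score so far
    print(name, len(scores), round(mean(scores), 3), max(scores))
```

This prints `run 1 19 37.316 41.0` and `run 2 19 36.079 41.0`, consistent with the "19 candidate programs found" lines.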
Excited to see this merged.
Current KNNFewShot
Compile Time: Vectorize trainset examples.
Test Time:
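To make the "vectorize trainset examples" step concrete, here is a minimal, dependency-free sketch of embedding a trainset once and later looking up the nearest stored examples for a new input. The bag-of-words `embed`, the `vectorize_trainset` and `nearest` helpers, and the `"question"` field are hypothetical illustrations, not DSPy's KNNFewShot API; a real setup would use a proper embedding model instead of word counts.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (hypothetical stand-in): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vectorize_trainset(trainset):
    """Compile time: embed every training input once and keep (vector, example) pairs."""
    return [(embed(ex["question"]), ex) for ex in trainset]


def nearest(index, query, k):
    """Return the k stored examples whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [ex for _, ex in ranked[:k]]


trainset = [
    {"question": "Who directed the film Alien?", "answer": "Ridley Scott"},
    {"question": "What river flows through Paris?", "answer": "Seine"},
    {"question": "Who directed the film Blade Runner?", "answer": "Ridley Scott"},
]
index = vectorize_trainset(trainset)
print([ex["question"] for ex in nearest(index, "Who directed the film Gladiator?", k=2)])
```

The query shares four tokens with both film questions, so those two are returned ahead of the unrelated river question.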
New BootstrapKNN
Compile Time: Run BootstrapFewShot to collect traces and generate end-to-end demo sets.
Test Time: When a predictor is called, the input's k nearest neighbors (KNN) among that predictor's augmented demos are used as its few-shot demos.
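A rough sketch of the BootstrapKNN test-time behavior described above, assuming the bootstrapped traces have already been grouped into a demo pool per predictor. Everything here is hypothetical: `demo_pools`, `knn_demos`, the predictor names, and the demo fields are made up for illustration, and token-overlap (Jaccard) similarity is swapped in for embedding similarity only to keep the example dependency-free.

```python
def tokens(text):
    """Lowercase token set for a piece of text."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set overlap, used here as a stand-in for embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical augmented demos from bootstrapping, keyed by predictor name.
# Each demo records that predictor's input and output from a successful trace.
demo_pools = {
    "generate_query": [
        {"input": "claim about a film director", "output": "film director query"},
        {"input": "claim about a river in france", "output": "river france query"},
    ],
    "summarize": [
        {"input": "passages about a film", "output": "film summary"},
    ],
}


def knn_demos(predictor_name, input_text, k):
    """Pick the k demos from this predictor's own pool that are closest to the input."""
    pool = demo_pools.get(predictor_name, [])
    query = tokens(input_text)
    ranked = sorted(pool, key=lambda d: jaccard(query, tokens(d["input"])), reverse=True)
    return ranked[:k]


demos = knn_demos("generate_query", "claim about a french river", k=1)
print(demos[0]["output"])
```

The key design point the sketch tries to show is that each predictor retrieves only from its own pool, so a query-generation step never receives summarization demos.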
New BootstrapKNNWithRandomSearch
Compile Time:
Test Time: When a predictor is called, the input's k nearest neighbors (KNN) among that predictor's augmented demos are used as its few-shot demos. If num_static_demos ≠ 0, that predictor's demos are shuffled(static demos + knn demos), such that len(static demos + knn demos) == max_labeled_demos.
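A minimal sketch of the demo-mixing rule above, assuming the static demos and the KNN-retrieved demos for a predictor are already available as lists. The helper name `mix_demos`, the seeded `random.Random`, taking the first `num_static_demos` static demos, and the behavior when `num_static_demos == 0` are all assumptions for illustration. With `max_labeled_demos=16` from the config and 8 static demos, 8 slots remain for KNN demos, which is consistent with the best program's 8 + 8 split.

```python
import random


def mix_demos(static_demos, knn_demos, num_static_demos, max_labeled_demos, seed=0):
    """Hypothetical helper: build shuffled(static + knn) with max_labeled_demos total.

    Assumes the KNN pool holds at least max_labeled_demos - num_static_demos demos;
    otherwise the result is shorter than max_labeled_demos.
    """
    if num_static_demos == 0:
        # Assumption: with no static demos, fall back to KNN demos only.
        return list(knn_demos[:max_labeled_demos])
    static = list(static_demos[:num_static_demos])
    # Fill the remaining slots with the closest KNN demos.
    num_knn = max(max_labeled_demos - len(static), 0)
    demos = static + list(knn_demos[:num_knn])
    # Shuffle so static and KNN demos are interleaved in the prompt.
    random.Random(seed).shuffle(demos)
    return demos


static = [f"static_{i}" for i in range(8)]
knn = [f"knn_{i}" for i in range(20)]
demos = mix_demos(static, knn, num_static_demos=8, max_labeled_demos=16)
print(len(demos), sum(d.startswith("knn") for d in demos))  # 16 8
```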