Closed koaning closed 1 year ago
Then the Prodigy recipe might use something like:
from prodigy.sorters import ExpMovingAverage, prefer_low_scores
def make_scored_stream(stream, anchors):
for batch in batched(stream):
batch_text = [b['text'] for b in batch]
distances = calc_distance(batch_text, anchors, pipeline)
for score, ex in zip(distances, batch):
yield score, ex
def sorted_stream(stream):
return prefer_low_scores(ExpMovingAverage(stream))
Worth rethinking though. Something about recalculating the anchors feels a bit wasteful.
Something like this: