The underlying GPT-3.5 model was trained on internet text, so it tends to use match() and filter_labels() view stages even when the label class does not actually exist in the dataset. This PR patches that behavior by:
1. Identifying any view stage string with match() or filter_labels() that contains a named entity which was recognized in the label class NER link but never matched to a class name for the label field.
2. Picking a brain key: if a text_similarity brain run was identified in the required_runs, it is used. Otherwise, we use the first available text similarity run we can find.
3. Swapping out the erroneous view stage for a sort_by_similarity() stage with that unmatched entity and brain key, using a default of 25 for the number of samples to return.
We can change this default if we'd like, but I'd prefer just messaging the user about the substitution and asking them to give a prompt that sorts by similarity or specifies the number of results.
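The patch steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function name `patch_stages`, its parameters, and the regex are hypothetical, and it assumes view stages are handled as plain strings and that the unmatched entities have already been collected.

```python
import re

DEFAULT_K = 25  # default number of samples to return, per the description

# Hypothetical pattern: a stage string that starts with match( or filter_labels(
_LABEL_STAGE = re.compile(r"^\s*(match|filter_labels)\(")


def patch_stages(stages, unmatched_entities, required_brain_key=None, text_sim_keys=()):
    """Replace label-based stages that mention an unmatched entity.

    All names here are illustrative assumptions, not the PR's actual API.
    """
    # Prefer the run named in required_runs, else the first available one
    brain_key = required_brain_key or (text_sim_keys[0] if text_sim_keys else None)
    if brain_key is None:
        # No text similarity run exists, so there is nothing to swap in
        return list(stages)

    patched = []
    for stage in stages:
        # Find the first unmatched entity mentioned in this stage, if any
        entity = next(
            (e for e in unmatched_entities if e.lower() in stage.lower()), None
        )
        if entity is not None and _LABEL_STAGE.match(stage):
            patched.append(
                f"sort_by_similarity('{entity}', brain_key='{brain_key}', k={DEFAULT_K})"
            )
        else:
            patched.append(stage)
    return patched
```

Stages that do not mention an unmatched entity are passed through untouched, so a valid match() on an existing field is left alone.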
Example scenario where this would fix the problem:
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
from gpt_view_generator import ask_gpt
dataset = foz.load_zoo_dataset("quickstart")
fob.compute_similarity(
    dataset,
    model='clip-vit-base32-torch',
    brain_key='clip'
)
view = ask_gpt(dataset, query = "images of elks")
Will result in ...
Okay, I'm going to load dataset.sort_by_similarity('elk', brain_key = 'clip', k = 25)