tryolabs / restricttotopic

Validator for GuardrailsHub to check if a text is related to a topic.
Apache License 2.0

Request: Separate thresholds for valid topics and invalid topics. #13

Open JosephCatrambone opened 2 months ago

JosephCatrambone commented 2 months ago

As of writing, there's only one threshold for the zero-shot topics, and it's used as the cutoff for whether a topic is considered 'found' or not. Having separate thresholds for the valid (positive) and invalid (negative) topics would allow us to perform more nuanced filtering, like: "It might not be about sports, but it's definitely not about travel."

Consider the case where our threshold is 0.5, the default. If we assume the false-positive rate per topic is 4%[1] and that topics misfire independently, then adding ten negative topics means our odds of accidentally flagging something are 1-(1-0.04)^10, or about 34%.
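
To make the arithmetic concrete, here is a minimal sketch of how the per-topic rate compounds. It assumes each negative topic misfires independently at the same rate, which is a simplification; `compound_false_positive_rate` is a hypothetical helper name, not part of the validator.

```python
def compound_false_positive_rate(per_topic_rate: float, num_topics: int) -> float:
    """Probability that at least one of num_topics independent checks misfires."""
    return 1.0 - (1.0 - per_topic_rate) ** num_topics


# Assumed 4% per-topic rate (the unsourced figure from this issue).
for n in (1, 5, 10, 20):
    print(n, round(compound_false_positive_rate(0.04, n), 3))
```

A higher threshold on the negative side would lower the per-topic rate, which is the knob this request is asking for.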

It would be nice to be able to tune that.

I imagine the change would be something akin to:

        valid_topics = model_input["valid_topics"]
        invalid_topics = model_input["invalid_topics"]
        candidate_topics = valid_topics + invalid_topics
        # Key thresholds by topic: the zero-shot pipeline returns labels sorted by
        # score, so zipping against a positional threshold list would misalign them.
        thresholds = {topic: self._zero_shot_threshold_valid for topic in valid_topics}
        thresholds.update({topic: self._zero_shot_threshold_invalid for topic in invalid_topics})

        result = self._classifier(text, candidate_topics)
        topics = result["labels"]
        scores = result["scores"]
        found_topics = []
        for topic, score in zip(topics, scores):
            if score > thresholds[topic]:
                found_topics.append(topic)
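
For anyone who wants to try the idea without loading a model, here is a self-contained sketch. Everything in it is hypothetical: `find_topics`, `fake_classifier`, and the canned scores are stand-ins, and the stub only mimics the shape of a Hugging Face zero-shot pipeline result (labels returned sorted by score, highest first). Because of that sorting, the thresholds are looked up by topic name rather than by position.

```python
from typing import Callable, Dict, List


def find_topics(
    text: str,
    valid_topics: List[str],
    invalid_topics: List[str],
    classifier: Callable[[str, List[str]], Dict],
    valid_threshold: float = 0.5,
    invalid_threshold: float = 0.5,
) -> List[str]:
    """Return topics whose score exceeds the threshold for their own group."""
    thresholds = {topic: valid_threshold for topic in valid_topics}
    thresholds.update({topic: invalid_threshold for topic in invalid_topics})
    result = classifier(text, list(thresholds))
    return [
        topic
        for topic, score in zip(result["labels"], result["scores"])
        if score > thresholds[topic]
    ]


def fake_classifier(text: str, candidate_labels: List[str]) -> Dict:
    """Stand-in classifier with made-up scores, sorted by score descending."""
    canned = {"sports": 0.45, "travel": 0.62, "politics": 0.10}
    ranked = sorted(candidate_labels, key=lambda label: canned.get(label, 0.0), reverse=True)
    return {"sequence": text, "labels": ranked, "scores": [canned.get(label, 0.0) for label in ranked]}


# A lenient threshold for valid topics and a strict one for invalid topics.
found = find_topics(
    "some text", ["sports"], ["travel", "politics"], fake_classifier,
    valid_threshold=0.4, invalid_threshold=0.7,
)
print(found)
```

With the single shared default of 0.5, the same input would flag "travel" and miss "sports", which is the kind of case separate thresholds are meant to tune.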

[1] Source: lost the original link so the new source is 'trust me, friendo'.

JosephCatrambone commented 2 months ago

I'm not sure if this merits a separate discussion, but was the default threshold originally chosen to minimize false negatives, so that the validator defers to GPT more readily, or was it picked as an overall optimum?