kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

What does score_partial_token of HotwordsScorer do? #76

Open Hubert-Bonisseur opened 2 years ago

Hubert-Bonisseur commented 2 years ago

I'm looking at modifying the base HotwordsScorer to boost short sentences instead of just individual words, but I don't understand what the score_partial_token function does. Its docstring seems to have been copy-pasted from the score function and does not help:

def score(self, text: str) -> float:
    """Get total hotword score for input text."""
    return self._weight * len(self._match_ptn.findall(text))

def score_partial_token(self, token: str) -> float:
    """Get total hotword score for input text."""
    if token in self:
        # find shortest unigram starting with the given partial token
        min_len = len(next(self._char_trie.iterkeys(token, shallow=True)))
        # scale score by length of unigram matched so far
        score = self._weight * len(token) / min_len
    else:
        score = 0.0
    return score
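
Reading the inline comments literally, score_partial_token seems to give partial credit to an unfinished word: if the token is a prefix of some hotword, the score is the hotword weight scaled by how much of the shortest matching hotword has been produced so far. Below is a minimal sketch of that reading. It assumes a plain list in place of the trie, and `partial_token_score`, the example hotwords, and the weight of 10.0 are all hypothetical names and values chosen for illustration, not part of pyctcdecode.

```python
from typing import Iterable


def partial_token_score(token: str, hotwords: Iterable[str], weight: float = 10.0) -> float:
    """Approximate the quoted score_partial_token with a list instead of a trie."""
    # Hotword unigrams that start with the partial token
    # (stand-in for the prefix lookup on the character trie).
    candidates = [w for w in hotwords if w.startswith(token)]
    if not token or not candidates:
        return 0.0
    # Length of the shortest unigram the token could still complete to.
    min_len = min(len(w) for w in candidates)
    # Fraction of that unigram matched so far, scaled by the hotword weight.
    return weight * len(token) / min_len


hotwords = ["kensho", "kenshotech", "decoder"]
print(partial_token_score("ken", hotwords))     # 10 * 3 / 6 = 5.0
print(partial_token_score("kensho", hotwords))  # 10 * 6 / 6 = 10.0
print(partial_token_score("xyz", hotwords))     # no hotword starts with "xyz": 0.0
```

Under this reading, score counts completed hotword matches in a full text, while score_partial_token rewards a beam whose last, still-incomplete word is heading toward a hotword. Boosting short sentences would presumably need the same kind of prefix credit across word boundaries.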