This is the biggest corner I cut in this implementation, and it's blocking some nice stuff:
intelligent pruning of suggestion beams when there are too many: auto-raising the Multiplier and using the new thresholds to cut off beams until we have the number we want.
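A minimal sketch of what that pruning loop could look like. The names (`prune_beams`, `base_threshold`, the multiplicative `step`) are all assumptions, not the actual implementation: the idea is just to keep raising the multiplier, recompute the cutoff, and drop beams below it until we're at or under the target count.

```python
def prune_beams(beams, max_beams, base_threshold=0.01, step=1.5):
    """Auto-raise a multiplier on the probability threshold until
    few enough beams survive. `beams` is a list of (text, prob)
    pairs; all parameter names here are hypothetical."""
    multiplier = 1.0
    survivors = list(beams)
    while len(survivors) > max_beams:
        multiplier *= step
        cutoff = base_threshold * multiplier
        survivors = [b for b in beams if b[1] >= cutoff]
    return survivors

beams = [("a", 0.5), ("b", 0.3), ("c", 0.05), ("d", 0.02)]
pruned = prune_beams(beams, max_beams=2)
```

Note this can overshoot and drop more beams than strictly necessary in one step; a real implementation would probably want a fallback (e.g. keep the top `max_beams` by probability if the cutoff empties the list).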
Currently we do a post-processing greedy string-prefix-finding step before visualization to recover the forked streams; this would no longer be necessary if we could access the underlying streams directly. Direct access would also enable using colors to indicate high/low-probability tokens in the visualization, but at the expense of bleeding through a lot more tokenization detail than the current visualization does, so it may be wise to keep both methods around.
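For illustration, the greedy prefix-recovery step could be sketched roughly like this (the `fork_tree` name and the group-by-first-character recursion are assumptions, not the actual code): peel off the longest common string prefix, then recurse on the diverging tails to rebuild the fork structure.

```python
import os
from itertools import groupby

def fork_tree(strings):
    """Recover a fork tree from flat beam strings via greedy
    longest-common-prefix matching (a sketch, not the real code).
    Returns (shared_prefix, children) where each child is a sub-tree."""
    prefix = os.path.commonprefix(strings)
    tails = [s[len(prefix):] for s in strings if s != prefix]
    if not tails:
        return (prefix, [])
    # group diverging tails by their first character, then recurse;
    # each group's own common prefix is recovered in the recursive call
    tails.sort()
    children = [fork_tree(list(g)) for _, g in groupby(tails, key=lambda t: t[0])]
    return (prefix, children)

tree = fork_tree(["the cat sat", "the cat ran"])
```

Working on strings like this loses the token boundaries, which is exactly why per-token coloring isn't possible without access to the underlying streams.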