Add partial decoding function for improved streaming functionality

lopez86 commented 1 year ago

Currently, for streaming, we allow the user to pass in a starting LM state. This might work OK in some cases, but there are some issues:

We start from an empty beam every time. This is fine if there is not much correlation from chunk to chunk, but for streaming we ideally want a mode where the logits can be split up without affecting the results. So we need an endpoint where the full beam information is included and the scoring caches are preserved between calls until the user clears them.

The code to use this will be a bit different, since the user has to do more management of state objects. An alternative approach would be to save the state within the decoder object, so that the user just has to call initialize and clear functions, or to use kwargs in the partial decode function. I'm not sure which is better here.
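To make the two options concrete, here is a minimal sketch using a toy CTC prefix beam search with no language model. It is not pyctcdecode's API: `new_state`, `partial_decode`, `best`, and `StreamingDecoder` are hypothetical names chosen for illustration, and the scoring caches are not modeled because there is no LM. It only shows that carrying the full beam between calls makes chunked decoding match single-pass decoding.

```python
from collections import defaultdict

BLANK = 0  # index of the CTC blank symbol (an assumption of this sketch)


def new_state():
    """Initial beam: only the empty prefix, ending in blank with probability 1."""
    # prefix -> (prob. of ending in blank, prob. of ending in non-blank)
    return {(): (1.0, 0.0)}


def partial_decode(probs, state, beam_width=4):
    """Advance the beam over one chunk of per-frame symbol probabilities.

    Returns the updated beam; the caller passes it back in with the next chunk.
    """
    beam = state
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beam.items():
            for sym, p in enumerate(frame):
                if sym == BLANK:
                    nxt[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == sym:
                    # Repeated symbol: it collapses into the same prefix unless
                    # a blank separated the two occurrences.
                    nxt[prefix][1] += p_nb * p
                    nxt[prefix + (sym,)][1] += p_b * p
                else:
                    nxt[prefix + (sym,)][1] += (p_b + p_nb) * p
        # Prune to the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beam = {prefix: tuple(ps) for prefix, ps in ranked[:beam_width]}
    return beam


def best(state):
    """Most probable prefix in the beam."""
    return max(state, key=lambda prefix: sum(state[prefix]))


class StreamingDecoder:
    """Alternative: keep the state inside the decoder object."""

    def __init__(self, beam_width=4):
        self.beam_width = beam_width
        self.clear()

    def clear(self):
        self._state = new_state()

    def partial_decode(self, probs):
        self._state = partial_decode(probs, self._state, self.beam_width)
        return best(self._state)


# Four frames over the symbols (blank, 1, 2).
frames = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]

# Option 1: the user manages the state object explicitly.
full = partial_decode(frames, new_state())
chunked = new_state()
for chunk in (frames[:2], frames[2:]):
    chunked = partial_decode(chunk, chunked)

# Option 2: the decoder object holds the state until clear() is called.
decoder = StreamingDecoder()
for chunk in (frames[:1], frames[1:]):
    streamed_best = decoder.partial_decode(chunk)
```

In this sketch the beam after the chunked calls is identical to the beam from a single call, because each frame is processed the same way regardless of where the chunk boundaries fall. The explicit-state version keeps the decoder stateless, while the object version trades that for a simpler call site.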

This problem was reported in an issue a while ago but hasn't been addressed yet.

lopez86 commented 1 year ago

Waiting on this PR to fix CI and unblock this

lopez86 commented 1 year ago

Looks like it's ready to go now

kensho-technologies / pyctcdecode

Add partial decoding function for improved streaming functionality #106