This PR changes how data is passed back from C to Rust, primarily to fix a heap overflow. The `out` vec is currently allocated with a capacity equal to the number of tokens to predict, but the data copied into `out` is the full detokenized string, whose length always exceeds the number of tokens. As a result, every `predict()` call with a specified token count overflows the `out` vec's heap allocation.
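
To make the size mismatch concrete, here is a minimal sketch in safe Rust. It does not reproduce this PR's code: `copy_detokenized` is a hypothetical stand-in for the C-side copy, the sample text and token count are made up, and the bounded copy that reports the required length is only one possible way to avoid the overflow, not necessarily the approach taken here.

```rust
/// Hypothetical stand-in for the C-side copy (not the real API).
/// Writes at most `out.len()` bytes of `text` into `out` and returns the
/// number of bytes the full text needs, so the caller can detect truncation.
fn copy_detokenized(text: &str, out: &mut [u8]) -> usize {
    let bytes = text.as_bytes();
    let n = bytes.len().min(out.len());
    out[..n].copy_from_slice(&bytes[..n]);
    bytes.len()
}

fn main() {
    // Assumed example values: 4 requested tokens that detokenize to 19 bytes.
    let tokens = 4;
    let text = "The quick brown fox";

    // A buffer sized by token count is too small for the detokenized string.
    // An unchecked copy on the C side would write past the end of it.
    let mut small = vec![0u8; tokens];
    let needed = copy_detokenized(text, &mut small);
    assert!(needed > small.len());

    // A buffer sized by the reported byte length holds the whole string.
    let mut big = vec![0u8; needed];
    copy_detokenized(text, &mut big);
    assert_eq!(std::str::from_utf8(&big).unwrap(), text);
}
```

The point of the sketch is that the buffer size has to be expressed in bytes of output text, not in tokens, and that the copy must be told how much room it has.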