yethee / tiktoken-php

This is a port of the tiktoken
MIT License
92 stars 19 forks

No rank for bytes vector: [33] #8

Open luomo-pro opened 7 months ago

luomo-pro commented 7 months ago

Hi, I just had feedback from my users that they are experiencing this error:

    [ error ] [0]No rank for bytes vector: [33][src/Vocab/Vocab.php:120]

The user who found this problem did not save the original text, so I don't know exactly which text causes it. I'm very sorry. I looked at the source code, and this appears to be the code that reports the error:

    return $this->tokenToRankMap[EncodeUtil::fromBytes($bytes)] ?? throw new OutOfBoundsException(sprintf(
        'No rank for bytes vector: [%s]',
        implode(', ', $bytes),
    ));

I'm wondering if there is any way to return a default value or not let it report an error when it encounters an error like this? Thank you.

yethee commented 7 months ago

> I'm wondering if there is any way to return a default value or not let it report an error when it encounters an error like this?

In user code, you can call Vocab::tryGetRank() instead of Vocab::getRank() to avoid throwing an exception.
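As a rough illustration of the difference between the two lookup styles, here is a minimal, self-contained sketch. `DemoVocab` is a hypothetical stand-in written for this example, not the library's `Vocab` class; the real method signatures may differ between versions (for instance, they may accept a byte array rather than a string), and the `-1` fallback is an arbitrary choice.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's Vocab class. It only mirrors the
// two lookup styles discussed above and is NOT the real implementation.
final class DemoVocab
{
    /** @param array<string, int> $tokenToRankMap */
    public function __construct(private array $tokenToRankMap)
    {
    }

    /** Returns the rank, or null when the byte sequence is unknown. */
    public function tryGetRank(string $binary): ?int
    {
        return $this->tokenToRankMap[$binary] ?? null;
    }

    /** Returns the rank, or throws when the byte sequence is unknown. */
    public function getRank(string $binary): int
    {
        return $this->tokenToRankMap[$binary]
            ?? throw new OutOfBoundsException(sprintf(
                'No rank for bytes vector: [%s]',
                implode(', ', unpack('C*', $binary)),
            ));
    }
}

$vocab = new DemoVocab(['a' => 0, 'b' => 1]);

// Byte 33 is "!" in ASCII; it is deliberately missing from the demo map.
// tryGetRank() returns null here, so the caller can pick its own default.
$rank = $vocab->tryGetRank('!') ?? -1; // -1 is an arbitrary fallback value
```

With this shape, the caller decides what a missing rank means, while `getRank()` keeps failing loudly for code paths that cannot tolerate a gap.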

In the context of Encoder::encode(), we cannot ignore such errors, since doing so can lead to unpredictable results (it would be impossible to decode the tokens back to the original text).
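To make the round-trip concern concrete, the following toy sketch shows what happens when an encoder silently skips a piece it has no rank for. Everything here is an assumption made for illustration: the two-entry vocabulary, the pre-split pieces, and the `lossyEncode()` and `decodeRanks()` helpers are hypothetical and do not reflect how the library's encoder actually splits or merges text.

```php
<?php

declare(strict_types=1);

// Hypothetical toy vocabulary, used only to illustrate the point.
$tokenToRank = ['Hi' => 0, ' there' => 1];
$rankToToken = array_flip($tokenToRank);

/**
 * Encodes pre-split pieces, silently skipping any piece without a rank.
 *
 * @param list<string>       $pieces
 * @param array<string, int> $tokenToRank
 * @return list<int>
 */
function lossyEncode(array $pieces, array $tokenToRank): array
{
    $ranks = [];
    foreach ($pieces as $piece) {
        if (isset($tokenToRank[$piece])) {
            $ranks[] = $tokenToRank[$piece];
        }
        // Unknown pieces are dropped here: this is the behavior being warned against.
    }

    return $ranks;
}

/**
 * Maps ranks back to their byte sequences and concatenates them.
 *
 * @param list<int>          $ranks
 * @param array<int, string> $rankToToken
 */
function decodeRanks(array $ranks, array $rankToToken): string
{
    return implode('', array_map(
        static fn (int $rank): string => $rankToToken[$rank],
        $ranks,
    ));
}

$pieces = ['Hi', ' there', '!'];
$original = implode('', $pieces);
$roundTrip = decodeRanks(lossyEncode($pieces, $tokenToRank), $rankToToken);

var_dump($original === $roundTrip); // bool(false): the "!" was lost during encoding
```

The decoded string is shorter than the input and nothing signals the loss, which is why failing with an exception is the safer default inside an encoder.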

The reason for the error may be:

yethee commented 7 months ago

If the error occurs again, please submit a stack trace, the encoding used, and the input text being encoded.