pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License

Making TokenizerInterface more usable for the user's code. #170

Open Artyom17 opened 2 months ago

Artyom17 commented 2 months ago

This PR adds id_to_piece, piece_to_id, and is_special_token to TokenizerInterface and to the corresponding implementations, so that user code can encode and decode single tokens through the interface. The gpt-fast code itself does not call these new functions.
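To illustrate how user code might call the three methods, here is a minimal sketch. The method names come from this PR; the signatures, the reduced `TokenizerInterface` shown here, and the `ToyTokenizer` class with its four-piece vocabulary are assumptions made for illustration and are not the code in the diff.

```python
from abc import ABC, abstractmethod
from typing import List


class TokenizerInterface(ABC):
    """Reduced, hypothetical interface: only the three methods named in this PR.

    The argument and return types are assumed, not taken from the diff.
    """

    @abstractmethod
    def id_to_piece(self, token_id: int) -> str:
        """Return the text piece for a single token id."""

    @abstractmethod
    def piece_to_id(self, piece: str) -> int:
        """Return the token id for a single text piece."""

    @abstractmethod
    def is_special_token(self, token_id: int) -> bool:
        """Return True if the id is a special token such as BOS or EOS."""


class ToyTokenizer(TokenizerInterface):
    """Hypothetical in-memory tokenizer used only to exercise the interface."""

    def __init__(self, pieces: List[str], special: List[str]) -> None:
        self._pieces = list(pieces)
        # Reverse lookup table: piece -> id.
        self._ids = {piece: i for i, piece in enumerate(self._pieces)}
        self._special_ids = {self._ids[piece] for piece in special}

    def id_to_piece(self, token_id: int) -> str:
        return self._pieces[token_id]

    def piece_to_id(self, piece: str) -> int:
        return self._ids[piece]

    def is_special_token(self, token_id: int) -> bool:
        return token_id in self._special_ids


# User code: drop special tokens from generated ids, one token at a time.
tok = ToyTokenizer(["<s>", "</s>", "hello", "world"], special=["<s>", "</s>"])
generated_ids = [0, 2, 3, 1]
visible = [tok.id_to_piece(i) for i in generated_ids if not tok.is_special_token(i)]
print(visible)  # ['hello', 'world']
```

A real implementation would presumably delegate these calls to the underlying tokenizer library; the toy vocabulary above only stands in for that.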