knuddelsgmbh / jtokkit

JTokkit is a Java tokenizer library designed for use with OpenAI models.
https://jtokkit.knuddels.de/
MIT License
553 stars, 42 forks

Return position to EncodingResult (#80) #97

Closed: imsosleepy closed this 2 months ago

imsosleepy commented 4 months ago

Implementation for issue #80

I added a lastTokenIndex field to the EncodingResult class so that callers can tell where in the input text encoding stopped when the result was truncated by the max-token limit.
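To illustrate the idea, here is a minimal, self-contained sketch. It does not use JTokkit's real classes or tokenization: `TruncationSketch`, `SketchResult`, and the whitespace-based "tokenizer" are hypothetical stand-ins, and treating `lastTokenIndex` as an exclusive character offset into the input is an assumption made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

class TruncationSketch {

    /** Hypothetical result type; NOT JTokkit's actual EncodingResult. */
    static final class SketchResult {
        final List<String> tokens;
        final boolean truncated;
        /** Assumed meaning: exclusive character offset of the last encoded token. */
        final int lastTokenIndex;

        SketchResult(List<String> tokens, boolean truncated, int lastTokenIndex) {
            this.tokens = tokens;
            this.truncated = truncated;
            this.lastTokenIndex = lastTokenIndex;
        }
    }

    /** Toy encoder: every whitespace-separated word counts as one token. */
    static SketchResult encode(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        int end = 0; // exclusive offset just past the last token that was kept

        while (i < n) {
            // Skip whitespace between words.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            // Another word follows but the token budget is used up: truncate.
            if (tokens.size() == maxTokens) {
                return new SketchResult(tokens, true, end);
            }
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(text.substring(start, i));
            end = i;
        }
        return new SketchResult(tokens, false, end);
    }

    public static void main(String[] args) {
        String text = "one two three four";
        SketchResult result = encode(text, 2);
        // The position lets the caller recover exactly which part was encoded.
        System.out.println(result.truncated);                          // true
        System.out.println(text.substring(0, result.lastTokenIndex));  // one two
    }
}
```

With a position like this, a caller can continue encoding from the rest of the text (here `text.substring(result.lastTokenIndex)`) instead of guessing where the truncation happened.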

I only wrote simple validation code to test this change. For a complete test, I think we need to add one more column to the test CSV file and fill in the expected position values.

I also still need to review whether lastTokenIndex is an appropriate variable name and whether it meets the requirements of the issue.

Plexcalibur commented 2 months ago

Thanks for the input. Some minor changes were still needed, so I made the change myself based on your code. See https://github.com/knuddelsgmbh/jtokkit/commit/e0bab9af214438ae130d492b61d7c925b131993a