cppxaxa closed this issue 1 year ago
Thanks for the report! This is an implementation detail of tiktoken that is subject to change (as opposed to the public encoding_for_model API); can you link me to why haystack needs access to these?
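For reference, the supported lookup goes through the public API rather than the module-level constants; a minimal sketch (the model name here is just an example):

```python
import tiktoken

# Supported path: resolve a model name directly to an Encoding object.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(enc.name)                        # underlying encoding name, e.g. "cl100k_base"
print(len(enc.encode("hello world")))  # token count for a sample string
```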
Thanks for the quick response. Unfortunately, I don't have any links explaining why haystack needs it in their library.
Another update: the haystack tutorial has started working again. It seems haystack now works around this by constraining the tiktoken version. You can close the issue for now.
Colab logs:
Collecting tiktoken<0.5.0,>=0.3.2 (from farm-haystack[colab,inference])
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
The place where tiktoken is used in haystack: https://github.com/deepset-ai/haystack/blob/94c5d6d216cfad5a06484e17614be72a7c2d5af1/haystack/utils/openai_utils.py#L10
Okay, I see. I should add an encoding_name_for_model API, in case they're trying to do the lookup without instantiating encodings. I'll add this API later today and revert the constant names so cross-compatibility is easier.
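A sketch of how such a name-only lookup could be used, assuming it mirrors encoding_for_model (the model name is again just an example):

```python
import tiktoken

# Resolve only the encoding name, without constructing the Encoding itself.
name = tiktoken.encoding_name_for_model("gpt-4")  # e.g. "cl100k_base"

# The Encoding can still be instantiated later if needed.
enc = tiktoken.get_encoding(name)
```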
Fixed in 0.5.1; opened a PR against haystack at https://github.com/deepset-ai/haystack/pull/5785. Thanks for reporting!
Thank you @hauntsaninja for taking care of the issue so quickly! :raised_hands:
Problem
As of today, the haystack Python library imports the constants MODEL_TO_ENCODING and MODEL_PREFIX_TO_ENCODING from the tiktoken library, and this breaks with tiktoken 0.5.0.
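A minimal sketch of the import that breaks, assuming the constants live in tiktoken.model as in the haystack utils module linked above:

```python
# Works on tiktoken 0.4.0; fails with ImportError on 0.5.0 after the constants were renamed.
from tiktoken.model import MODEL_PREFIX_TO_ENCODING, MODEL_TO_ENCODING
```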
Workaround
With haystack, we need to pin tiktoken to 0.4.0 instead of 0.5.0.
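In practice that means installing with a constraint such as `pip install "tiktoken>=0.3.2,<0.5.0"`, matching the resolver output in the Colab log above.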
Fix requested
Can we revert the variable renames from the following commit? https://github.com/openai/tiktoken/commit/a7937793e7460ae8e815f506600b7248fad2a803#diff-0d973848bd229418209db2c46c86167000845592ca6b98fad215c21c317bc494
Steps to repro the issue
Run the haystack tutorial in Google Colab with a T4 GPU runtime: https://haystack.deepset.ai/tutorials/01_basic_qa_pipeline