Open choprahetarth opened 8 months ago
Hi @choprahetarth, thank you for your interest in and support of LLMLingua.
This is a known issue, as seen in #4. We'll address it soon as detailed in #51.
Is there anything I can contribute? I'm quite interested in this project. My stack is Python/ML/PyTorch, but I'm not sure which issue to pick up first.
While the concept is promising, especially for high-token-count languages like Japanese, I've encountered a significant encoding issue.
Steps to Reproduce:
1. Input a Japanese text prompt into LLMLingua for compression.
2. Observe the output, which should be a compressed version of the original prompt.

Expected Behavior: The compressed output should retain the original Japanese characters without any encoding errors.
Actual Behavior: The output contains a mix of unrecognized characters along with some correct Japanese script. This mixed encoding makes the compressed prompt unusable when passed to GPT-4.
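For anyone investigating, here is a minimal sketch of one plausible mechanism for the garbling (an assumption on my part, not a confirmed root cause): Japanese characters are multi-byte in UTF-8, so if token-level compression drops tokens whose boundaries fall mid-character, decoding the remaining bytes produces replacement characters mixed with intact script — exactly the symptom described above. The `text` value and byte offsets below are illustrative only.

```python
# Sketch: dropping byte-level tokens can split a multi-byte UTF-8 character.
# Most Japanese characters are 3 bytes in UTF-8; removing a span that does
# not align to character boundaries corrupts the surrounding characters.
text = "こんにちは"               # 5 characters, 15 UTF-8 bytes
data = text.encode("utf-8")

# Simulate a compressor removing bytes 4-5, which splits the 2nd character.
corrupted = data[:4] + data[6:]

decoded = corrupted.decode("utf-8", errors="replace")
print(decoded)  # mix of replacement characters and correct Japanese script
```

If this is what's happening, a fix would need the compressor to drop only spans that align to full character (or grapheme) boundaries before re-decoding.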