microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

Output for High Token Languages like Japanese #63

Open choprahetarth opened 8 months ago

choprahetarth commented 8 months ago

While the concept is promising, especially for high-token languages like Japanese, I've encountered a significant encoding issue.

Steps to Reproduce:
1. Input a Japanese text prompt into LLMLingua for compression.
2. Observe the output, which should be a compressed version of the original prompt.

Expected Behavior: The compressed output should retain the original Japanese characters without any encoding errors.

Actual Behavior: The output contains a mix of unrecognized characters along with some correct Japanese script. This mixed encoding makes the compressed prompt unusable when passed to GPT-4.
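A minimal repro sketch, assuming the standard `PromptCompressor` usage from the README (the Japanese sample prompt and the `target_token` value are illustrative):

```python
from llmlingua import PromptCompressor

# Default setup as shown in the README; a smaller model can be
# substituted via the model_name argument if GPU memory is tight.
llm_lingua = PromptCompressor()

# Illustrative Japanese prompt: "Japanese consumes more tokens than
# English, so prompt compression is especially important for it."
prompt = "日本語は英語よりも多くのトークンを消費するため、プロンプト圧縮が特に重要な言語です。"

result = llm_lingua.compress_prompt(prompt, target_token=20)
print(result["compressed_prompt"])  # mixes valid Japanese with garbled characters
```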

iofu728 commented 8 months ago

Hi @choprahetarth, thank you for your interest in and support of LLMLingua.

This is a known issue, as seen in #4. We'll address it soon, as detailed in #51.
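For context, this class of failure can be reproduced with any byte-level BPE tokenizer: a single Japanese character often spans several subword tokens, so dropping one token mid-character leaves an incomplete UTF-8 sequence. A sketch (illustrative only; GPT-2's tokenizer stands in here, and the exact mechanism inside LLMLingua may differ):

```python
from transformers import AutoTokenizer

# GPT-2 uses byte-level BPE, so multi-byte UTF-8 characters are
# typically split across several token ids.
tok = AutoTokenizer.from_pretrained("gpt2")

ids = tok.encode("日本語")
print(len(ids))              # typically more token ids than the 3 characters
print(tok.decode(ids))       # round-trips cleanly: 日本語
print(tok.decode(ids[:-1]))  # truncated byte sequence often decodes with '�'
```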

choprahetarth commented 8 months ago

Is there anything I can contribute to? I'm quite interested in this project. My stack is Python/ML/PyTorch, but I'm not sure which issue to pick up first.