Closed thehapyone closed 5 months ago
Hi @thehapyone, thanks for your support.
This is a great question. Currently, our colleague is working on implementing a feature to preserve user-specified tokens, which will soon be merged into the main branch.
For now, you can work around this by replacing your metadata separators with "\n\n" and passing the keep_split
parameter. This will preserve all the "\n\n" separators after compression, so you can manually restore the original metadata once you have the compressed prompt.
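A minimal sketch of the restore step in that workaround (the llmlingua call itself is elided; this assumes compress_prompt was run with keep_split=True, that the "\n\n" separators therefore survive, and that no context was dropped entirely, so chunks still line up with the original metadata list — the tag format is illustrative):

```python
# Sketch: re-attach per-context metadata after LLMLingua compression.
# Assumes keep_split=True preserved the "\n\n" boundaries and that the
# number/order of chunks matches the original contexts.

def restore_metadata(compressed_prompt: str, metadata: list[str]) -> str:
    """Wrap each compressed chunk in a doc tag carrying its original metadata."""
    chunks = compressed_prompt.split("\n\n")
    restored = [
        f'<doc src="{meta}">{chunk}</doc>'
        for meta, chunk in zip(metadata, chunks)
    ]
    return "\n\n".join(restored)
```

If some contexts can be filtered out entirely, the simple positional zip above breaks down and you need to match chunks back to contexts by content instead.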
Hello, I am facing this issue too. I have several contexts, but some of them are filtered out by llmlingua. How can I determine which context IDs survive, so that I can manually restore the original metadata?
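One way to answer that (not part of the llmlingua API, just a sketch): since compression only deletes tokens, each surviving "\n\n"-separated chunk should share most of its tokens with exactly one original context, so simple token-overlap scoring can recover the context id:

```python
# Sketch: map each compressed chunk back to the original context it most
# likely came from, via token-overlap scoring. Contexts that were filtered
# out entirely simply never appear in the returned id list.

def match_chunks(compressed_prompt: str, contexts: list[str]) -> list[int]:
    """For each "\n\n"-separated chunk, return the index of the
    best-matching original context."""
    ids = []
    for chunk in compressed_prompt.split("\n\n"):
        chunk_tokens = set(chunk.split())
        scores = [len(chunk_tokens & set(ctx.split())) for ctx in contexts]
        ids.append(max(range(len(contexts)), key=scores.__getitem__))
    return ids
```

The ids list tells you which contexts survived, and in what order, so you can look up the matching metadata for each chunk.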
Hi,
I was wondering if there is a good way to retain the metadata of the original contexts in the final compressed output? This could be useful, for example, for citing the source of the data or for further references later on.
For example, assuming the original context looks like this
The compressed context loses those doc HTML tags, which makes it very hard to map the compressed text back to the metadata of the individual contexts.
I have attempted to use some form of hash as the metadata in the context. For example:
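Roughly along these lines (an illustrative sketch, not the exact code I used; the tag format and names are placeholders):

```python
# Illustrative: prefix each context with a short, stable hash of its source
# identifier, hoping the hash survives compression as a citation handle.
import hashlib

def tag_context(context: str, source: str) -> str:
    """Prefix a context with the first 8 hex chars of a SHA-256 of its source."""
    h = hashlib.sha256(source.encode()).hexdigest()[:8]
    return f"[{h}] {context}"
```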
and the compressed prompt retains some of the added hashes, but the behaviour is inconsistent and varies a lot with the underlying model. Here, for example, I have used the "openai-community/gpt2-xl" model.
Any thoughts on this?
I'm using the "longllmlingua" as the ranking method