liyucheng09 / Selective_Context

Compress your input to ChatGPT or other LLMs so they can process 2x more content while saving 40% of memory and GPU time.

Which model is used in the paper to compute self-information? #17

Closed · XiaoFengbing closed this 10 months ago

XiaoFengbing commented 10 months ago

Hi, thanks for your very interesting work on context compression!

I want to know which model is used in the paper to compute self-information. For the LLaMA family, does the paper use meta-llama/Llama-2-7b? (https://huggingface.co/meta-llama/Llama-2-7b)

Another simple question: in this paper, do we only compress the context/demonstration/document (and the instruction, if it exists), while leaving the question/query uncompressed? In other words, do we feed everything in the prompt to the compressor except its question/query?

Thank YOU!

liyucheng09 commented 10 months ago
1. For the LLaMA experiments in the paper, I think huggyllama/llama-7b is the right one (see the self-information sketch below).
2. You're correct. In the paper, I only send the context/passage for compression, but you're free to try including the query in your experiments (see the second sketch below).
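
For reference on point 1, here is a minimal sketch of computing per-token self-information with huggyllama/llama-7b via Hugging Face transformers. Only the model name comes from the answer above; the sample text and the dtype/device choices are illustrative assumptions, and the paper's exact preprocessing may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name taken from the answer above; dtype/device handling is illustrative.
model_name = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"  # device_map needs `accelerate`
)
model.eval()

text = "Selective Context compresses prompts by dropping low-information spans."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Self-information of token t is -log p(t | preceding tokens), so shift:
# the logits at position i predict the token at position i + 1.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
target_ids = inputs["input_ids"][0, 1:]
self_info = -log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)

for tok_id, si in zip(target_ids.tolist(), self_info.tolist()):
    print(f"{tokenizer.decode([tok_id])!r}: {si:.2f} nats")
```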
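
And for point 2, a hedged sketch of compressing only the context while passing the query through untouched. `SelectiveContext`, its `model_type`/`lang`/`reduce_ratio` arguments, and the two-value return follow the repo README; the example context, query, and prompt assembly are assumptions.

```python
from selective_context import SelectiveContext

# SelectiveContext and its arguments follow the repo README;
# exact return semantics may vary across versions.
sc = SelectiveContext(model_type="gpt2", lang="en")

context = "A long retrieved passage or demonstration goes here ..."
query = "What year was the company founded?"

# Compress the context only. Per the README, the call returns the
# compressed text plus the content that was filtered out.
compressed_context, reduced_content = sc(context, reduce_ratio=0.5)

# Reassemble the prompt with the original, uncompressed query.
prompt = f"{compressed_context}\n\nQuestion: {query}"
print(prompt)
```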