jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models

An OOM error occurred while computing the perplexity of 128k-token ProofPile documents with the maximum token count set to 128k. #52

Open · HIT-cwh opened this issue 6 months ago

HIT-cwh commented 6 months ago

Thank you so much for your open source work.

I evaluated the 128k context capability of the LLaMA-2 7B model on an NVIDIA A100 (80GB) GPU, but I encountered an OOM error. Here is my script:

PG19="--tokenized emozilla/pg19-test-tokenized"
PROOFPILE_LONG_SMALL="--tokenized emozilla/proofpile-test-tokenized --dataset-min-tokens 131072 --samples 10 --truncate"
CUSTOM="--custom-model-together"

python eval/perplexity.py \
    ${PROOFPILE_LONG_SMALL} ${CUSTOM} \
    --output-file data/proofpile-long-small.csv \
    --min-tokens 131072 --max-tokens 131072 --tokens-step 2048 --aggressive-memory \
    -m llama2_7b_yarn_64k

[screenshot of the OOM traceback]
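Not a fix from this repo's maintainers, but at 128k tokens two generic costs usually dominate: the quadratic attention matrix and full-precision weights/activations. A minimal loading sketch with the Hugging Face transformers API (the checkpoint id and kwargs are illustrative; `flash_attention_2` requires the flash-attn package, and some YaRN checkpoints may additionally need `trust_remote_code=True`):

```python
# Hedged sketch, not this repo's eval script: generic memory levers for
# very long sequences are half precision and a memory-efficient attention kernel.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-7b-64k"  # illustrative checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # halves memory vs. fp32
    attn_implementation="flash_attention_2",  # avoids materializing the O(n^2) attention matrix
    device_map="auto",                        # lets accelerate place (or offload) layers
)
model.eval()
```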

Zhangchaoran000 commented 2 months ago

I encountered the same issue: evaluating the perplexity of a 32k-token text on a 40GB A100 GPU also produced an out-of-memory (OOM) error. Are there any known solutions or memory optimizations?
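For the smaller 40GB card, one common generic workaround is the sliding-window perplexity recipe from the transformers documentation: each forward pass sees at most `window` tokens, and only tokens not scored in a previous step contribute to the loss, which bounds peak memory. The trade-off is that attention is capped at the window size, so it approximates rather than fully exercises the extended context. A sketch under those assumptions (model id, `window`, and `stride` are illustrative):

```python
# Hedged sketch of sliding-window perplexity: peak memory is bounded by
# `window`, at the cost of capping how far the model can attend back.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_id = "NousResearch/Yarn-Llama-2-7b-64k"  # illustrative checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)
model.eval()

def sliding_window_ppl(text: str, window: int = 8192, stride: int = 4096) -> float:
    """Perplexity where each forward pass sees at most `window` tokens and
    only the `trg_len` tokens new to this step are scored."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    seq_len = ids.size(1)
    nlls, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + window, seq_len)
        trg_len = end - prev_end           # tokens newly scored in this step
        input_ids = ids[:, begin:end]
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100    # ignore the overlapping context tokens
        with torch.no_grad():
            loss = model(input_ids, labels=target_ids).loss
        nlls.append(loss * trg_len)        # re-weight the per-step mean loss
        prev_end = end
        if end == seq_len:
            break
    return torch.exp(torch.stack(nlls).sum() / prev_end).item()
```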