HIT-cwh opened 6 months ago
Thank you so much for your open source work.
I evaluated the 128K context capability of the LLaMA-2 7B model on an NVIDIA A100 (80 GB) GPU, but I ran into an OOM error. Here is my script:
```bash
PG19="--tokenized emozilla/pg19-test-tokenized"
PROOFPILE_LONG_SMALL="--tokenized emozilla/proofpile-test-tokenized --dataset-min-tokens 131072 --samples 10 --truncate"
CUSTOM="--custom-model-together"

python eval/perplexity.py \
  ${PROOFPILE_LONG_SMALL} ${CUSTOM} \
  --output-file data/proofpile-long-small.csv \
  --min-tokens 131072 --max-tokens 131072 --tokens-step 2048 --aggressive-memory \
  -m llama2_7b_yarn_64k
```
I encountered the same issue! When evaluating the perplexity of a 32K-token text on a 40 GB A100 GPU, I ran into an out-of-memory (OOM) error. Do you know of any solutions or optimizations?
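In case it helps: the usual levers for lowering peak memory during long-context perplexity evaluation are half-precision weights, a memory-efficient attention implementation, and running the forward pass under torch.no_grad. Below is a minimal sketch of those settings, assuming a recent transformers release with flash-attn installed; the checkpoint name is only illustrative and is not the `llama2_7b_yarn_64k` alias from the script above.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint name; substitute the local or HF path of the model
# you are actually evaluating. Some YaRN checkpoints may additionally need
# trust_remote_code=True.
model_name = "NousResearch/Yarn-Llama-2-7b-64k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,                # fp16 roughly halves weight/activation memory vs fp32
    attn_implementation="flash_attention_2",  # avoids materializing the full attention matrix
    device_map="auto",                        # shard layers across GPUs if more than one is visible
)
model.eval()

@torch.no_grad()  # no gradients or activations kept for backprop during evaluation
def perplexity(input_ids: torch.Tensor) -> float:
    """Perplexity of a single tokenized sequence (shape: [seq_len])."""
    input_ids = input_ids.unsqueeze(0).to(model.device)
    out = model(input_ids, labels=input_ids)  # HF shifts labels internally
    return math.exp(out.loss.item())
```

Even with these settings, the logits tensor for a 128K-token forward pass is several gigabytes on its own, so the `--aggressive-memory` flag already used in the script above, or a shorter evaluation length, may still be necessary.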