nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Performance Differences in Qwen2-72B-Instruct-131k #56

Closed lwang2070 closed 2 months ago

lwang2070 commented 2 months ago

Dear author,

Thanks a bunch for the invaluable benchmark you created! I used the benchmark to evaluate the Qwen2-72B-Instruct-131k model, but noticed a significant difference between the results I obtained and the values listed in README.md. More precisely, I obtained a score of zero for the Qwen2 series models on all tasks exceeding 32k (their training length), with YaRN scaling enabled as suggested on their model card page. Here are all my configs:

config.json:

{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 8192,
  "initializer_range": 0.02,
  "intermediate_size": 29568,
  "max_position_embeddings": 32768,
  "max_window_layers": 80,
  "model_type": "qwen2",
  "num_attention_heads": 64,
  "num_hidden_layers": 80,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}

config_models.sh:

        qwen2-72b-instruct-131k)
            MODEL_PATH="${MODEL_DIR}/Qwen/Qwen2-72B-Instruct-131k"
            MODEL_TEMPLATE_TYPE="qwen2"
            MODEL_FRAMEWORK="vllm"
            ;;

template.py:

 "qwen2": "<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\n{task_template}<|im_end|>\n<|im_start|>assistant\n"

Any suggestions on what I might have done wrong? Thanks in advance!

hsiehjackson commented 2 months ago

If you are using vLLM, you can increase max_position_embeddings to 131072 in config.json. If I remember correctly, vLLM by default rejects a request when the input sequence length exceeds max_position_embeddings.
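For reference, a rough sketch of the relevant part of config.json after the change (only max_position_embeddings changes; the yarn rope_scaling block and all other fields stay as in the config you posted, and factor 4.0 x 32768 matches the 131072 target):

{
  "max_position_embeddings": 131072,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}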

lwang2070 commented 2 months ago

Ah, I see! I'll close the issue after confirming the fix :)