microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

add a patch for the integration of qwen and qwen_parallel into minillm #144

Closed SleepEarlyLiveLong closed 9 months ago

SleepEarlyLiveLong commented 9 months ago

Add a patch for yesterday's PR (https://github.com/microsoft/LMOps/pull/143)

  1. minillm/tools/process_data_dolly.py: add a prompt template for qwen; use dtype=uint32 instead of uint16 when initializing binary_builder, since qwen's vocab_size (151936) exceeds the range of uint16 (<= 65535); see the dtype sketch after this list.
  2. minillm/tools/process_data_pretrain.py: use dtype=uint32 instead of uint16 when initializing train_binary_builder and valid_binary_builder, for the same reason as above.
  3. minillm/minillm/pipelines.py: exclude qwen from the '65535 condition' (line 49 and line 143), because token_id=65535 is a real, meaningful token id in qwen's tokenizer; see the second sketch below.
  4. minillm/transformers/src/transformers/models/qwen/modeling_qwen.py and minillm/transformers/src/transformers/models/qwen_parallel/modeling_qwen_parallel.py: following the advice in https://github.com/microsoft/LMOps/issues/130#issuecomment-1866089740, I upcast the attention computation to fp32 for more accurate calculations; see the third sketch below.
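
For reference, here is a minimal sketch of the dtype issue behind items 1 and 2. The `pick_dtype` helper is illustrative only, not part of minillm; the real binary_builder simply receives the dtype directly:

```python
import numpy as np

# Token ids are packed into a flat binary file, so the element dtype must
# cover every id in the vocabulary. uint16 tops out at 65535, while qwen's
# vocab_size is 151936, so uint16 would silently wrap the larger token ids.
def pick_dtype(vocab_size: int):
    # Hypothetical helper for illustration.
    return np.uint16 if vocab_size <= np.iinfo(np.uint16).max else np.uint32

print(pick_dtype(32000))   # smaller vocab: uint16 is enough
print(pick_dtype(151936))  # qwen: needs uint32
```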
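A sketch of the '65535 condition' from item 3, assuming the packed sample stores the prompt and response separated by a 65535 sentinel; the `split_prompt` helper and `is_qwen` flag are assumptions for illustration, not the exact pipelines.py code:

```python
import numpy as np

SEP = 65535  # separator sentinel in the uint16-packed data

def split_prompt(data: np.ndarray, is_qwen: bool):
    # For qwen, 65535 is a legitimate token id (vocab_size = 151936) and
    # the data is uint32-packed, so the sentinel check must be skipped.
    if not is_qwen and SEP in data:
        sep_idx = int(np.where(data == SEP)[0][0])
        return data[:sep_idx], data[sep_idx + 1:]  # prompt, response
    return data, None
```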
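Finally, a sketch of the fp32 upcast from item 4, following the usual pattern in transformers modeling files (simplified; the real qwen attention also handles masks, dropout, rotary embeddings, etc.):

```python
import torch

def core_attention_fp32(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim), typically fp16/bf16.
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-1, -2)) * scale
    # Upcast the softmax to fp32: summing many half-precision exponentials
    # loses precision, which is why fp32 was recommended in the linked comment.
    probs = torch.softmax(scores.float(), dim=-1).to(v.dtype)
    return torch.matmul(probs, v)
```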

BTW, even with the fp32 upcast, the issue mentioned in #130 remains unsolved, but I'm actively working on it and hope to have a solution in the near future. Once it is solved, I will submit a new PR.