lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

FastChat fails with a LLaMA 2 based model #2075

Closed sergsb closed 1 year ago

sergsb commented 1 year ago

Dear all,

I am trying to run the stabilityai/FreeWilly2 (LLaMA 2 70B) model. Whenever I send a request to the model, I get:

 /home/sergeys/miniconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py │
  │ :292 in forward                                                                                  │
  │                                                                                                  │
  │   289 │   │   hidden_states = self.input_layernorm(hidden_states)                                │
  │   290 │   │                                                                                      │
  │   291 │   │   # Self Attention                                                                   │
  │ ❱ 292 │   │   hidden_states, self_attn_weights, present_key_value = self.self_attn(              │
  │   293 │   │   │   hidden_states=hidden_states,                                                   │
  │   294 │   │   │   attention_mask=attention_mask,                                                 │
  │   295 │   │   │   position_ids=position_ids,                                                     │
  │                                                                                                  │
  │ /home/sergeys/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in          │
  │ _call_impl                                                                                       │
  │                                                                                                  │
  │   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
  │   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
  │   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
  │ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
  │   1502 │   │   # Do not call functions when jit is used                                          │
  │   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
  │   1504 │   │   backward_pre_hooks = []                                                           │
  │                                                                                                  │
  │ /home/sergeys/miniconda3/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward      │
  │                                                                                                  │
  │   162 │   │   │   with torch.no_grad():                                                          │
  │   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
  │   164 │   │   else:                                                                              │
  │ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
  │   166 │   │   return module._hf_hook.post_forward(module, output)                                │
  │   167 │                                                                                          │
  │   168 │   module.forward = new_forward                                                           │
  │                                                                                                  │
  │ /home/sergeys/miniconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py │
  │ :197 in forward                                                                                  │
  │                                                                                                  │
  │   194 │   │   bsz, q_len, _ = hidden_states.size()                                               │
  │   195 │   │                                                                                      │
  │   196 │   │   query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.   │
  │ ❱ 197 │   │   key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.he   │
  │   198 │   │   value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_heads, self.   │
  │   199 │   │                                                                                      │
  │   200 │   │   kv_seq_len = key_states.shape[-2]                                                  │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
  RuntimeError: shape '[1, 577, 64, 128]' is invalid for input of size 590848

How can one fix it?
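
For context, the numbers in that error line up with grouped-query attention: FreeWilly2 is a LLaMA 2 70B fine-tune, which uses 8 key/value heads rather than the 64 query heads, while older transformers LLaMA code reshapes the k_proj output assuming the full head count. A back-of-envelope check of the figures from the traceback (illustrative only, not code from FastChat or transformers):

echo $((577 * 8 * 128))    # 590848  -> elements actually produced by k_proj (8 KV heads x 128 head_dim)
echo $((577 * 64 * 128))   # 4726784 -> elements the outdated view [1, 577, 64, 128] expects

If that reading is right, it also explains why updating helps: a transformers release with LLaMA 2 support (4.31 or later) plus a current FastChat checkout handle the 8-head key/value projection correctly.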

surak commented 1 year ago

I had this error before. I updated the repo and gave it 7 GPUs.

I also had some issues with NCCL, so I use export NCCL_P2P_DISABLE=1 (the 3090s do not support P2P).

Other than that, it's pretty much the same:

srun python3 $FASTCHAT_DIR/fastchat/serve/model_worker.py \
     --controller http://controller-server-address:21001 \
     --port 31020 --worker http://$(hostname):31020 \
     --num-gpus 7 \
     --host 0.0.0.0 \
     --model-path /$FASTCHAT_DIR/models/FreeWilly2
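
For reference, the same worker can also be launched without Slurm using FastChat's module-style commands; a minimal sketch, assuming a local controller on its default port and placeholder paths:

export NCCL_P2P_DISABLE=1   # only if the GPUs lack P2P support (e.g. RTX 3090, as noted above)

# start the controller first (listens on port 21001 by default)
python3 -m fastchat.serve.controller &

# then start a worker that shards the model across 7 GPUs
python3 -m fastchat.serve.model_worker \
    --model-path /path/to/FreeWilly2 \
    --num-gpus 7 \
    --controller-address http://localhost:21001 \
    --worker-address http://localhost:31020 \
    --host 0.0.0.0 --port 31020
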
sergsb commented 1 year ago

@surak -- As I found, only updating to the latest version from GitHub works. The pip version cannot work with FreeWilly2 for some reason.

surak commented 1 year ago

Maybe they haven't released it on pip yet.

ssmi153 commented 1 year ago

> @surak -- As I found, only updating to the latest version from GitHub works. The pip version cannot work with FreeWilly2 for some reason.

@sergsb, what exactly did you update? Accelerate? Transformers? Fastchat? I'm getting the same error, so any further guidance would be much appreciated.

sergsb commented 1 year ago

> > @surak -- As I found, only updating to the latest version from GitHub works. The pip version cannot work with FreeWilly2 for some reason.
>
> @sergsb, what exactly did you update? Accelerate? Transformers? Fastchat? I'm getting the same error, so any further guidance would be much appreciated.

I updated fastchat with pip install git+https://<path to repo>.
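
For anyone following along, that presumably points at this repository; assuming the upstream lm-sys/FastChat repo is meant, the install command would look something like:

pip install git+https://github.com/lm-sys/FastChat.git   # install FastChat from the current main branch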

ssmi153 commented 1 year ago

Thanks :)

surak commented 1 year ago

Let's close this one and let the developers worry about other bugs, then? :-)