lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

FastChat fails with a LLaMA 2 based model #2075

Closed sergsb closed 1 year ago

sergsb commented 1 year ago

Dear all,

I am trying to run the stabilityai/FreeWilly2 (LLaMA 2 70B) model. Whenever I send a request to the model, I get:

 /home/sergeys/miniconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py │
  │ :292 in forward                                                                                  │
  │                                                                                                  │
  │   289 │   │   hidden_states = self.input_layernorm(hidden_states)                                │
  │   290 │   │                                                                                      │
  │   291 │   │   # Self Attention                                                                   │
  │ ❱ 292 │   │   hidden_states, self_attn_weights, present_key_value = self.self_attn(              │
  │   293 │   │   │   hidden_states=hidden_states,                                                   │
  │   294 │   │   │   attention_mask=attention_mask,                                                 │
  │   295 │   │   │   position_ids=position_ids,                                                     │
  │                                                                                                  │
  │ /home/sergeys/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in          │
  │ _call_impl                                                                                       │
  │                                                                                                  │
  │   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
  │   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
  │   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
  │ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
  │   1502 │   │   # Do not call functions when jit is used                                          │
  │   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
  │   1504 │   │   backward_pre_hooks = []                                                           │
  │                                                                                                  │
  │ /home/sergeys/miniconda3/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward      │
  │                                                                                                  │
  │   162 │   │   │   with torch.no_grad():                                                          │
  │   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
  │   164 │   │   else:                                                                              │
  │ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
  │   166 │   │   return module._hf_hook.post_forward(module, output)                                │
  │   167 │                                                                                          │
  │   168 │   module.forward = new_forward                                                           │
  │                                                                                                  │
  │ /home/sergeys/miniconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py │
  │ :197 in forward                                                                                  │
  │                                                                                                  │
  │   194 │   │   bsz, q_len, _ = hidden_states.size()                                               │
  │   195 │   │                                                                                      │
  │   196 │   │   query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.   │
  │ ❱ 197 │   │   key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.he   │
  │   198 │   │   value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_heads, self.   │
  │   199 │   │                                                                                      │
  │   200 │   │   kv_seq_len = key_states.shape[-2]                                                  │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
  RuntimeError: shape '[1, 577, 64, 128]' is invalid for input of size 590848

How can one fix it?
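
For context, the numbers in that error line up with grouped-query attention: FreeWilly2 is a LLaMA 2 70B fine-tune, which uses 8 key/value heads rather than the 64 query heads, while older transformers LLaMA code reshapes the k_proj output assuming the full head count. A back-of-envelope check of the figures from the traceback (illustrative only, not code from FastChat or transformers):

echo $((577 * 8 * 128))    # 590848  -> elements actually produced by k_proj (8 KV heads x 128 head_dim)
echo $((577 * 64 * 128))   # 4726784 -> elements the outdated view [1, 577, 64, 128] expects

If that reading is right, it also explains why updating helps: a transformers release with LLaMA 2 support (4.31 or later) plus a current FastChat checkout handle the 8-head key/value projection correctly.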

surak commented 1 year ago

I had this error before. I updated the repo and gave it 7 GPUs.

I also had some issues with NCCL, so I use export NCCL_P2P_DISABLE=1 (the 3090s do not support P2P).

Other than that, it's pretty much the same:

srun python3 $FASTCHAT_DIR/fastchat/serve/model_worker.py \
     --controller http://controller-server-address:21001 \
     --port 31020 --worker http://$(hostname):31020 \
     --num-gpus 7 \
     --host 0.0.0.0 \
     --model-path /$FASTCHAT_DIR/models/FreeWilly2
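
For reference, the same worker can also be launched without Slurm using FastChat's module-style commands; a minimal sketch, assuming a local controller on its default port and placeholder paths:

export NCCL_P2P_DISABLE=1   # only if the GPUs lack P2P support (e.g. RTX 3090, as noted above)

# start the controller first (listens on port 21001 by default)
python3 -m fastchat.serve.controller &

# then start a worker that shards the model across 7 GPUs
python3 -m fastchat.serve.model_worker \
    --model-path /path/to/FreeWilly2 \
    --num-gpus 7 \
    --controller-address http://localhost:21001 \
    --worker-address http://localhost:31020 \
    --host 0.0.0.0 --port 31020
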
sergsb commented 1 year ago

@surak -- As I found, only updating to the latest version from GitHub works. The pip version cannot work with FreeWilly2 for some reason.

surak commented 1 year ago

Maybe they haven't released it on pip yet.

ssmi153 commented 1 year ago

> @surak -- As I found, only updating to the latest version from GitHub works. The pip version cannot work with FreeWilly2 for some reason.

@sergsb, what exactly did you update? Accelerate? Transformers? Fastchat? I'm getting the same error, so any further guidance would be much appreciated.

sergsb commented 1 year ago

> > @surak -- As I found, only updating to the latest version from GitHub works. The pip version cannot work with FreeWilly2 for some reason.
>
> @sergsb, what exactly did you update? Accelerate? Transformers? Fastchat? I'm getting the same error, so any further guidance would be much appreciated.

I updated fastchat with pip install git+https://<path to repo>.
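
For anyone following along, that presumably points at this repository; assuming the upstream lm-sys/FastChat repo is meant, the install command would look something like:

pip install git+https://github.com/lm-sys/FastChat.git   # install FastChat from the current main branch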

ssmi153 commented 1 year ago

Thanks :)

surak commented 1 year ago

Let's close this one and let the developers worry about other bugs, then? :-)