turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License
3.19k stars, 234 forks

Issue with dolphin mixtral8x22b #445

Open luijait opened 1 month ago

luijait commented 1 month ago

(ProxyActor pid=110747)     async for chunk in self.body_iterator:
(ProxyActor pid=110747)   File "/usr/local/lib/python3.10/dist-packages/ray/serve/handle.py", line 781, in __anext__
(ProxyActor pid=110747)     return await next_obj_ref
(ProxyActor pid=110747) ray.exceptions.RayTaskError(IndexError): ray::ServeReplica:default:ZerodAI_PreProd_Pruebas.handle_request_streaming() (pid=110786, ip=192.168.1.3, actor_id=149ba7939709b57a91dc742601000000, repr=<ray.serve._private.replica.ServeReplica:default:ZerodAI_PreProd_Pruebas object at 0x7f993502d870>)
(ProxyActor pid=110747)     async for result in generator:
(ProxyActor pid=110747)   File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 952, in call_user_method_generator
(ProxyActor pid=110747)     for result in result_generator:
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/./api.py", line 107, in generate_stream
(ProxyActor pid=110747)     generator.begin_stream(context, settings)
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/exllamav2/generator/streaming.py", line 198, in begin_stream
(ProxyActor pid=110747)     self.begin_stream_ex(input_ids,
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/exllamav2/generator/streaming.py", line 296, in begin_stream_ex
(ProxyActor pid=110747)     self._gen_begin_reuse(input_ids, gen_settings)
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/exllamav2/generator/streaming.py", line 624, in _gen_begin_reuse
(ProxyActor pid=110747)     self._gen_begin(in_tokens, gen_settings)
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/exllamav2/generator/streaming.py", line 586, in _gen_begin
(ProxyActor pid=110747)     self.model.forward(self.sequence_ids[:, :-1],
(ProxyActor pid=110747)   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(ProxyActor pid=110747)     return func(*args, **kwargs)
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/exllamav2/model.py", line 694, in forward
(ProxyActor pid=110747)     r, ls = self._forward(input_ids = input_ids[:, chunk_begin : chunk_end],
(ProxyActor pid=110747)   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(ProxyActor pid=110747)     return func(*args, **kwargs)
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/exllamav2/model.py", line 776, in _forward
(ProxyActor pid=110747)     x = module.forward(x, cache = cache, attn_params = attn_params, past_len = past_len, loras = loras, **kwargs)
(ProxyActor pid=110747)   File "/home/omegaleitatadmin/exllamav2/exllamav2/embedding.py", line 134, in forward
(ProxyActor pid=110747)     hidden_states = self.embedding.forward(hidden_states)
(ProxyActor pid=110747)   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 162, in forward
(ProxyActor pid=110747)     return F.embedding(
(ProxyActor pid=110747)   File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2233, in embedding
(ProxyActor pid=110747)     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
(ProxyActor pid=110747) IndexError: index out of range in self

turboderp commented 1 month ago

Can you elaborate on which model is doing this? I've tested with https://huggingface.co/blockblockblock/dolphin-2.9-mixtral-8x22b-bpw3-exl2 and I'm not seeing this issue. This error often happens with merges when the merged model isn't compiled properly, e.g. taking the config from one model (referencing added tokens) and the tokenizer from another (which doesn't have those added tokens).
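
One way to test for this kind of config/tokenizer mismatch is to compare the vocabulary size declared in the model's config.json with the token IDs the tokenizer can emit. The following is a minimal sketch, not part of exllamav2. It assumes a Hugging Face style model directory in which config.json has a `vocab_size` field and an optional added_tokens.json maps token strings to IDs; the function names `find_out_of_range_ids` and `check_model_dir` are hypothetical. It also assumes the embedding table has exactly `vocab_size` rows, which may not hold for every checkpoint.

```python
import json
from pathlib import Path


def find_out_of_range_ids(token_ids, vocab_size):
    """Return the sorted, de-duplicated token IDs that fall outside
    [0, vocab_size), i.e. IDs that would index past the embedding table."""
    return sorted({t for t in token_ids if t < 0 or t >= vocab_size})


def check_model_dir(model_dir):
    """Compare config.json's vocab_size with the IDs in added_tokens.json.

    Assumes a Hugging Face style layout (hypothetical helper, not an
    exllamav2 API). Returns (vocab_size, bad), where bad maps each added
    token whose ID is >= vocab_size to that ID.
    """
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text())
    vocab_size = config["vocab_size"]

    added_path = model_dir / "added_tokens.json"
    added = json.loads(added_path.read_text()) if added_path.exists() else {}

    bad = {tok: idx for tok, idx in added.items() if idx >= vocab_size}
    return vocab_size, bad
```

If `check_model_dir` returns a non-empty mapping, or `find_out_of_range_ids` flags IDs in an encoded prompt, the embedding lookup would be expected to fail with an out-of-range index like the one in the traceback above.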