turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Error when using Beam Search #308

Open bibekyess opened 7 months ago

bibekyess commented 7 months ago

Hello! I am trying to use beam search during inference with my 4-bit GPTQ-quantized Llama model, whose base model is daekeun-ml/Llama-2-ko-instruct-13B. I get the following error:

Model loaded: ['...']
Starting server on address 0.0.0.0:8004
{'beams': 3, 'beam_length': 3, 'in_beam_search': True}
ERROR:example_flask:Exception on /infer_bench [POST]
Traceback (most recent call last):
  File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_sandbox/exllama/example_flask.py", line 97, in inferContextB
    outputs = generator.generate_simple(prompt, max_new_tokens = 400)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_sandbox/exllama/generator.py", line 313, in generate_simple
    self.end_beam_search()
  File "/home/bibekyess/exllama_sandbox/exllama/generator.py", line 698, in end_beam_search
    self.sequence = self.sequence_actual.clone()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'clone'

Has anyone run into a similar error before? To reproduce, I think you can set the following in the generator settings:

    generator.settings.beams = 3
    generator.settings.beam_length = 3
    generator.in_beam_search = True
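
One possible reading of the traceback, sketched with a hypothetical stand-in class rather than the real exllama generator: if `in_beam_search` is set to `True` by hand before any beam search has actually started, `sequence_actual` may still be `None` when `end_beam_search()` runs, and the `.clone()` call shown in the traceback would then fail. `FakeTensor`, `FakeGenerator`, and the early-return check are assumptions made for illustration. Only the `self.sequence = self.sequence_actual.clone()` line comes from the traceback.

```python
# Hypothetical stand-ins for illustration only; NOT the real exllama classes.
class FakeTensor:
    def __init__(self, data):
        self.data = list(data)

    def clone(self):
        # Return an independent copy, loosely mimicking a tensor clone.
        return FakeTensor(self.data)


class FakeGenerator:
    def __init__(self):
        self.sequence = None
        # Assumption: only populated once a beam search has really begun.
        self.sequence_actual = None
        self.in_beam_search = False

    def end_beam_search(self):
        # Assumption: nothing to do when no beam search is active.
        if not self.in_beam_search:
            return
        # Same statement as the failing line in the traceback.
        self.sequence = self.sequence_actual.clone()
        self.in_beam_search = False


def reproduce():
    gen = FakeGenerator()
    gen.in_beam_search = True  # flag set by hand, as in the steps above
    try:
        gen.end_beam_search()
    except AttributeError as exc:
        return str(exc)
    return None


error_message = reproduce()
print(error_message)
```

If this reading is right, leaving `in_beam_search` at its default and setting only `beams` and `beam_length` would avoid this particular failure in the sketch. Whether the real generator behaves the same way is not confirmed here.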

Thank you for your help! :)