bablat closed this issue 3 weeks ago
Is this on the latest version? From the stack trace it appears to be fairly old.
I may have mixed up a few different venvs, but I believe this is with 0.0.21; stdout included:
-- Beginning new job
!! Warning: Output directory is not empty: tmpdq
!! Cleaning output directory: tmpdq
-- Input: cognitivecomputations_dolphin-2.9.1-qwen-110b
-- Output: tmpdq
-- Using default calibration dataset
-- Target bits per weight: 5.6 (decoder), 8 (head)
-- Max shard size: 8192 MB
-- Full model will be compiled to: cognitivecomputations_dolphin-2.9.1-qwen-110b-5.6bpw-exl2/
-- Tokenizing samples (measurement)...
Traceback (most recent call last):
File "/storage/textgen/models/../exllamav2/convert.py", line 221, in <module>
tokenize(job, save_job, tokenizer, measure = True)
File "/storage/textgen/exllamav2/conversion/tokenize.py", line 47, in tokenize
cal_tokens = get_standard_calibration(job, measure, tokenizer)
File "/storage/textgen/exllamav2/conversion/tokenize.py", line 96, in get_standard_calibration
tokenized_articles = [tokenizer.encode(a, add_bos = True, add_eos = True) for a in articles]
File "/storage/textgen/exllamav2/conversion/tokenize.py", line 96, in <listcomp>
tokenized_articles = [tokenizer.encode(a, add_bos = True, add_eos = True) for a in articles]
File "/storage/textgen/exllamav2/exllamav2/tokenizer/tokenizer.py", line 418, in encode
ids = torch.tensor(ids).to(torch.long).unsqueeze(0)
RuntimeError: Could not infer dtype of NoneType
Help would be appreciated.
I got the same issue with 0.0.21; with 0.0.20 it reports a Hessian error.
So I looked into it, and the issue is that from 0.0.21 ExLlama no longer uses token ID 0 as a fallback when the model doesn't define a BOS token. The simple fix is to add "bos_token_id": 151644 to config.json, and it should start fine.
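For anyone hitting this, a rough sketch of why it fails and how to apply the fix programmatically; the working directory and the setdefault approach are assumptions, and 151644 is the Qwen-specific ID quoted above:

import json

# Why it fails: with no bos_token_id defined, the encoded ID list contains None,
# and torch.tensor([None, ...]) raises "Could not infer dtype of NoneType".

# The fix: add the missing BOS token ID to the model's config.json.
with open("config.json") as f:
    config = json.load(f)
config.setdefault("bos_token_id", 151644)  # Qwen-specific; adjust for other models
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)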
The Hessian error, I'm assuming, is because it runs out of memory. Qwen-110B is simply too big to quantize on a 24 GB GPU.
by add "bos_token_id": 151644 to config.json, RuntimeError: Could not infer dtype of NoneType is fixed and hessian error, i got this on 4x3090 So it needs more than 24G on a single card? I can quantize cmdr+, and it can transfer to a second card if it runs low on video memory while quantizing, but dolphin-qwen give hessian error
---------------------------------------------
| Measured: model.layers.0 (Attention) |
| Duration: 103.03 seconds |
| Completed step: 1/163 |
| Avg time / step (rolling): 103.03 seconds |
| Estimated remaining time: 278min 10sec |
| Last checkpoint layer: None |
---------------------------------------------
-- Layer: model.layers.0 (MLP)
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
!! Warning: Applied additional damping
Traceback (most recent call last):
File "e:\exllamav2\conversion\adaptivegptq.py", line 292, in prepare
hessian_inv = torch.linalg.cholesky(hessian)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 45058 is not positive-definite).
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:\exllamav2\convert.py", line 240, in <module>
status = measure_quant(job, save_job, model) # capturing the graceful exits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\tabbyAPI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "e:\exllamav2\conversion\measure.py", line 563, in measure_quant
m = measure_mlp(module, hidden_states, target_states, quantizers, cache, attn_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "e:\exllamav2\conversion\measure.py", line 204, in measure_mlp
quantizers["down_proj"].prepare()
File "e:\exllamav2\conversion\adaptivegptq.py", line 330, in prepare
raise ValueError("Hessian is not invertible")
ValueError: Hessian is not invertible
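For context, the repeated "Applied additional damping" warnings above come from a retry loop that adds to the Hessian diagonal before each Cholesky attempt; the ValueError fires once the retries are exhausted. A rough sketch of that general technique (not the converter's exact code; parameter values are placeholders):

import torch

def cholesky_with_damping(hessian, damp=0.01, retries=10):
    # Retry the factorization, adding diagonal damping each time it fails
    # because the matrix is not positive-definite.
    diag = torch.arange(hessian.shape[0], device=hessian.device)
    for _ in range(retries):
        try:
            return torch.linalg.cholesky(hessian)
        except torch._C._LinAlgError:
            print(" !! Warning: Applied additional damping")
            hessian[diag, diag] += damp * hessian[diag, diag].mean()
    raise ValueError("Hessian is not invertible")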
Might not be related to this, but is there any function/setting I can use to set the EOS token to a list of tokens/IDs? Currently I use the stream method and manually validate the output tokens.
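For reference, a minimal sketch of that manual check, where stream_tokens and the IDs are hypothetical placeholders rather than the actual ExLlamaV2 API:

# Stop streaming on any of several end-of-sequence token IDs.
stop_ids = {151643, 151645}  # placeholder IDs; use your model's actual EOS candidates

for token_id, text in stream_tokens(prompt):  # stream_tokens is a hypothetical helper
    if token_id in stop_ids:
        break
    print(text, end="", flush=True)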
I just encountered the same issue with today's new Qwens; the bos_token_id fix in config.json works, so I'll close this issue. Thanks for your huge contribution to the community, @turboderp.