turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

quantization fails while writing shards #472

Closed theyunt closed 1 month ago

theyunt commented 1 month ago

I was trying to quantize command-r-plus and the process failed at the last step.


 -- Compiling output file...
 -- Writing shard 1...
Traceback (most recent call last):
  File "D:\exllamav2\convert.py", line 287, in <module>
    compile_model(job, save_job, model)
  File "D:\exllamav2\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\exllamav2\conversion\compile.py", line 162, in compile_model
    save_file(save_dict, out_filename)
  File "D:\exllamav2\venv\lib\site-packages\safetensors\torch.py", line 282, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "D:\exllamav2\venv\lib\site-packages\safetensors\torch.py", line 459, in _flatten
    return {
  File "D:\exllamav2\venv\lib\site-packages\safetensors\torch.py", line 463, in <dictcomp>
    "data": _tobytes(v, k),
  File "D:\exllamav2\venv\lib\site-packages\safetensors\torch.py", line 420, in _tobytes
    data = np.ctypeslib.as_array(newptr, (total_bytes,))  # no internal copy
  File "D:\exllamav2\venv\lib\site-packages\numpy\ctypeslib.py", line 521, in as_array
    p_arr_type = ctypes.POINTER(_ctype_ndarray(obj._type_, shape))
  File "D:\exllamav2\venv\lib\site-packages\numpy\ctypeslib.py", line 354, in _ctype_ndarray
    element_type = dim * element_type
ValueError: Array length must be >= 0, not -2298478592
turboderp commented 1 month ago

It's a problem with either safetensors or numpy on Windows. I reported it here a while ago, but I don't really know how to proceed. Maybe I should poke them again.

theyunt commented 1 month ago

Changing

length = int(np.prod(tensor.shape).item())

in safetensors' torch.py to

length = int(np.prod(tensor.shape, dtype=np.uint64).item())

fixes it.
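A minimal sketch of why that one-line change matters, assuming the root cause is NumPy's default reduction dtype: on Windows, `np.prod` accumulates in a C long (32 bits), so the element count of a shard larger than 2^31 wraps to a negative number, which then surfaces as the negative array length in the traceback. Forcing `int32` reproduces the wrap on any platform; the shape below is illustrative, not taken from the actual model:

```python
import numpy as np

# A tensor with more than 2**31 elements, as in a large quantized shard.
shape = (2, 1_500_000_000)  # 3e9 elements, past the int32 limit

# With a 32-bit accumulator (NumPy's default on Windows, forced here so the
# overflow reproduces on any platform), the product silently wraps negative:
overflowed = int(np.prod(shape, dtype=np.int32).item())

# With the 64-bit accumulator from the patched safetensors line,
# the true element count survives:
correct = int(np.prod(shape, dtype=np.uint64).item())

print(overflowed)  # negative, like the -2298478592 in the traceback
print(correct)     # 3000000000
```

The same wrap explains the "Array length must be >= 0" message: the negative byte count is handed to `np.ctypeslib.as_array`, which refuses a negative dimension.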