turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

`Some tensors share memory` error with convert.py #320

Closed brucethemoose closed 5 months ago

brucethemoose commented 5 months ago

As the title says, this happens with exllamav2 built from source just a few minutes ago.

python convert.py --in_dir /home/alpha/FastStorage/Models/Raw/abacusai_Smaug-34B-v0.1 -o /home/alpha/FastStorage/scratch -b 3.0 -hb 6 -cf /home/alpha/FastStorage/Models/exllama/smaug-3.0bpw -nr
No ROCm runtime is found, using ROCM_HOME='/opt/rocm'
 -- Beginning new job
 !! Warning: Output directory is not empty: /home/alpha/FastStorage/scratch
 !! Cleaning output directory: /home/alpha/FastStorage/scratch
 -- Input: /home/alpha/FastStorage/Models/Raw/kyujinpy_PlatYi-34B-200k-Q-FastChat
 -- Output: /home/alpha/FastStorage/scratch
 -- Using default calibration dataset
 -- Target bits per weight: 3.0 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- Full model will be compiled to: /home/alpha/FastStorage/Models/exllama/fastchat-3.0bpw
 -- Tokenizing samples (measurement)...
 -- Token embeddings (measurement)...
Traceback (most recent call last):
  File "/home/alpha/AI/exllamav2/convert.py", line 213, in <module>
    embeddings(job, save_job, model)
  File "/home/alpha/AI/exllamav2/conversion/measure.py", line 82, in embeddings
    save_file(embeddings_dict, os.path.join(job["out_dir"], "hidden_states.safetensors"))
  File "/home/alpha/AI/venv/lib/python3.11/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
                   ^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/venv/lib/python3.11/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(
RuntimeError:
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'row.00010', 'row.00000', 'row.00003', 'row.00009', 'row.00007', 'row.00015', 'row.00006', 'row.00017', 'row.00008', 'row.00011', 'row.00014', 'row.00016', 'row.00018', 'row.00001', 'row.00002', 'row.00005', 'row.00004', 'row.00012', 'row.00013'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Happens to several other 34B models as well.
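For context, the error is safetensors refusing to serialize tensors that alias the same underlying storage. A minimal reproduction, independent of exllamav2 (the tensor names here are purely illustrative), looks like this:

```python
# Minimal reproduction of the safetensors shared-memory error (illustrative, not exllamav2 code).
import torch
from safetensors.torch import save_file

hidden = torch.randn(19, 8)

# Each entry is a view into `hidden`, so all 19 rows share a single storage.
rows = {f"row.{i:05d}": hidden[i] for i in range(hidden.shape[0])}

save_file(rows, "hidden_states.safetensors")
# RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk ...
```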

turboderp commented 5 months ago

Yeah, this is safetensors being pedantic for some reason. Not sure why it would only trigger sometimes, but try updating and see if the latest commit doesn't fix it.

brucethemoose commented 5 months ago

This was me being silly and using an old version of safetensors even though I thought I had upgraded it.
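If anyone else lands here, it's worth confirming which safetensors the active environment actually imports before digging further:

```python
# Quick sanity check on the installed safetensors version (run inside the same venv as convert.py).
import safetensors
print(safetensors.__version__)  # if this is older than expected, upgrade with `pip install -U safetensors`
```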

turboderp commented 5 months ago

It's fine though. Better to explicitly separate those tensors anyway so the next version of safetensors doesn't try to do something smart with the slices. :+1:
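A hedged sketch of what "explicitly separate those tensors" means in practice: clone each slice before handing it to `save_file`, so every entry owns its own contiguous storage and there is no aliasing for safetensors to second-guess (again, the names are illustrative, not the actual convert.py code):

```python
# Sketch of separating aliased slices before saving (assumed names, not exllamav2's implementation).
import torch
from safetensors.torch import save_file

hidden = torch.randn(19, 8)

# clone() + contiguous() gives each row its own storage, so nothing shares memory.
rows = {f"row.{i:05d}": hidden[i].clone().contiguous() for i in range(hidden.shape[0])}

save_file(rows, "hidden_states.safetensors")  # succeeds
```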