lyogavin / airllm

AirLLM 70B inference with single 4GB GPU
Apache License 2.0
4.05k stars · 335 forks

segmentation fault python3 airllm2.py #129

Open taozhiyuai opened 4 months ago

taozhiyuai commented 4 months ago

Mac, native conda, mlx installed.

```
(native) taozhiyu@603e5f4a42f1 downloads % pip show mlx airllm
Name: mlx
Version: 0.11.1
Summary: A framework for machine learning on Apple silicon.
Home-page: https://github.com/ml-explore/mlx
Author: MLX Contributors
Author-email: mlx@group.apple.com
License:
Location: /Users/taozhiyu/miniconda3/envs/native/lib/python3.12/site-packages
Requires:
Required-by:

Name: airllm
Version: 2.8.3
Summary: AirLLM allows single 4GB GPU card to run 70B large language models without quantization, distillation or pruning.
Home-page: https://github.com/lyogavin/Anima/tree/main/air_llm
Author: Gavin Li
Author-email: gavinli@animaai.cloud
License:
Location: /Users/taozhiyu/miniconda3/envs/native/lib/python3.12/site-packages
Requires: accelerate, huggingface-hub, optimum, safetensors, scipy, torch, tqdm, transformers
Required-by:

(native) taozhiyu@603e5f4a42f1 downloads % python3 airllm2.py
found index file...
found_layers:{'model.embed_tokens.': False, 'model.layers.0.': False, 'model.layers.1.': False, 'model.layers.2.': False, 'model.layers.3.': False, 'model.layers.4.': False, 'model.layers.5.': False, 'model.layers.6.': False, 'model.layers.7.': False, 'model.layers.8.': False, 'model.layers.9.': False, 'model.layers.10.': False, 'model.layers.11.': False, 'model.layers.12.': False, 'model.layers.13.': False, 'model.layers.14.': False, 'model.layers.15.': False, 'model.layers.16.': False, 'model.layers.17.': False, 'model.layers.18.': False, 'model.layers.19.': False, 'model.layers.20.': False, 'model.layers.21.': False, 'model.layers.22.': False, 'model.layers.23.': False, 'model.layers.24.': False, 'model.layers.25.': False, 'model.layers.26.': False, 'model.layers.27.': False, 'model.layers.28.': False, 'model.layers.29.': False, 'model.layers.30.': False, 'model.layers.31.': False, 'model.layers.32.': False, 'model.layers.33.': False, 'model.layers.34.': False, 'model.layers.35.': False, 'model.layers.36.': False, 'model.layers.37.': False, 'model.layers.38.': False, 'model.layers.39.': False, 'model.layers.40.': False, 'model.layers.41.': False, 'model.layers.42.': False, 'model.layers.43.': False, 'model.layers.44.': False, 'model.layers.45.': False, 'model.layers.46.': False, 'model.layers.47.': False, 'model.layers.48.': False, 'model.layers.49.': False, 'model.layers.50.': False, 'model.layers.51.': False, 'model.layers.52.': False, 'model.layers.53.': False, 'model.layers.54.': False, 'model.layers.55.': False, 'model.layers.56.': False, 'model.layers.57.': False, 'model.layers.58.': False, 'model.layers.59.': False, 'model.layers.60.': False, 'model.layers.61.': False, 'model.layers.62.': False, 'model.layers.63.': False, 'model.layers.64.': False, 'model.layers.65.': False, 'model.layers.66.': False, 'model.layers.67.': False, 'model.layers.68.': False, 'model.layers.69.': False, 'model.layers.70.': False, 'model.layers.71.': False, 'model.layers.72.': False, 'model.layers.73.': False, 'model.layers.74.': False, 'model.layers.75.': False, 'model.layers.76.': False, 'model.layers.77.': False, 'model.layers.78.': False, 'model.layers.79.': False, 'model.norm.': False, 'lm_head.': False}
some layer splits found, some are not, re-save all layers in case there's some corruptions.
  0%|          | 0/83 [00:00<?, ?it/s]Loading shard 1/30
zsh: segmentation fault  python3 airllm2.py
(native) taozhiyu@603e5f4a42f1 downloads % /Users/taozhiyu/miniconda3/envs/native/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
(native) taozhiyu@603e5f4a42f1 downloads %
```
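The trace shows the crash happens in the layer re-save step ("Loading shard 1/30"), before generation even starts. One way to narrow it down would be to check whether plain safetensors reads of the same shards also segfault outside AirLLM; a minimal sketch (not from the original report, assumes the shards sit in the local model directory used below):

```python
# Isolation test sketch: open each safetensors shard directly to see
# whether raw shard reading segfaults outside AirLLM's layer-splitting code.
import glob
from safetensors import safe_open

model_dir = "/Users/taozhiyu/Downloads/Meta-Llama-3-70B-Instruct"
for shard in sorted(glob.glob(f"{model_dir}/*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        for name in f.keys():
            f.get_slice(name)  # maps the tensor lazily without copying it
    print("ok:", shard)
```

The full script that triggers the crash is below.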

```python
from airllm import AutoModel

MAX_LENGTH = 128

# could use hugging face model repo id:
# model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

# or use model's local path...
model = AutoModel.from_pretrained("/Users/taozhiyu/Downloads/Meta-Llama-3-70B-Instruct")

input_text = [
    'What is the capital of United States?',
]

input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=False)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)
```
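Side note on the script: the `.cuda()` call comes from the CUDA example and would raise on Apple silicon anyway, since macOS builds of torch ship without CUDA; it is not the cause of the segfault above, which happens earlier during shard loading. A Mac-oriented variant would pass the tokens without `.cuda()` (sketch, assuming the MLX backend accepts plain CPU tensors):

```python
# Sketch of the Mac-friendly call; .cuda() is unavailable on Apple-silicon torch.
generation_output = model.generate(
    input_tokens['input_ids'],   # no .cuda() on macOS
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)
```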

Proryanator commented 4 months ago

Which MacBook are you using? An M3 Max? 🤔 I've seen an issue like this here, as well as in the coreml stable diffusion repo, specific to M3 MacBooks.

taozhiyuai commented 4 months ago

> Which MacBook are you using? An M3 Max? 🤔 I've seen an issue like this here, as well as in the coreml stable diffusion repo, specific to M3 MacBooks.

M3 Max, 128 GB

Proryanator commented 4 months ago

That was my suspicion, yeah. Very odd. I have a 36GB M3 Max.
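If this does turn out to be M3-specific, exact chip and library versions from anyone who can reproduce it would help. A quick environment-report sketch (illustrative, nothing AirLLM-specific):

```python
# Gather the chip and package versions to attach to the report.
import platform
import subprocess
from importlib.metadata import version

print("platform:", platform.platform())
print("chip:", subprocess.run(["sysctl", "-n", "machdep.cpu.brand_string"],
                              capture_output=True, text=True).stdout.strip())
for pkg in ("mlx", "airllm", "torch"):
    print(pkg, version(pkg))
```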