lyogavin / airllm

AirLLM 70B inference with single 4GB GPU
Apache License 2.0

Mac M2: running AirLLM with garage-bAInd/Platypus2-7B gets error "Input must be a file-like object opened in binary mode, or string" #116

Open wuxiongwei opened 8 months ago

wuxiongwei commented 8 months ago

ValueError                                Traceback (most recent call last)
Cell In[23], line 2
      1 import mlx.core as mx
----> 2 generation_output = model.generate(
      3     mx.array(input_tokens['input_ids']),
      4     max_new_tokens=3,
      5     use_cache=True,
      6     return_dict_in_generate=True)
      8 print(generation_output)

File ~/opt/anaconda3/envs/air_llm_python_3_8/lib/python3.8/site-packages/airllm/airllm_llama_mlx.py:254, in AirLLMLlamaMlx.generate(self, x, temperature, max_new_tokens, **kwargs)
    252 def generate(self, x, temperature=0, max_new_tokens=None, **kwargs):
    253     tokens = []
--> 254     for token in self.model_generate(x, temperature=temperature):
    255         tokens.append(token)
    258         if len(tokens) >= max_new_tokens:

File ~/opt/anaconda3/envs/air_llm_python_3_8/lib/python3.8/site-packages/airllm/airllm_llama_mlx.py:281, in AirLLMLlamaMlx.model_generate(self, x, temperature, max_new_tokens)
    278 mask = mask.astype(self.tok_embeddings.weight.dtype)
    280 self.record_memory('before_loading_tok')
--> 281 update_weights = ModelPersister.get_model_persister().load_model(self.layer_names_dict['embed'], self.checkpoint_path)
    283 self.record_memory('after_loading_tok')
    284 self.tok_embeddings.update(update_weights['tok_embeddings'])
...
     97 #available = psutil.virtual_memory().available / 1024 / 1024
     98 #print(f"loaded {layer_name}, available mem: {available:.02f}")
    100 layer_state_dict = map_torch_to_mlx(layer_state_dict)

ValueError: [load] Input must be a file-like object opened in binary mode, or string

Verdagon commented 7 months ago

I just had this same problem, and I think I have a fix. In mlx_model_persister.py, change:

mx.load(to_load_path)

to:

mx.load(str(to_load_path))

@wuxiongwei, can you help verify the fix works for you?
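For anyone curious why the one-line fix works: mx.load rejects pathlib.Path objects, and to_load_path is a Path, so wrapping it in str() satisfies the input check. A minimal, self-contained sketch of the failure mode (fake_load below is a stand-in I wrote to mimic the reported input check, not actual mlx code):

```python
import pathlib

def fake_load(source):
    # Stand-in (not mlx code) mimicking the input check behind the
    # "[load] Input must be a file-like object opened in binary mode,
    # or string" error: only str paths and binary file objects pass.
    is_binary_file = hasattr(source, "read") and "b" in getattr(source, "mode", "")
    if not (isinstance(source, str) or is_binary_file):
        raise ValueError("[load] Input must be a file-like object "
                         "opened in binary mode, or string")
    return {"loaded_from": str(source)}

to_load_path = pathlib.Path("model/embed.safetensors")

try:
    fake_load(to_load_path)          # pathlib.Path: rejected
except ValueError as e:
    print("error:", e)

print(fake_load(str(to_load_path)))  # str(path): accepted
```

The same pattern applies to any API that type-checks for str rather than accepting os.PathLike.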

8-momo-8 commented 7 months ago

@Verdagon, yes, it works after converting the path to a string:

mx.load(str(to_load_path))

mustangs0786 commented 5 months ago

Help: where do I have to put the above code?

input_text = [
    'What is the capital of United States?',
    'I like',
]

MAX_LENGTH = 128

input_tokens = model.tokenizer(input_text,
    return_tensors="np",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False)

input_tokens

generation_output = model.generate(
    mx.array(input_tokens['input_ids']),
    max_new_tokens=3,
    use_cache=True,
    return_dict_in_generate=True)

print(generation_output)

shahfasal commented 5 months ago

@mustangs0786 I get the same error; were you able to figure it out?

shahfasal commented 5 months ago

@Verdagon where do I find this file mlx_model_persister.py?

shahfasal commented 5 months ago

Got it, it's under airllm/persist/.
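For anyone else hunting for the installed copy of the package so they can patch persist/mlx_model_persister.py, Python can report a package's install directory itself. A small sketch (demonstrated with the stdlib json package, since airllm may not be installed where you test this):

```python
import importlib.util
import pathlib

def package_dir(name):
    # Resolve the directory holding an installed package's source files.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {name!r} not found")
    return pathlib.Path(spec.origin).parent

# Demonstrated with the stdlib 'json' package; substitute "airllm"
# to locate files such as persist/mlx_model_persister.py.
print(package_dir("json"))
```

Running package_dir("airllm") in the environment where the error occurs should print the site-packages directory containing the file to edit.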