wuxiongwei opened 9 months ago
I just had this same problem, and I think I have a fix. In `mlx_model_persister.py`, change:

```python
mx.load(to_load_path)
```

to:

```python
mx.load(str(to_load_path))
```
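The pattern behind the fix can be shown without MLX installed. The sketch below uses a hypothetical `load_weights` stand-in for `mx.load` (not airllm code) that, like the real binding, rejects anything that is not a plain string or a binary file-like object, so a `pathlib.Path` triggers the same `ValueError` until it is wrapped in `str()`:

```python
from pathlib import Path

def load_weights(path):
    # Stand-in for mx.load: accepts only a plain str path
    # (the real binding also accepts a file opened in binary mode).
    if not isinstance(path, str):
        raise ValueError(
            "[load] Input must be a file-like object opened in binary mode, or string"
        )
    return f"loaded {path}"

to_load_path = Path("weights") / "layer_0.safetensors"

# Passing the Path object directly raises the ValueError from the traceback:
try:
    load_weights(to_load_path)
except ValueError as e:
    print("error:", e)

# Converting to str first works:
print(load_weights(str(to_load_path)))
```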
@wuxiongwei, can you help verify the fix works for you?
@Verdagon, yes, it works after converting it to a string:

```python
mx.load(str(to_load_path))
```
```python
input_text = [
    'I like',
]

MAX_LENGTH = 128
input_tokens = model.tokenizer(
    input_text,
    return_tensors="np",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)
input_tokens

generation_output = model.generate(
    mx.array(input_tokens['input_ids']),
    max_new_tokens=3,
    use_cache=True,
    return_dict_in_generate=True,
)
print(generation_output)
```
@mustangs0786 Same error for me. Were you able to figure it out?
@Verdagon Where do I find this file, `mlx_model_persister.py`?
Got it, it's under `airllm/persist/`.
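If you are unsure where pip placed the package, you can resolve the path programmatically. This is a sketch that assumes airllm is installed in the current environment and uses the `airllm/persist/` layout mentioned above:

```python
# Locate mlx_model_persister.py inside the installed airllm package.
import importlib.util
from pathlib import Path

spec = importlib.util.find_spec("airllm")
if spec is not None and spec.origin is not None:
    persister = Path(spec.origin).parent / "persist" / "mlx_model_persister.py"
    print(persister, persister.exists())
else:
    print("airllm is not installed in this environment")
```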
```
ValueError                                Traceback (most recent call last)
Cell In[23], line 2
      1 import mlx.core as mx
----> 2 generation_output = model.generate(
      3     mx.array(input_tokens['input_ids']),
      4     max_new_tokens=3,
      5     use_cache=True,
      6     return_dict_in_generate=True)
      8 print(generation_output)

File ~/opt/anaconda3/envs/air_llm_python_3_8/lib/python3.8/site-packages/airllm/airllm_llama_mlx.py:254, in AirLLMLlamaMlx.generate(self, x, temperature, max_new_tokens, kwargs)
    252 def generate(self, x, temperature=0, max_new_tokens=None, kwargs):
    253     tokens = []
--> 254     for token in self.model_generate(x, temperature=temperature):
    255         tokens.append(token)
    258         if len(tokens) >= max_new_tokens:

File ~/opt/anaconda3/envs/air_llm_python_3_8/lib/python3.8/site-packages/airllm/airllm_llama_mlx.py:281, in AirLLMLlamaMlx.model_generate(self, x, temperature, max_new_tokens)
    278 mask = mask.astype(self.tok_embeddings.weight.dtype)
    280 self.record_memory('before_loading_tok')
--> 281 update_weights = ModelPersister.get_model_persister().load_model(self.layer_names_dict['embed'], self.checkpoint_path)
    283 self.record_memory('after_loading_tok')
    284 self.tok_embeddings.update(update_weights['tok_embeddings'])
...
     97 #available = psutil.virtual_memory().available / 1024 / 1024
     98 #print(f"loaded {layer_name}, available mem: {available:.02f}")
    100 layer_state_dict = map_torch_to_mlx(layer_state_dict)

ValueError: [load] Input must be a file-like object opened in binary mode, or string
```
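The `ValueError` at the bottom is `mx.load` rejecting the `pathlib.Path` that airllm hands it: the loader accepts only a plain string or a file opened in binary mode. A slightly more general version of the one-line `str()` fix (a hypothetical helper, not airllm code) coerces any path-like input with `os.fspath`, which also handles custom `os.PathLike` types:

```python
import os
from pathlib import Path

def coerce_path(path):
    """Return a plain str for any str/bytes/os.PathLike input.

    os.fspath() calls __fspath__ on PathLike objects such as
    pathlib.Path, so this covers custom path types that a bare
    str(path) call would stringify less predictably.
    """
    p = os.fspath(path)
    return p.decode() if isinstance(p, bytes) else p

# Both forms yield a plain string suitable for APIs that
# reject pathlib.Path objects:
print(coerce_path(Path("persist") / "model.safetensors"))
print(coerce_path("persist/model.safetensors"))
```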