lyogavin / airllm

AirLLM 70B inference with single 4GB GPU

mlx embedding indexing failure - ValueError: Cannot index mlx array using the given type. #167

Closed: shiwanlin closed this issue 3 months ago

shiwanlin commented 3 months ago

Has anyone run into this while trying to run Llama 3.1 405B, following the instructions in this repo? It apparently happens after the model has loaded fine.

Traceback (most recent call last):
  File "/Volumes/AI Models/airllm/run.py", line 26, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "/Users/shiwanlin/opt/anaconda3/envs/native/lib/python3.12/site-packages/airllm/airllm_llama_mlx.py", line 254, in generate
    for token in self.model_generate(x, temperature=temperature):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shiwanlin/opt/anaconda3/envs/native/lib/python3.12/site-packages/airllm/airllm_llama_mlx.py", line 289, in model_generate
    x = self.tok_embeddings(x)
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shiwanlin/opt/anaconda3/envs/native/lib/python3.12/site-packages/mlx/nn/layers/embedding.py", line 33, in __call__
    return self.weight[x]
           ~~~~~~~~~~~^^^
ValueError: Cannot index mlx array using the given type.

====================

I had to fix this error earlier:

ValueError: [load] Input must be a file-like object opened in binary mode, or string

by adding str() to the following line in airllm/persist/mlx_model_persister.py:

layer_state_dict = mx.load(str(to_load_path))
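
A minimal sketch of why the str() is needed (the shard path below is hypothetical): mx.load() accepts a plain string path or a binary file object, but not a pathlib.Path, which is what to_load_path appears to be.

from pathlib import Path
import mlx.core as mx

to_load_path = Path("layers/model.layers.0.safetensors")  # hypothetical shard path

# mx.load(to_load_path)  # ValueError: [load] Input must be a file-like
#                        # object opened in binary mode, or string
layer_state_dict = mx.load(str(to_load_path))  # a plain string path works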

====================

This is a pristine, Apple-native Conda env on a Mac M1 Pro:

          conda version : 24.5.0
    conda-build version : 24.5.1
         python version : 3.12.4.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=m1
                          __conda=24.5.0=0
                          __osx=14.5=0
                          __unix=0=0
               platform : osx-arm64
             user-agent : conda/24.5.0 requests/2.32.2 CPython/3.12.4 Darwin/23.5.0 OSX/14.5 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8 aau/0.4.4 c/. s/. e/.

I tried mlx 16.1, 15.2, and 14.1, all with the same error.

airllm version: 2.9.1 for py3

====================

The input x passed to self.weight[x] is a torch tensor, which corresponds to the input tokens:

type(x)= <class 'torch.Tensor'>
x= tensor([[128000,  12840,    596,    279,   6864,    315,    279,    549,    815, 82508]])

I guess it's an API mismatch or something of the sort?
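
Here is a minimal repro of the indexing failure outside airllm, assuming torch and mlx are both installed (the embedding shape is a stand-in; the token IDs mirror the trace):

import torch
import mlx.core as mx

weight = mx.zeros((128256, 8))                 # stand-in for the embedding table
x = torch.tensor([[128000, 12840, 596, 279]])  # torch token IDs, as in the trace

try:
    weight[x]      # indexing an mlx array with a torch.Tensor fails
except ValueError as e:
    print(e)       # Cannot index mlx array using the given type.

y = weight[mx.array(x.numpy())]  # converting to an mlx array works
print(y.shape)                   # (1, 4, 8)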

Any help is appreciated.

shiwanlin commented 3 months ago

Found the problem:

When running on macOS with mlx, this change to the test code from the README is needed:

import mlx.core as mx

generation_output = model.generate(
    # input_tokens['input_ids'].cuda(),   # original README call (CUDA-only)
    mx.array(input_tokens['input_ids']),  # pass an mlx array instead
)
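
For reference, a fuller sketch of the run script with this change applied; the model ID, prompt, tokenizer arguments, and generation kwargs below are assumptions modeled on the repo's README, not copied from my setup:

import mlx.core as mx
from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-405B")  # assumed model ID

input_tokens = model.tokenizer(
    ["What's the capital of the U.S.?"],  # assumed prompt
    return_tensors="np",                  # numpy converts cleanly to mlx
    return_attention_mask=False,
)

generation_output = model.generate(
    mx.array(input_tokens['input_ids']),  # mlx array instead of .cuda()
    max_new_tokens=20,                    # assumed generation kwarg
)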
shiwanlin commented 3 months ago

Closed.