microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

Unable to download Phi-3-Mini for CPU #872

Closed: jeremyfowers closed this issue 1 month ago

jeremyfowers commented 1 month ago

Describe the bug: The tutorial in this repo's front-page README says to run the following command:

huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .

When I run that command on my PC I get this error:

(oga-igpu) PS C:\work\turnkeyml\src\turnkeyml\llm\tools\ort_genai\models> huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
Fetching 10 files:   0%|                            | 0/10 [00:00<?, ?it/s]Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/added_tokens.json' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\added_tokens.json.4dece7ae8bbeb8f468cb1da428bfb6193ae0751c.incomplete'
Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/tokenizer.json' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\tokenizer.json.efc309ef56b8d8fba1b50d1b4a6e5be6cfded459.incomplete'  
Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/config.json' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\config.json.01de35834a12bca5fc9150cc2a8351135f442757.incomplete'        
Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\genai_config.json.5e5a61eea6e20bda5b011053a889535029e4b9c1.incomplete'
(…)n-block-32-acc-level-4/added_tokens.json: 100%|█| 293/293 [00:00<00:00, 
Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\added_tokens.json
(…)nt4-rtn-block-32-acc-level-4/config.json: 100%|█| 919/919 [00:00<?, ?B/s
Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\config.json2-acc-level-4/config.json:   0%| | 0.00/919 [00:00<?, ?B/ 
Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/tokenizer.model' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\tokenizer.model.9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347.incomplete'
(…)n-block-32-acc-level-4/genai_config.json: 100%|█| 1.58k/1.58k [00:00<00:
Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\genai_config.json4/genai_config.json:   0%| | 0.00/1.58k [00:00<?, ? 
Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/tokenizer_config.json' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\tokenizer_config.json.9d9d37222d0f5ad9b2f02408b13ec21b8023a93f.incomplete'
Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/special_tokens_map.json' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\special_tokens_map.json.32b360b36e8255e8346f50942f478e5a2227e2e6.incomplete'
Downloading 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/configuration_phi3.py' to '.cache\huggingface\download\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\configuration_phi3.py.f4553db23ac65c608fd150a14acbd04d3ff80a0f.incomplete'
(…)-rtn-block-32-acc-level-4/tokenizer.json: 100%|█| 1.84M/1.84M [00:00<00:

Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-l(…)ock-32-acc-level-4/tokenizer_config.json: 100%|█| 3.17k/3.17k [00:00<00:
(…)k-32-acc-level-4/special_tokens_map.json: 100%|█| 568/568 [00:00<00:00,  
(…)k-32-acc-level-4/special_tokens_map.json:   0%| | 0.00/568 [00:00<?, ?B/ 
Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\tokenizer_config.json
tokenizer.model:   0%|                          | 0.00/500k [00:00<?, ?B/s]Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-le(…)ock-32-acc-level-4/configuration_phi3.py: 100%|█| 10.4k/10.4k [00:00<?, 
Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\configuration_phi3.pyuration_phi3.py:   0%| | 0.00/10.4k [00:00<?, ? 
tokenizer.model: 100%|██████████████████| 500k/500k [00:00<00:00, 10.8MB/s] 
Download complete. Moving file to cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\tokenizer.model
Fetching 10 files:  40%|████████            | 4/10 [00:00<00:00, 13.24it/s] 
Traceback (most recent call last):
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\jfowe\.conda\envs\oga-igpu\Scripts\huggingface-cli.exe\__main__.py", line 7, in <module>
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\commands\huggingface_cli.py", line 52, in main
    service.run()
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\commands\download.py", line 146, in run
    print(self._download())  # Print path to downloaded files
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\commands\download.py", line 180, in _download
    return snapshot_download(
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\_snapshot_download.py", line 299, in snapshot_download
    thread_map(
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\tqdm\contrib\concurrent.py", line 69, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs) 
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\tqdm\contrib\concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\tqdm\std.py", line 1181, in __iter__
    for obj in iterable:
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\concurrent\futures\_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\concurrent\futures\_base.py", line 446, in result
    return self.__get_result()
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\concurrent\futures\_base.py", line 391, in __get_result
    raise self._exception
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\_snapshot_download.py", line 273, in _inner_hf_hub_download
    return hf_hub_download(
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\utils\_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\file_download.py", line 1220, in hf_hub_download
    return _hf_hub_download_to_local_dir(
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\file_download.py", line 1515, in _hf_hub_download_to_local_dir
    _download_to_tmp_and_move(
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\site-packages\huggingface_hub\file_download.py", line 1903, in _download_to_tmp_and_move
    with incomplete_path.open("ab") as f:
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\pathlib.py", line 1252, in open
    return io.open(self, mode, buffering, encoding, errors, newline,        
  File "C:\Users\jfowe\.conda\envs\oga-igpu\lib\pathlib.py", line 1120, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '.cache\\huggingface\\download\\cpu_and_mobile\\cpu-int4-rtn-block-32-acc-level-4\\phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx.385cd1b908a0d2f8634e86d30236f6dbb7ae660eb3943fd1ef5bdc3847326480.incomplete'

Expected behavior: I would expect the model to download. This works for me with DirectML.

Additional context: I'm filing this as an OGA bug because the command in the OGA README does not work for me. There may be a bug on the Hugging Face side as well, but I'm just trying to get OGA working on CPU any way I can.

cc @kunal-vaishnavi - related to the email I just sent. I was curious to try the TurnkeyML-LLM MMLU test with CPU but hit this issue.

jeremyfowers commented 1 month ago

BTW I unblocked myself with:

git lfs install
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx

and then I copied the resulting cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4 folder to the desired location.

baijumeswani commented 1 month ago

Seems related to this: https://github.com/huggingface/huggingface_hub/issues/2374

Could you check whether working around the Windows maximum path length limit fixes the issue?
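One way to test the path-length hypothesis without re-running the download is to measure the temporary file path from the traceback against the classic Windows `MAX_PATH` limit of 260 characters. The sketch below is illustrative only: the helper name `exceeds_max_path` is hypothetical, the relative path is copied from the `FileNotFoundError` above, and the working directory is the one shown in the reporter's prompt. It assumes long-path support is not enabled on the machine.

```python
from pathlib import PureWindowsPath

# Classic Windows path limit; the count includes the terminating null character.
MAX_PATH = 260

# Relative temp path copied from the FileNotFoundError in the report above.
INCOMPLETE_REL = (
    r".cache\huggingface\download\cpu_and_mobile"
    r"\cpu-int4-rtn-block-32-acc-level-4"
    r"\phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx"
    r".385cd1b908a0d2f8634e86d30236f6dbb7ae660eb3943fd1ef5bdc3847326480.incomplete"
)


def exceeds_max_path(local_dir: str, relative: str = INCOMPLETE_REL) -> bool:
    """Return True if local_dir joined with relative is too long for MAX_PATH.

    Hypothetical helper for illustration; it only measures string length and
    never touches the file system, so it runs on any operating system.
    """
    full = PureWindowsPath(local_dir) / relative
    # +1 accounts for the terminating null that MAX_PATH includes.
    return len(str(full)) + 1 > MAX_PATH


if __name__ == "__main__":
    for candidate in (
        r"C:\work\turnkeyml\src\turnkeyml\llm\tools\ort_genai\models",
        "C:\\",
    ):
        full = PureWindowsPath(candidate) / INCOMPLETE_REL
        print(candidate, len(str(full)), exceeds_max_path(candidate))
```

The smaller files in the log use shorter names, which would explain why only some of the downloads succeeded from the deep working directory.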

jeremyfowers commented 1 month ago

It does seem to have something to do with the path length. When I set C:\ as my local dir I get all the files.
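The same workaround can be sketched with the `huggingface_hub` Python API instead of the CLI. This is a minimal sketch, not the README's method: `download_cpu_model` is a hypothetical wrapper, and the short target directory is an assumption chosen only to keep the temporary `.cache\huggingface\download\...` paths under the Windows limit. `snapshot_download` with `allow_patterns` and `local_dir` mirrors the CLI's `--include` and `--local-dir` options.

```python
from pathlib import Path

REPO_ID = "microsoft/Phi-3-mini-4k-instruct-onnx"
# Same filter as the --include argument in the README command.
CPU_PATTERN = "cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/*"


def download_cpu_model(local_dir: str) -> str:
    """Download only the CPU int4 model files into local_dir.

    Hypothetical wrapper for illustration. Pass a short directory on Windows
    so the temporary download paths stay within the path length limit.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[CPU_PATTERN],
        local_dir=local_dir,
    )
```

Calling `download_cpu_model("C:\\phi3")` (an example directory, not one from the report) should place the files under `C:\phi3\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4`, from where the folder can be moved to its final location.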

However, I have the latest huggingface-hub==0.24.6 installed (released on 8/19), so shouldn't I already have the patch you linked? This will still be a problem for Windows users of OGA, since the path limit is not an obvious cause of the error.

natke commented 1 month ago

Thanks for reporting this, @jeremyfowers. I will update the README.