ElliottDyson opened this issue 3 months ago
See the Save and load sub-section under the Code Examples section.
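In short, the pattern shown there is roughly the following (a sketch; the model path is a placeholder, and trust_remote_code is only needed for models shipping custom code):

from ipex_llm.transformers import AutoModelForCausalLM

# One-off: load with on-the-fly 4-bit quantisation, then persist the
# already-quantised weights
model = AutoModelForCausalLM.from_pretrained("path/to/model", load_in_4bit=True, trust_remote_code=True)
model.save_low_bit("path/to/model/low_bit")

# Later runs: load the saved low-bit weights directly, skipping re-quantisation
model = AutoModelForCausalLM.load_low_bit("path/to/model/low_bit", trust_remote_code=True)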
My apologies. Thank you!
I seem to be having issues with the save function for the PyTorch model and was hoping you might know why; it did correctly save a Hugging Face safetensors model, though.
2024-04-10 14:33:21,912 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [18:09<00:00, 363.25s/it]
2024-04-10 14:58:28,819 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\torch\serialization.py", line 619, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol, _disable_byteorder_record)
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\torch\serialization.py", line 853, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:593] . PytorchStreamWriter failed writing file data/0: file write failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\ellio\miniconda3\envs\llm\Scripts\uvicorn.exe\__main__.py", line 7, in <module>
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\uvicorn\main.py", line 409, in main
run(
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\uvicorn\main.py", line 575, in run
server.run()
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\uvicorn\server.py", line 65, in run
return asyncio.run(self.serve(sockets=sockets))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\asyncio\runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\asyncio\base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\uvicorn\server.py", line 69, in serve
await self._serve(sockets)
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\uvicorn\server.py", line 76, in _serve
config.load()
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\uvicorn\config.py", line 433, in load
self.loaded_app = import_from_string(self.app)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\uvicorn\importer.py", line 19, in import_from_string
module = importlib.import_module(module_str)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ellio\miniconda3\envs\llm\Lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "D:\LLMs\API.py", line 74, in <module>
function_model.save_low_bit("FunctionCallingModel/low_bit")
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\ipex_llm\transformers\model.py", line 71, in save_low_bit
self.save_pretrained(*args, **kwargs)
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\transformers\modeling_utils.py", line 2114, in save_pretrained
save_function(shard, os.path.join(save_directory, shard_file))
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\torch\serialization.py", line 618, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "C:\Users\ellio\miniconda3\envs\llm\Lib\site-packages\torch\serialization.py", line 466, in __exit__
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:424] . unexpected pos 45056 vs 44950
Would you mind providing more details regarding your test code, environment, etc., so we could reproduce this issue? :)
Thank you, and sorry for the delay; we must be operating in fairly different time zones. I'm running the latest version of Windows 11 with the latest drivers and oneAPI installation, using transformers==4.34.0.
This is the relevant section of the code I am using. roleplay_model saves just fine; as you can see from the error message, it is function_model (the PyTorch one) that fails:
from fastapi import FastAPI, Request, UploadFile, File, Body
import torch
from ipex_llm.transformers import AutoModelForCausalLM, AutoModelForSpeechSeq2Seq
#from transformers import AutoTokenizer, WhisperProcessor, TextGenerationPipeline
from ipex_llm import optimize_model
from ipex_llm.optimize import low_memory_init, load_low_bit
from transformers import AutoTokenizer, WhisperProcessor
from pydantic import BaseModel

app = FastAPI()

do_save_low_bit = True

roleplay_model_path = "text-generation-webui/models/Nexusflow_Starling-LM-7B-beta"
function_model_path = "FunctionCallingModel"
whisper_model_path = "whisper-med-en"

if do_save_low_bit:
    # Load Roleplay LLM model to save it (transformers)
    roleplay_model = AutoModelForCausalLM.from_pretrained(roleplay_model_path, load_in_4bit=True, trust_remote_code=True)
    roleplay_model.save_low_bit(roleplay_model_path + "/low_bit")

    # Load Function calling LLM model to save it (pytorch)
    function_model = AutoModelForCausalLM.from_pretrained(function_model_path, trust_remote_code=True)
    function_model = optimize_model(function_model, low_bit='sym_int4')
    function_model.save_low_bit(function_model_path + "/low_bit")

    # Load Whisper model to save it (transformers)
    whisper_model = AutoModelForSpeechSeq2Seq.from_pretrained(whisper_model_path, load_in_4bit=True, trust_remote_code=True)
    whisper_model.save_low_bit(whisper_model_path + "/low_bit")
else:
    # Load Roleplay LLM model (transformers)
    roleplay_model = AutoModelForCausalLM.load_low_bit(roleplay_model_path + "/low_bit", trust_remote_code=True)
    roleplay_tokenizer = AutoTokenizer.from_pretrained(roleplay_model_path, trust_remote_code=True)

    # Load Function calling LLM model (pytorch)
    with low_memory_init():
        function_model = AutoModelForCausalLM.from_pretrained(function_model_path + "/low_bit", torch_dtype="auto", trust_remote_code=True)
    function_model = load_low_bit(function_model, function_model_path + "/low_bit")
    function_tokenizer = AutoTokenizer.from_pretrained(function_model_path, trust_remote_code=True)

    # Load Whisper model (transformers)
    whisper_model = AutoModelForSpeechSeq2Seq.load_low_bit(whisper_model_path + "/low_bit", trust_remote_code=True)
    whisper_model.config.forced_decoder_ids = None
    whisper_processor = WhisperProcessor.from_pretrained(whisper_model_path)
    forced_decoder_ids = whisper_processor.get_decoder_prompt_ids(language='english', task="transcribe")

# Move models to GPU
roleplay_model = roleplay_model.to('xpu')
function_model = function_model.to('xpu')
whisper_model = whisper_model.to('xpu')

# Complete an inference on each model to initialize the model
roleplay_model.generate(roleplay_model.dummy_inputs, max_new_tokens=1)
function_model.generate(function_model.dummy_inputs, max_new_tokens=1)
whisper_model.generate(whisper_model.dummy_inputs, max_new_tokens=1)
Hi @ElliottDyson,
Could you first verify that there is sufficient space available on your disk to save the model, and check whether your model file is being corrupted by another program when saving the model?
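As a quick check, something like the following (a minimal sketch using only the Python standard library; the path is a placeholder and must exist) will show how much space is left on the target drive:

import shutil

# Free space on the drive that will hold the checkpoint shards;
# even a sym_int4 checkpoint of a 7B model needs several GB of headroom
free_gib = shutil.disk_usage("FunctionCallingModel").free / 1024**3
print(f"Free space: {free_gib:.1f} GiB")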
If those steps don't resolve the issue, would you mind providing us with your FunctionCallingModel (e.g. the Hugging Face repo id of this model) and running the environment-check scripts to share the results? This will help us better understand and reproduce the exact error.
P.S. If FunctionCallingModel can be loaded with ipex_llm.transformers.AutoModelForCausalLM, it can also be loaded and saved with:
function_model = AutoModelForCausalLM.from_pretrained(function_model_path, load_in_4bit=True, trust_remote_code=True)
function_model.save_low_bit(function_model_path + "/low_bit")
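The saved checkpoint can then be loaded back without re-quantising, using the same API your roleplay_model branch already uses:

function_model = AutoModelForCausalLM.load_low_bit(function_model_path + "/low_bit", trust_remote_code=True)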
Dear IPEX Team,
I was wondering if there is a way of saving a model that has been optimised and quantised, in its new state, for future loading, for both HF and PyTorch models. I noticed there is a method in ipex_llm.optimize, but it seems tied to a deprecated model-loading path, as it complains about a missing bigdl-llm config file. The motivation for such a feature is the slow loading caused by having to optimise and quantise on every load; this is exacerbated when loading a large model that ends up spilling into paging space due to over-usage of RAM during the process. If this were a one-off cost it wouldn't be so bad, but at the moment it happens every time. It would also enable sharing these IPEX-optimised models with other users of Intel Arc GPUs.
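For concreteness, the kind of round trip I am hoping for looks something like this (just a sketch; paths are illustrative):

from transformers import AutoModelForCausalLM
from ipex_llm import optimize_model
from ipex_llm.optimize import low_memory_init, load_low_bit

# One-off: optimise + quantise, then persist the converted weights
model = AutoModelForCausalLM.from_pretrained("path/to/model", trust_remote_code=True)
model = optimize_model(model, low_bit='sym_int4')
model.save_low_bit("path/to/model/low_bit")

# Every later run: build an empty-weight skeleton, then fill it with the
# saved low-bit weights, avoiding the RAM spike of re-quantising
with low_memory_init():
    model = AutoModelForCausalLM.from_pretrained("path/to/model/low_bit", torch_dtype="auto", trust_remote_code=True)
model = load_low_bit(model, "path/to/model/low_bit")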
Many thanks.