khmyznikov opened 1 month ago
@shaahji
@khmyznikov I am unable to reproduce the issue with the current head of the Olive repo.
You mentioned "After some of the changes ....". Did you mean Olive devs making changes, or did you make these changes yourself? If the changes are local to you, could you share what you did?
Here's the output from my local run at the tip of the Olive repo:
(olive) <local>\Olive\examples\phi3>python phi3.py --target cpu --precision int4 --inference --prompt "Write a story starting with once upon a time" --max_length 200
Generating Olive configuration file...
Olive configuration file is generated...
Generating optimized model for cpu ...
[2024-10-01 10:58:03,026] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-10-01 10:58:03,093] [INFO] [cache.py:51:__init__] Using cache directory: <local>\Olive\examples\phi3\cache\default_workflow
[2024-10-01 10:58:03,093] [INFO] [engine.py:975:save_olive_config] Saved Olive config to <local>\Olive\examples\phi3\cache\default_workflow\olive_config.json
[2024-10-01 10:58:03,109] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-10-01 10:58:03,109] [INFO] [engine.py:274:run] Running Olive on accelerator: cpu-cpu
[2024-10-01 10:58:03,109] [INFO] [engine.py:1068:_create_system] Creating target system ...
[2024-10-01 10:58:03,109] [INFO] [engine.py:1071:_create_system] Target system created in 0.000000 seconds
[2024-10-01 10:58:03,109] [INFO] [engine.py:1080:_create_system] Creating host system ...
[2024-10-01 10:58:03,109] [INFO] [engine.py:1083:_create_system] Host system created in 0.000000 seconds
[2024-10-01 10:58:03,141] [INFO] [engine.py:840:_run_pass] Running pass builder:ModelBuilder {}
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\configuration_auto.py:913: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
GroupQueryAttention (GQA) is used in this model.
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\auto_factory.py:468: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
modeling_phi3.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 73.2k/73.2k [00:00<00:00, 4.81MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
2024-10-01 10:58:09,512 transformers_modules.microsoft.Phi-3-mini-4k-instruct.0a67737cc96d2554230f90338b163bc6380a2a85.modeling_phi3 [WARNING] - `flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
2024-10-01 10:58:09,512 transformers_modules.microsoft.Phi-3-mini-4k-instruct.0a67737cc96d2554230f90338b163bc6380a2a85.modeling_phi3 [WARNING] - Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
model.safetensors.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.5k/16.5k [00:00<00:00, 2.10MB/s]
model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [14:55<00:00, 5.55MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.67G/2.67G [07:18<00:00, 6.09MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [22:14<00:00, 667.13s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.04s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 181/181 [00:00<?, ?B/s]
Reading embedding layer
Reading decoder layer 0
Reading decoder layer 1
Reading decoder layer 2
Reading decoder layer 3
Reading decoder layer 4
Reading decoder layer 5
Reading decoder layer 6
Reading decoder layer 7
Reading decoder layer 8
Reading decoder layer 9
Reading decoder layer 10
Reading decoder layer 11
Reading decoder layer 12
Reading decoder layer 13
Reading decoder layer 14
Reading decoder layer 15
Reading decoder layer 16
Reading decoder layer 17
Reading decoder layer 18
Reading decoder layer 19
Reading decoder layer 20
Reading decoder layer 21
Reading decoder layer 22
Reading decoder layer 23
Reading decoder layer 24
Reading decoder layer 25
Reading decoder layer 26
Reading decoder layer 27
Reading decoder layer 28
Reading decoder layer 29
Reading decoder layer 30
Reading decoder layer 31
Reading final norm
Reading LM head
Saving ONNX model in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model
<local>\.conda\envs\olive\lib\site-packages\transformers\generation\configuration_utils.py:814: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
Saving GenAI config in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\tokenization_auto.py:757: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.44k/3.44k [00:00<?, ?B/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 6.28MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.94M/1.94M [00:00<00:00, 6.22MB/s]
added_tokens.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 306/306 [00:00<00:00, 19.1kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 599/599 [00:00<00:00, 92.1kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Saving processing files in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model for GenAI
[2024-10-01 11:23:23,189] [INFO] [engine.py:943:_run_pass] Pass builder:ModelBuilder finished in 1520.048102 seconds
[2024-10-01 11:23:25,341] [INFO] [engine.py:457:run_no_search] Saved output model to <local>\AppData\Local\Temp\tmpft22faik\output_model
[2024-10-01 11:23:25,346] [INFO] [engine.py:367:run_accelerator] Save footprint to <local>\AppData\Local\Temp\tmpft22faik\footprints.json.
[2024-10-01 11:23:25,350] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-10-01 11:23:25,362] [INFO] [engine.py:550:dump_run_history] run history:
+------------------------------------------+-------------------+--------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+==========================================+===================+==============+================+===========+
| 3874e362 | | | | |
+------------------------------------------+-------------------+--------------+----------------+-----------+
| 1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu | 3874e362 | ModelBuilder | 1520.05 | |
+------------------------------------------+-------------------+--------------+----------------+-----------+
Command succeeded. Output model saved to models\phi3
Model inference starts...
Loading model...
Model loaded in 3.16 seconds
Creating tokenizer...
Creating generator ...
Generator created
<|user|>
Write a story starting with once upon a time<|end|>
<|assistant|>
Once upon a time, in a small village nestled between rolling hills and lush green meadows, there lived a young girl named Lily. She was known for her kind heart and her insatiable curiosity about the world around her. Lily'ran as the sun rose, she would eagerly set out on her daily adventures, exploring the woods, fields, and streams that surrounded her village.
One day, as she was wandering through the woods, Lily stumbled upon an old, moss-covered stone with strange symbols etched into its surface. Intrigued, she decided to take the stone home and study it. As she examined the symbols, she noticed that they seemed to form a pattern, like a map.
Determined to uncover the mystery of the stone, Lily spent days and nights studying the symbols, drawing
Prompt tokens: 16, New tokens: 184, Time to first: 0.40s, New tokens per second: 15.01 tps
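As a rough sanity check on the throughput line above (16 prompt tokens, 184 new tokens, 0.40 s to first token, 15.01 tokens/s), the implied decode and total times can be back-computed. This is just arithmetic on the reported figures, not part of phi3.py:

```python
# Back-compute the decode time implied by the throughput line in the log.
new_tokens = 184      # "New tokens: 184"
tps = 15.01           # "New tokens per second: 15.01 tps"
time_to_first = 0.40  # "Time to first: 0.40s"

decode_seconds = new_tokens / tps             # time spent generating tokens
total_seconds = time_to_first + decode_seconds

print(f"decode: {decode_seconds:.2f}s, total: {total_seconds:.2f}s")
# decode: 12.26s, total: 12.66s
```

So the 200-token generation completes in roughly 12.7 seconds on CPU, consistent with the reported rate.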
Describe the bug
After some of the changes, the phi3 sample with the inference flag stopped working.

Olive logs

Other information
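For reference, the inference transcript above shows the chat format the example feeds to the model (`<|user|>` / `<|end|>` / `<|assistant|>` markers). A small helper that reproduces that format; the function name is illustrative and not part of phi3.py, and the exact whitespace is inferred from the log output:

```python
def build_phi3_prompt(user_message: str) -> str:
    """Wrap a user message in the Phi-3 chat format seen in the log output."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"

# Reproduces the prompt shown in the transcript above.
prompt = build_phi3_prompt("Write a story starting with once upon a time")
print(prompt)
```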