maria-18-git opened 2 months ago
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,preprocessed,dataset_name=mixtral
...
"/usr/bin/wget" -O "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06_mixtral_15k_v4.pkl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--2024-07-11 06:18:28-- https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06_mixtral_15k_v4.pkl
Resolving inference.mlcommons-storage.org (inference.mlcommons-storage.org)... 172.67.167.47, 104.21.16.91, 2606:4700:3037::6815:105b, ...
Connecting to inference.mlcommons-storage.org (inference.mlcommons-storage.org)|172.67.167.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 71763360 (68M) [application/octet-stream]
Saving to: ‘/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl’
/local/mnt/workspace/mmirkina/work_collection/downlo 100%[=====================================================================================================================>] 68.44M 38.7MB/s in 1.8s
2024-07-11 06:18:30 (38.7 MB/s) - ‘/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl’ saved [71763360/71763360]
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.
['^', 'byname', 'downloaded_2024.06.06_mixtral_15k_v4.pkl']
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,preprocessed,dataset_name=mixtral , get_path
/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls /local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/
2024.06.06_mixtral_15k_v4.pkl data_axs.json
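Before preprocessing, the downloaded `.pkl` can be sanity-checked by loading it and counting records. A minimal sketch using a stand-in pickle (the real `2024.06.06_mixtral_15k_v4.pkl` is loaded with pandas in the reference code; the record fields below are illustrative assumptions):

```python
import pickle

# Stand-in for a downloaded dataset pickle; field names are illustrative.
samples = [{"dataset": "GSM8K", "input": "q1"},
           {"dataset": "MBXP", "input": "q2"}]
with open("/tmp/dataset_stub.pkl", "wb") as f:
    pickle.dump(samples, f)

# Open and size a downloaded .pkl the same way:
with open("/tmp/dataset_stub.pkl", "rb") as f:
    loaded = pickle.load(f)
print(len(loaded))  # expect one record per sample; 15000 in the real file
```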
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery extracted,checkpoint,model_name=mixtral
...
"/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_extracted_mixtral-8x7b-instruct-v0.1/extracted/mixtral-8x7b-instruct-v0.1" -P
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transferred: 173.982 GiB / 173.982 GiB, 100%, 28.538 MiB/s, ETA 0s
Transferred: 42 / 42, 100%
Elapsed time: 34m21.2s
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.
['^', 'byname', 'downloaded_extracted_mixtral-8x7b-instruct-v0.1']
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byname model_mixtral_recipe , run
...
WARNING:root:shell.run() about to execute (with env=None, in_dir=None, capture_output=False, errorize_output=False, capture_stderr=False, split_to_lines=False):
"/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" -P
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transferred: 173.982 GiB / 173.982 GiB, 100%, 12.036 MiB/s, ETA 0s
Transferred: 42 / 42, 100%
Elapsed time: 35m37.2s
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.
WARNING:root:shell.run() about to execute (with env=None, in_dir=None, capture_output=False, errorize_output=False, capture_stderr=False, split_to_lines=False):
cp /local/mnt/workspace/mmirkina/work_collection/axs2mlperf/model_mixtral_recipe/tokenizer* /local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0
In the checkpoint model directory:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls -la ../downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1/
total 182435448
drwxr-xr-x 2 mmirkina users 4096 Jul 12 13:13 .
drwxr-xr-x 3 mmirkina users 4096 Jul 12 12:38 ..
-rw-r--r-- 1 mmirkina users 803 Jun 24 17:04 config.json
-rw-r--r-- 1 mmirkina users 111 Jun 24 17:04 generation_config.json
-rw-r--r-- 1 mmirkina users 4920052720 Jun 24 17:04 model-00001-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00002-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00003-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00004-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00005-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504264 Jun 24 17:05 model-00006-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559912 Jun 24 17:05 model-00007-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00008-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00009-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00010-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00011-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646240 Jun 24 17:06 model-00012-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:06 model-00013-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00014-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00015-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00016-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00017-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00018-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:08 model-00019-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00020-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00021-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00022-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00023-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00024-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:09 model-00025-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00026-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00027-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00028-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00029-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00030-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:11 model-00031-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00032-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00033-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00034-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00035-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00036-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646264 Jun 24 17:12 model-00037-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:13 model-00038-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 1463862216 Jun 24 17:13 model-00039-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 92659 Jun 24 17:13 model.safetensors.index.json
-rw-r--r-- 1 mmirkina users 1466 Jul 12 13:13 tokenizer_config.json
-rw-r--r-- 1 mmirkina users 1795303 Jul 12 13:13 tokenizer.json
-rw-r--r-- 1 mmirkina users 493443 Jul 12 13:13 tokenizer.model
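With 39 shards, a 34-minute rclone copy can silently end up partial; the shard list recorded in `model.safetensors.index.json` can be checked against the files on disk. A minimal sketch (the `weight_map` layout is the standard safetensors sharded-index format; the helper name is illustrative):

```python
import json, os, tempfile

def missing_shards(index_path, model_dir):
    """Shard files named in a safetensors index but absent from model_dir."""
    with open(index_path) as f:
        index = json.load(f)
    expected = set(index["weight_map"].values())
    return expected - set(os.listdir(model_dir))

# Demo on a throwaway directory with one shard deliberately missing:
d = tempfile.mkdtemp()
idx = {"weight_map": {"w.a": "model-00001-of-00002.safetensors",
                      "w.b": "model-00002-of-00002.safetensors"}}
with open(os.path.join(d, "model.safetensors.index.json"), "w") as f:
    json.dump(idx, f)
open(os.path.join(d, "model-00001-of-00002.safetensors"), "w").close()

missing = missing_shards(os.path.join(d, "model.safetensors.index.json"), d)
print(missing)  # {'model-00002-of-00002.safetensors'}
```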
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,checkpoint,model_name=mixtral --- , get_path
/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1
(with axs):
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,dataset_path=/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/download_dataset_15_samples/mixtral_15.pkl,total_sample_count=15,loadgen_dataset_size=15
...
"model_name": "mixtral_8x7b",
"mlperf_model_name": "mixtral_8x7b",
"model_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/",
"dataset_name": "mixtral",
"sut_name": "aus655-apollo-0",
"program_name": "mixtral_reference_code",
"loadgen_buffer_size": 8,
"loadgen_compliance_test": null,
"device": "cuda",
"dtype": "float16"
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5/data_axs.json'
WARNING:root:shell.run() about to execute (with env={'PATH': '/usr2/mmirkina/.local/bin:/usr2/mmirkina/.local/bin:/usr2/mmirkina/.local/bin:/usr2/mmirkina/.local/bin:/usr2/mmirkina/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr2/mmirkina/bin:/usr2/mmirkina/bin:/usr2/mmirkina/bin:/usr2/mmirkina/bin:/local/mnt/workspace/mmirkina/axs:/usr2/mmirkina/bin:/local/mnt/workspace/mmirkina/axs:/local/mnt/workspace/mmirkina//bin:/snap/bin:/local/mnt/workspace/mmirkina/work_collection/numpy_1.24.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/pybind11_2.10.4_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/pandas_2.2.2_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/nltk_3.8.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/evaluate_0.4.0_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/absl-py_1.4.0_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/rouge-score_0.1.2_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/sentencepiece_0.1.99_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/accelerate_0.21.0_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/torch_2.3.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/tokenizers_0.19.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/mlperf_loadgen_package_for_python3.9/install/bin', 'HOME': '/local/mnt/workspace/mmirkina/', 'AXS_WORK_COLLECTION': '/local/mnt/workspace/mmirkina/work_collection', 'PYTHONPATH': 
'/local/mnt/workspace/mmirkina/work_collection/numpy_1.24.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/pybind11_2.10.4_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/pandas_2.2.2_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/nltk_3.8.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/evaluate_0.4.0_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/absl-py_1.4.0_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/rouge-score_0.1.2_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/sentencepiece_0.1.99_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/accelerate_0.21.0_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/torch_2.3.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/tokenizers_0.19.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/mlperf_loadgen_package_for_python3.9/install/lib/python3.9/site-packages'}, in_dir=/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5, capture_output=False, errorize_output=True, capture_stderr=False, split_to_lines=False):
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/mixtral_reference_code/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/download_dataset_15_samples/mixtral_15.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5" --device "cuda" --dtype "float16"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
WARNING:Mixtral-8x7B-Instruct-v0.1-MAIN:Accuracy run will generate the accuracy logs, but the evaluation of the log is not completed yet
Loading dataset...
Finished loading dataset.
...
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 15 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
Saving outputs to run_outputs/q9.pkl
Samples run: 1
BatchMaker time: 0.0008153915405273438
Inference time: 12.435491561889648
Postprocess time: 0.0005464553833007812
==== Total time: 12.436853408813477
...
Samples run: 15
BatchMaker time: 0.00018739700317382812
Inference time: 47.541648864746094
Postprocess time: 0.000560760498046875
==== Total time: 47.542397022247314
No warnings encountered during test.
No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...
"_parent_entries": [
[
"^",
"byname",
"base_mixtral_loadgen_experiment"
]
],
"with_power": null,
"model_name": "mixtral_8x7b",
"mlperf_model_name": "mixtral_8x7b",
"model_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/",
"dataset_name": "mixtral",
"sut_name": "aus655-apollo-0",
"program_name": "mixtral_reference_code",
"loadgen_buffer_size": 8,
"loadgen_compliance_test": null,
"device": "cuda",
"dtype": "float16",
"experiment_end_timestamp": "2024.07.07T09:30:59"
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5']
real 4m27.111s
user 23m49.428s
sys 7m30.129s
Accuracy (without axs):
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/ --mlperf-accuracy-file /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5/mlperf_log_accuracy.json --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/download_dataset_15_samples/mixtral_15.pkl --dtype int32
...
{'rouge1': 51.8093, 'rouge2': 23.1958, 'rougeL': 31.7219, 'rougeLsum': 48.2656, 'gsm8k': 80.0, 'mbxp': 20.0, 'gen_len': 4271, 'gen_num': 15, 'gen_tok_len': 4560, 'tokens_per_sample': 304.0}
(with axs):
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=15000
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/mixtral_reference_code/user.conf" --total-sample-count 15000 --dataset-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8" --device "cuda" --dtype "float16"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
WARNING:Mixtral-8x7B-Instruct-v0.1-MAIN:Accuracy run will generate the accuracy logs, but the evaluation of the log is not completed yet
Loading dataset...
...
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 15000 samples
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
IssueQuery done
Saving outputs to run_outputs/q4118.pkl
Samples run: 1
BatchMaker time: 0.03139448165893555
Inference time: 19.479421377182007
Postprocess time: 0.0007452964782714844
==== Total time: 19.511561155319214
Saving outputs to run_outputs/q3770.pkl
Samples run: 2
BatchMaker time: 0.0001633167266845703
Inference time: 6.8499672412872314
Postprocess time: 0.0005311965942382812
==== Total time: 6.850661754608154
Saving outputs to run_outputs/q12091.pkl
Samples run: 3
BatchMaker time: 0.00019741058349609375
Inference time: 2.904864549636841
Postprocess time: 0.0002982616424560547
==== Total time: 2.905360221862793
...
"__cumulative_param_names": [
"__query",
"task",
"framework",
"loadgen_mode",
"loadgen_scenario",
"total_sample_count",
"tags"
],
"loadgen_scenario": "Offline",
"loadgen_mode": "AccuracyOnly",
"total_sample_count": 15000,
"task": "mixtral",
"framework": "torch",
"__query": "loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=15000",
"_replay": [
"^^",
"execute",
[
[
[
"get_kernel"
],
[
"byname",
"mixtral_reference_code"
],
[
"get"
]
]
]
],
"_parent_entries": [
[
"^",
"byname",
"base_mixtral_loadgen_experiment"
]
],
"with_power": null,
"model_name": "mixtral_8x7b",
"mlperf_model_name": "mixtral_8x7b",
"model_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/",
"dataset_name": "mixtral",
"sut_name": "aus655-apollo-0",
"program_name": "mixtral_reference_code",
"loadgen_dataset_size": 15000,
"loadgen_buffer_size": 8,
"loadgen_compliance_test": null,
"dataset_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl",
"device": "cuda",
"dtype": "float16",
"experiment_end_timestamp": "2024.07.09T20:27:30"
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8']
real 3528m25.489s
user 3548m5.997s
sys 9m49.413s
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ wc -l /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8/mlperf_log_accuracy.json
15002 /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8/mlperf_log_accuracy.json
Accuracy:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/ --mlperf-accuracy-file /local/mnt/workspace/mmirkina/accuracy_temp/mlperf_log_accuracy_full_15000.json --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [10:12<00:00, 8.17it/s]
Processed 5000 in 612.0956107849488s
25.90% pass@1
{'ruby': 414, 'cpp': 387, 'php': 0, 'typescript': 0, 'python': 494, 'javascript': 0} {'ruby': 846, 'cpp': 743, 'php': 846, 'typescript': 868, 'python': 863, 'javascript': 834}
Results
{'rouge1': 44.9319, 'rouge2': 22.869, 'rougeL': 30.3357, 'rougeLsum': 42.0434, 'gsm8k': 74.04, 'mbxp': 25.9, 'gen_len': 4020445, 'gen_num': 15000, 'gen_tok_len': 4264758, 'tokens_per_sample': 284.3}
From the full accuracy run, part of the accuracy log was extracted. First 5000 samples:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/ --mlperf-accuracy-file /local/mnt/workspace/mmirkina/accuracy_temp/mlperf_log_accuracy_first_5000.json --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
...
Results
{'rouge1': 44.8438, 'rouge2': 22.9349, 'rougeL': 30.0678, 'rougeLsum': 41.934, 'gsm8k': 75.25464349910126, 'mbxp': 26.89738919247116, 'gen_len': 1370417, 'gen_num': 5000, 'gen_tok_len': 1424716, 'tokens_per_sample': 284.9}
Second 5000:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/ --mlperf-accuracy-file /local/mnt/workspace/mmirkina/accuracy_temp/mlperf_log_accuracy_second_5000.json --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
...
Results
{'rouge1': 45.3139, 'rouge2': 22.6935, 'rougeL': 30.6544, 'rougeLsum': 42.3755, 'gsm8k': 74.71333735666867, 'mbxp': 26.498237367802584, 'gen_len': 1313461, 'gen_num': 5000, 'gen_tok_len': 1417708, 'tokens_per_sample': 283.5}
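The first/second 5000-sample evaluations above require splitting `mlperf_log_accuracy.json` in two. A minimal sketch, assuming the log is a JSON array of records each carrying a `qsl_idx` field (the standard LoadGen accuracy-log layout; real entries also carry a `data` hex payload):

```python
def split_accuracy_log(entries, cutoff):
    """Partition LoadGen accuracy-log entries by their qsl_idx."""
    first = [e for e in entries if e["qsl_idx"] < cutoff]
    second = [e for e in entries if e["qsl_idx"] >= cutoff]
    return first, second

# Demo with a tiny stand-in log:
log = [{"qsl_idx": i} for i in (0, 4999, 5000, 9999)]
first, second = split_accuracy_log(log, 5000)
print(len(first), len(second))  # 2 2
```

Each half can then be dumped with `json.dump` and fed to `evaluate-accuracy.py` as above.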
After the update in the MLCommons inference repo (https://github.com/mlcommons/inference/pull/1754):
Short run (10 samples with the full dataset):
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=10
...
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_5f197824137f42c9a0dbedee5e0e670f/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_5f197824137f42c9a0dbedee5e0e670f']
real 3m20.567s
user 20m38.746s
sys 8m16.688s
Accuracy script:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/ --mlperf-accuracy-file /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_5f197824137f42c9a0dbedee5e0e670f/mlperf_log_accuracy.json --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
[nltk_data] Downloading package punkt to
[nltk_data] /local/mnt/workspace/mmirkina/nltk_data...
[nltk_data] Package punkt is already up-to-date!
Results
{'gsm8k': 70.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 10, 'gen_tok_len': 3104, 'tokens_per_sample': 310.4}
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=10 , get accuracy_dict
INFO:root:[base_loadgen_experiment] touch _BEFORE_CODE_LOADING=/local/mnt/workspace/mmirkina/work_collection/pint_package_for_python3.9/install/lib/python3.9/site-packages
{'gsm8k': 70.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 10, 'gen_tok_len': 3104, 'tokens_per_sample': 310.4}
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_query_count=10
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_6606e4361e8f407ea4bbebac471330eb/user.conf" --total-sample-count 10 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_6606e4361e8f407ea4bbebac471330eb" --device "cuda:0" --dtype "float16"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00, 1.10it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
Saving outputs to run_outputs/q8.pkl
Samples run: 1
BatchMaker time: 0.0007636547088623047
Inference time: 40.94887709617615
Postprocess time: 0.0011742115020751953
==== Total time: 40.950814962387085
Saving outputs to run_outputs/q5.pkl
Samples run: 2
BatchMaker time: 0.00020742416381835938
Inference time: 8.971089363098145
Postprocess time: 0.0005135536193847656
==== Total time: 8.971810340881348
Saving outputs to run_outputs/q0.pkl
Samples run: 3
BatchMaker time: 0.000186920166015625
Inference time: 16.994237661361694
Postprocess time: 0.0004570484161376953
==== Total time: 16.994881629943848
Saving outputs to run_outputs/q3.pkl
Samples run: 4
BatchMaker time: 0.00015687942504882812
Inference time: 14.726018190383911
Postprocess time: 0.0003829002380371094
==== Total time: 14.726557970046997
...
Saving outputs to run_outputs/q5.pkl
Samples run: 10
BatchMaker time: 0.00021958351135253906
Inference time: 8.971242904663086
Postprocess time: 0.00045800209045410156
==== Total time: 8.971920490264893
Saving outputs to run_outputs/q0.pkl
Samples run: 11
BatchMaker time: 0.00015234947204589844
Inference time: 17.012208938598633
Postprocess time: 0.000461578369140625
==== Total time: 17.01282286643982
Saving outputs to run_outputs/q3.pkl
Samples run: 12
BatchMaker time: 0.00013637542724609375
Inference time: 14.714705467224121
Postprocess time: 0.0005829334259033203
==== Total time: 14.71542477607727
Saving outputs to run_outputs/q6.pkl
user.conf:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_6606e4361e8f407ea4bbebac471330eb/user.conf
mixtral-8x7b.Offline.min_query_count = 10
mixtral-8x7b.Offline.max_query_count = 10
mixtral-8x7b.Offline.performance_sample_count_override = 8
mixtral-8x7b.Offline.coalesce_queries = 1
total_sample_count=10 is not applied in Performance mode, so we set performance_sample_count_override = total_sample_count.
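As a toy illustration of the relationship (an assumption on our part, not LoadGen's actual code): in a Performance run the override, when present, determines how many samples are actually exercised, so leaving it below total_sample_count silently shrinks the run:

```python
def effective_sample_count(total_sample_count, performance_sample_count_override=None):
    # In Performance mode the override (if present) wins;
    # total_sample_count only caps how much of the dataset is loaded.
    if performance_sample_count_override is not None:
        return min(performance_sample_count_override, total_sample_count)
    return total_sample_count

# With the generated user.conf above (override = 8, total = 10):
assert effective_sample_count(10, 8) == 8
# Setting override = total_sample_count gives the intended short run:
assert effective_sample_count(10, 10) == 10
```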
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_min_query_count=10
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c/user.conf" --total-sample-count 10 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c" --device "cuda:0" --dtype "float16"
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00, 1.09it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
Saving outputs to run_outputs/q8.pkl
Samples run: 1
BatchMaker time: 0.0005807876586914062
Inference time: 41.58476805686951
Postprocess time: 0.0008158683776855469
==== Total time: 41.586164712905884
Saving outputs to run_outputs/q5.pkl
Samples run: 2
BatchMaker time: 0.00020503997802734375
Inference time: 9.112966537475586
Postprocess time: 0.0005428791046142578
==== Total time: 9.113714456558228
Saving outputs to run_outputs/q0.pkl
Samples run: 3
BatchMaker time: 0.0002009868621826172
Inference time: 17.251879692077637
Postprocess time: 0.0003573894500732
...
Samples run: 10
BatchMaker time: 0.00020194053649902344
Inference time: 9.094660758972168
Postprocess time: 0.00040793418884277344
==== Total time: 9.09527063369751
Saving outputs to run_outputs/q0.pkl
Samples run: 11
BatchMaker time: 0.0001404285430908203
Inference time: 17.270347833633423
Postprocess time: 0.00042557716369628906
==== Total time: 17.27091383934021
Saving outputs to run_outputs/q3.pkl
Samples run: 12
BatchMaker time: 0.00013399124145507812
Inference time: 14.922792911529541
Postprocess time: 0.0005435943603515625
==== Total time: 14.923470497131348
Saving outputs to run_outputs/q6.pkl
...
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_min_query_count=10 , get_path
INFO:root:[base_loadgen_experiment] touch _BEFORE_CODE_LOADING=/local/mnt/workspace/mmirkina/work_collection/pint_package_for_python3.9/install/lib/python3.9/site-packages
/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c/user.conf
mixtral-8x7b.Offline.min_query_count = 10
mixtral-8x7b.Offline.performance_sample_count_override = 8
mixtral-8x7b.Offline.coalesce_queries = 1
Also, the log still reports "IssueQuery started with 660 samples".
If the generated user.conf is:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_644b0ae5227941f0b8c678e658b5069e/user.conf
mixtral-8x7b.Offline.min_query_count = 10
mixtral-8x7b.Offline.performance_sample_count_override = 10
mixtral-8x7b.Offline.coalesce_queries = 1
then the run still issues 660 samples:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_min_query_count=10
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_644b0ae5227941f0b8c678e658b5069e/user.conf" --total-sample-count 10 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_644b0ae5227941f0b8c678e658b5069e" --device "cuda:0" --dtype "float16"
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:36<00:00, 1.08it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
Saving outputs to run_outputs/q8.pkl
Samples run: 1
BatchMaker time: 0.0006940364837646484
Inference time: 41.28359842300415
Postprocess time: 0.0008296966552734375
==== Total time: 41.28512215614319
Saving outputs to run_outputs/q5.pkl
Samples run: 2
BatchMaker time: 0.00020360946655273438
Inference time: 9.022686243057251
Postprocess time: 0.00037741661071777344
==== Total time: 9.023267269134521
...
Samples run: 10
BatchMaker time: 0.0001964569091796875
Inference time: 8.056349992752075
Postprocess time: 0.0004391670227050781
==== Total time: 8.05698561668396
Saving outputs to run_outputs/q8.pkl
Samples run: 11
BatchMaker time: 0.00014781951904296875
Inference time: 40.30126667022705
Postprocess time: 0.0006682872772216797
==== Total time: 40.302082777023315
...
If we set:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_9ab6243dfed440b1b67169ccbac1d51b/user.conf
mixtral-8x7b.Offline.min_query_count = 5
mixtral-8x7b.Offline.performance_sample_count_override = 5
mixtral-8x7b.Offline.coalesce_queries = 1
then the run still issues 660 samples:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=5,loadgen_min_query_count=5
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_9ab6243dfed440b1b67169ccbac1d51b/user.conf" --total-sample-count 5 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_9ab6243dfed440b1b67169ccbac1d51b" --device "cuda:0" --dtype "float16"
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:33<00:00, 1.16it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
...
Samples run: 1
BatchMaker time: 0.0006489753723144531
Inference time: 9.025307893753052
Postprocess time: 0.0006871223449707031
==== Total time: 9.026643991470337
Saving outputs to run_outputs/q3.pkl
Samples run: 2
BatchMaker time: 0.0001709461212158203
Inference time: 14.81588864326477
Postprocess time: 0.0004661083221435547
==== Total time: 14.81652569770813
...
Samples run: 5
BatchMaker time: 0.0001342296600341797
Inference time: 19.760494232177734
Postprocess time: 0.0005040168762207031
==== Total time: 19.76113247871399
Saving outputs to run_outputs/q4.pkl
Samples run: 6
BatchMaker time: 0.0002002716064453125
Inference time: 8.046847820281982
Postprocess time: 0.0005133152008056641
==== Total time: 8.047561407089233
Saving outputs to run_outputs/q3.pkl
Samples run: 7
BatchMaker time: 0.000148773193359375
Inference time: 14.783483743667603
Postprocess time: 0.00044274330139160156
==== Total time: 14.784075260162354
Saving outputs to run_outputs/q0.pkl
...
So we cannot do a short run for the Offline scenario in Performance mode.
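One plausible explanation for the constant 660 (a reconstruction on our part, not taken from the LoadGen sources): in the Offline scenario LoadGen coalesces everything into a single query sized to cover min_duration at target_qps plus a 10% margin, so with the defaults target_qps = 1 and min_duration = 600000 ms it issues exactly 660 samples regardless of min/max_query_count. A sketch:

```python
def offline_samples_per_query(target_qps, min_duration_ms):
    # Hypothetical reconstruction of how LoadGen sizes the single
    # Offline query: enough samples to cover min_duration at
    # target_qps, plus a 10% safety margin (integer arithmetic).
    base = target_qps * min_duration_ms // 1000
    return base + base // 10

# Defaults: target_qps = 1, min_duration = 600000 ms -> 660 samples,
# matching "IssueQuery started with 660 samples" in all runs above.
assert offline_samples_per_query(1, 600000) == 660
```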
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/mixtral_reference_code$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Server,total_sample_count=15
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_75cfd9afe3aa45d791f4f9eb1a9d4b2d/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_75cfd9afe3aa45d791f4f9eb1a9d4b2d" --device "cuda:0" --dtype "float16" --batch-size 1
Traceback (most recent call last):
File "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py", line 166, in <module>
main()
File "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py", line 133, in main
sut = sut_cls(
TypeError: __init__() got an unexpected keyword argument 'batch_size'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_75cfd9afe3aa45d791f4f9eb1a9d4b2d']
real 0m2.719s
user 0m4.947s
sys 0m11.550s
We need to fix this issue in https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/main.py#L105
In this case sut_cls = SUTServer
https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/main.py#L131
The SUTServer code itself works (https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/SUT.py#L347), but its __init__ does not accept batch_size as an input parameter, while main.py passes it unconditionally:
https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/main.py#L136
The fix:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: SUT.py
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b$ git diff
diff --git a/language/mixtral-8x7b/SUT.py b/language/mixtral-8x7b/SUT.py
index b4a2dc7..724d7b8 100644
--- a/language/mixtral-8x7b/SUT.py
+++ b/language/mixtral-8x7b/SUT.py
@@ -345,13 +345,14 @@ class SUT():
 class SUTServer(SUT):
-    def __init__(self, model_path=None, dtype="bfloat16", device="cpu",
+    def __init__(self, model_path=None, dtype="bfloat16", device="cpu", batch_size=1,
                  total_sample_count=24576, dataset_path=None, workers=1):
         super().__init__(
             model_path=model_path,
             dtype=dtype,
             device=device,
+            batch_size=batch_size,
             total_sample_count=total_sample_count,
             dataset_path=dataset_path,
             workers=workers)
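The failure mode and the fix can be reproduced with a minimal (hypothetical) class pair, standing in for SUT and SUTServer: the parent accepts batch_size, the unpatched child does not, so forwarding batch_size to the child raises the same TypeError:

```python
class SUT:
    # Stand-in for the reference SUT base class: it accepts batch_size.
    def __init__(self, batch_size=1, workers=1):
        self.batch_size = batch_size
        self.workers = workers

class SUTServerBroken(SUT):
    # Mirrors the unpatched SUTServer: __init__ has no batch_size parameter.
    def __init__(self, workers=1):
        super().__init__(workers=workers)

class SUTServerFixed(SUT):
    # Mirrors the patched SUTServer: batch_size is accepted and forwarded.
    def __init__(self, batch_size=1, workers=1):
        super().__init__(batch_size=batch_size, workers=workers)

try:
    SUTServerBroken(batch_size=1)   # same failure as in main.py
except TypeError as e:
    print(type(e).__name__)         # prints TypeError

assert SUTServerFixed(batch_size=1).batch_size == 1
```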
Then the Accuracy short run (15 samples) succeeds:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Server,total_sample_count=15
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550" --device "cuda:0" --dtype "float16" --batch-size 1
...
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550']
real 5m9.109s
user 18m19.012s
sys 9m23.009s
Accuracy:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Server,total_sample_count=15 , get accuracy_dict
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --mlperf-accuracy-file "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550/mlperf_log_accuracy.json" --checkpoint-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --dataset-file "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --dtype "int32"
...
INFO:root:[base_loadgen_experiment] touch _BEFORE_CODE_LOADING=/local/mnt/workspace/mmirkina/work_collection/pint_package_for_python3.9/install/lib/python3.9/site-packages
{'gsm8k': 80.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 15, 'gen_tok_len': 2701, 'tokens_per_sample': 180.1}
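As a quick sanity check on the dictionary above, tokens_per_sample is simply gen_tok_len / gen_num rounded to one decimal:

```python
# Values copied from the accuracy_dict printed above.
accuracy_dict = {'gsm8k': 80.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 15,
                 'gen_tok_len': 2701, 'tokens_per_sample': 180.1}

# 2701 generated tokens over 15 samples ~= 180.1 tokens per sample
derived = round(accuracy_dict['gen_tok_len'] / accuracy_dict['gen_num'], 1)
assert derived == accuracy_dict['tokens_per_sample']
```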
Performance mode, short run:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,total_sample_count=15,loadgen_query_count=15
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6" --device "cuda:0" --dtype "float16" --batch-size 1
...
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00, 1.11it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode : PerformanceOnly
Completed samples per second : 0.05
Completed tokens per second: 10.15
Result is : INVALID
Performance constraints satisfied : NO
Min duration satisfied : NO
Min queries satisfied : Yes
Early stopping satisfied: NO
Recommendations:
* TTFT constrain not met: Reduce target QPS to improve latency.
* Increase the target QPS so the loadgen pre-generates more queries.
TTFT Early Stopping Result:
TPOT Early Stopping Result:
* Run unsuccessful.
* Processed 15 queries.
* Would need to run at least 444 more queries,
with the run being successful if every additional
query were under latency.
================================================
Additional Stats
================================================
Scheduled samples per second : 0.82
Min latency (ns) : 13064653379
Max latency (ns) : 267793204858
Mean latency (ns) : 143221598769
50.00 percentile latency (ns) : 150929072160
90.00 percentile latency (ns) : 254726221440
95.00 percentile latency (ns) : 267793204858
97.00 percentile latency (ns) : 267793204858
99.00 percentile latency (ns) : 267793204858
99.90 percentile latency (ns) : 267793204858
Completed tokens per second : 10.15
Min First Token latency (ns) : 1009652297
Max First Token latency (ns) : 251018039938
Mean First Token latency (ns) : 124472043771
50.00 percentile first token latency (ns) : 141028154321
90.00 percentile first token latency (ns) : 235301181383
95.00 percentile first token latency (ns) : 251018039938
97.00 percentile first token latency (ns) : 251018039938
99.00 percentile first token latency (ns) : 251018039938
99.90 percentile first token latency (ns) : 251018039938
Min Time to Output Token (ns) : 97460180
Max Time to Output Token (ns) : 98811484
Mean Time to Output Token (ns) : 97999451
50.00 percentile time to output token (ns) : 97908450
90.00 percentile time to output token (ns) : 98751642
95.00 percentile time to output token (ns) : 98811484
97.00 percentile time to output token (ns) : 98811484
99.00 percentile time to output token (ns) : 98811484
99.90 percentile time to output token (ns) : 98811484
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1
ttft_latency (ns): 2000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 15
max_query_count : 15
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 15
No warnings encountered during test.
1 ERROR encountered. See detailed log.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
INFO:root:[generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6] parameters {
...
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6']
real 5m28.203s
user 18m20.823s
sys 9m5.310s
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,total_sample_count=662,loadgen_target_qps=0.05
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_c1ddb273906f457abe0630b98b395a8e/user.conf" --total-sample-count 662 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_c1ddb273906f457abe0630b98b395a8e" --device "cuda:0" --dtype "float16" --batch-size 1
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:34<00:00, 1.12it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode : PerformanceOnly
Completed samples per second : 0.05
Completed tokens per second: 6.23
Result is : INVALID
Performance constraints satisfied : NO
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: NO
Recommendations:
* TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:
TPOT Early Stopping Result:
* Run unsuccessful.
* Processed 100 queries.
* Would need to run at least 359 more queries,
with the run being successful if every additional
query were under latency.
================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns) : 5830023019
Max latency (ns) : 68595234786
Mean latency (ns) : 23372322742
50.00 percentile latency (ns) : 18108096926
90.00 percentile latency (ns) : 52467416019
95.00 percentile latency (ns) : 60556450072
97.00 percentile latency (ns) : 64991970195
99.00 percentile latency (ns) : 68595234786
99.90 percentile latency (ns) : 68595234786
Completed tokens per second : 6.23
Min First Token latency (ns) : 185737515
Max First Token latency (ns) : 60490290109
Mean First Token latency (ns) : 10809999725
50.00 percentile first token latency (ns) : 4415499799
90.00 percentile first token latency (ns) : 36296355041
95.00 percentile first token latency (ns) : 42008621796
97.00 percentile first token latency (ns) : 50461911783
99.00 percentile first token latency (ns) : 60490290109
99.90 percentile first token latency (ns) : 60490290109
Min Time to Output Token (ns) : 97666897
Max Time to Output Token (ns) : 99740760
Mean Time to Output Token (ns) : 98312017
50.00 percentile time to output token (ns) : 98235140
90.00 percentile time to output token (ns) : 98852169
95.00 percentile time to output token (ns) : 98921387
97.00 percentile time to output token (ns) : 98990886
99.00 percentile time to output token (ns) : 99740760
99.90 percentile time to output token (ns) : 99740760
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 2000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 662
No warnings encountered during test.
No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_c1ddb273906f457abe0630b98b395a8e']
real 35m14.624s
user 34m59.382s
sys 9m17.484s
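Note on units before changing the targets below: the mlperf.conf latency values are in milliseconds, while the results summary reports them in nanoseconds (e.g. ttft_latency = 2000 in the conf shows up as ttft_latency (ns): 2000000000 above). A quick conversion check:

```python
MS_TO_NS = 1_000_000

def conf_latency_to_summary_ns(conf_value_ms):
    # mlperf.conf Server latency targets (ttft_latency, tpot_latency)
    # are given in milliseconds; the LoadGen summary prints nanoseconds.
    return conf_value_ms * MS_TO_NS

assert conf_latency_to_summary_ns(2000) == 2_000_000_000  # ttft_latency above
assert conf_latency_to_summary_ns(200) == 200_000_000     # tpot_latency above
# The relaxed targets tried below:
assert conf_latency_to_summary_ns(65000) == 65_000_000_000
assert conf_latency_to_summary_ns(150) == 150_000_000
```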
If we change in mlperf.conf:
mixtral-8x7b.Server.ttft_latency = 2000 -> mixtral-8x7b.Server.ttft_latency = 65000
mixtral-8x7b.Server.tpot_latency = 200 -> mixtral-8x7b.Server.tpot_latency = 150
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,total_sample_count=662,loadgen_target_qps=0.05,loadgen_ttft_latency=65000,loadgen_tpot_latency=150
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd/user.conf" --total-sample-count 662 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd" --device "cuda:0" --dtype "float16" --batch-size 1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 31%|█████████████████████████████████████████████▏ | 12/39 [00:10<00:24, 1.09it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:34<00:00, 1.14it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode : PerformanceOnly
Completed samples per second : 0.05
Completed tokens per second: 6.23
Result is : INVALID
Performance constraints satisfied : NO
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: NO
Recommendations:
* TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:
TPOT Early Stopping Result:
* Run unsuccessful.
* Processed 100 queries.
* Would need to run at least 359 more queries,
with the run being successful if every additional
query were under latency.
================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns) : 5718265818
Max latency (ns) : 66419296347
Mean latency (ns) : 22677037097
50.00 percentile latency (ns) : 17471114779
90.00 percentile latency (ns) : 50096406261
95.00 percentile latency (ns) : 58467919575
97.00 percentile latency (ns) : 62976406531
99.00 percentile latency (ns) : 66419296347
99.90 percentile latency (ns) : 66419296347
Completed tokens per second : 6.23
Min First Token latency (ns) : 185477181
Max First Token latency (ns) : 58474430840
Mean First Token latency (ns) : 10341132750
50.00 percentile first token latency (ns) : 4303981659
90.00 percentile first token latency (ns) : 34374459816
95.00 percentile first token latency (ns) : 40702664334
97.00 percentile first token latency (ns) : 48617643560
99.00 percentile first token latency (ns) : 58474430840
99.90 percentile first token latency (ns) : 58474430840
Min Time to Output Token (ns) : 95681850
Max Time to Output Token (ns) : 97959003
Mean Time to Output Token (ns) : 96554188
50.00 percentile time to output token (ns) : 96587430
90.00 percentile time to output token (ns) : 97079977
95.00 percentile time to output token (ns) : 97193354
97.00 percentile time to output token (ns) : 97411686
99.00 percentile time to output token (ns) : 97959003
99.90 percentile time to output token (ns) : 97959003
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 2000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 662
No warnings encountered during test.
No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd']
real 35m13.781s
user 34m55.483s
sys 8m59.240s
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd/user.conf
mixtral-8x7b.Server.performance_sample_count_override = 662
mixtral-8x7b.Server.target_qps = 0.05
mixtral-8x7b.Server.coalesce_queries = 1
mixtral-8x7b.Server.ttft_latency = 65000
mixtral-8x7b.Server.tpot_latency = 150
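One thing to keep in mind when comparing this user.conf to the summary above: conf-file latencies are specified in milliseconds, while the "Test Parameters Used" section reports nanoseconds. A quick sanity check of the conversion (the helper name below is ours, not part of loadgen):

```python
# Conf-file latencies (ttft_latency, tpot_latency) are in milliseconds;
# loadgen's "Test Parameters Used" section reports them in nanoseconds.
def conf_ms_to_ns(ms: int) -> int:
    return ms * 1_000_000

# ttft_latency = 2000 (ms) shows up as "ttft_latency (ns): 2000000000"
assert conf_ms_to_ns(2000) == 2_000_000_000
# tpot_latency = 200 (ms) shows up as "tpot_latency (ns): 200000000"
assert conf_ms_to_ns(200) == 200_000_000
```

This also makes it visible that the ttft_latency = 65000 / tpot_latency = 150 overrides from user.conf did not reach the test parameters, which still show the 2000 ms / 200 ms values.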
Changed in mlperf.conf: mixtral-8x7b.Server.ttft_latency = 62000
Now in mlperf.conf:
mixtral-8x7b.Server.target_latency = 0
mixtral-8x7b.Server.ttft_latency = 62000
mixtral-8x7b.Server.tpot_latency = 200
Then
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,loadgen_min_query_count=662,loadgen_target_qps=0.05
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545/user.conf" --total-sample-count 15000 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545" --device "cuda:0" --dtype "float16" --batch-size 1
...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 31%|█████████████████████████████████████████████▏ | 12/39 [00:10<00:24, 1.09it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:34<00:00, 1.14it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode : PerformanceOnly
Completed samples per second : 0.05
Completed tokens per second: 6.55
Result is : INVALID
Performance constraints satisfied : NO
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: NO
Recommendations:
* TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:
* Run unsuccessful.
* Processed 662 queries.
* Would need to run at least 7046 more queries,
with the run being successful if every additional
query were under latency.
TPOT Early Stopping Result:
* Run successful.
================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns) : 2486717028
Max latency (ns) : 159035247554
Mean latency (ns) : 33772426088
50.00 percentile latency (ns) : 23502796314
90.00 percentile latency (ns) : 75285437852
95.00 percentile latency (ns) : 118225505597
97.00 percentile latency (ns) : 128366769948
99.00 percentile latency (ns) : 151858693459
99.90 percentile latency (ns) : 159035247554
Completed tokens per second : 6.55
Min First Token latency (ns) : 140220154
Max First Token latency (ns) : 150555503626
Mean First Token latency (ns) : 19762347996
50.00 percentile first token latency (ns) : 7281915958
90.00 percentile first token latency (ns) : 56872547290
95.00 percentile first token latency (ns) : 101315290042
97.00 percentile first token latency (ns) : 115607824512
99.00 percentile first token latency (ns) : 132908847055
99.90 percentile first token latency (ns) : 150555503626
Min Time to Output Token (ns) : 97579488
Max Time to Output Token (ns) : 102603516
Mean Time to Output Token (ns) : 98661860
50.00 percentile time to output token (ns) : 98459173
90.00 percentile time to output token (ns) : 99694911
95.00 percentile time to output token (ns) : 100262397
97.00 percentile time to output token (ns) : 100780724
99.00 percentile time to output token (ns) : 101512282
99.90 percentile time to output token (ns) : 102603516
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 62000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 662
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 15000
No warnings encountered during test.
No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545']
real 242m29.181s
user 170m21.273s
sys 9m32.119s
But
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545/user.conf
mixtral-8x7b.Server.min_query_count = 662
mixtral-8x7b.Server.performance_sample_count_override = 15000
mixtral-8x7b.Server.target_qps = 0.05
mixtral-8x7b.Server.coalesce_queries = 1
If the following is set in mlperf.conf:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf | grep mixtral
mixtral-8x7B.*.sample_concatenate_permutation = 1
mixtral-8x7b.*.use_token_latencies = 1
# Only ttft and tpot are tracked for the llama2-70b & mixtral-8x7B benchmark therefore target_latency = 0
mixtral-8x7b.Server.target_latency = 0
mixtral-8x7b.Server.ttft_latency = 62000
mixtral-8x7b.Server.tpot_latency = 200
mixtral-8x7b.Offline.min_query_count = 15000
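For reference, these conf files use a simple `model.Scenario.key = value` line format with `#` comments. A minimal parsing sketch (simplified; loadgen's real parser also handles wildcard model/scenario matching and precedence between mlperf.conf and user.conf):

```python
# Minimal parser sketch for loadgen-style conf lines such as
# "mixtral-8x7b.Server.ttft_latency = 62000" ('#' starts a comment).
def parse_conf(text: str) -> dict:
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf

sample = """
# Only ttft and tpot are tracked
mixtral-8x7b.Server.target_latency = 0
mixtral-8x7b.Server.ttft_latency = 62000
"""
print(parse_conf(sample)["mixtral-8x7b.Server.ttft_latency"])  # 62000
```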
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd/user.conf" --total-sample-count 662 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd" --device "cuda:0" --dtype "float16" --batch-size 1
mlperf.conf is copied into the created entry.
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=moe,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,loadgen_min_query_count=1000,loadgen_target_qps=0.05
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/user.conf" --total-sample-count 15000 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd" --device "cuda:0" --dtype "float16" --batch-size 1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00, 1.09it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode : PerformanceOnly
Completed samples per second : 0.05
Completed tokens per second: 6.87
Result is : INVALID
Performance constraints satisfied : NO
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: NO
Recommendations:
* TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:
* Run unsuccessful.
* Processed 1000 queries.
* Would need to run at least 12684 more queries,
with the run being successful if every additional
query were under latency.
TPOT Early Stopping Result:
* Run successful.
================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns) : 2254937408
Max latency (ns) : 203136802769
Mean latency (ns) : 36474350157
50.00 percentile latency (ns) : 24622631339
90.00 percentile latency (ns) : 88052729073
95.00 percentile latency (ns) : 114536578525
97.00 percentile latency (ns) : 132427403215
99.00 percentile latency (ns) : 169032020175
99.90 percentile latency (ns) : 203136802769
Completed tokens per second : 6.87
Min First Token latency (ns) : 138495953
Max First Token latency (ns) : 186722721161
Mean First Token latency (ns) : 22513031640
50.00 percentile first token latency (ns) : 8509059644
90.00 percentile first token latency (ns) : 68997991288
95.00 percentile first token latency (ns) : 99699060394
97.00 percentile first token latency (ns) : 115203356595
99.00 percentile first token latency (ns) : 156668903970
99.90 percentile first token latency (ns) : 186722721161
Min Time to Output Token (ns) : 95872056
Max Time to Output Token (ns) : 100753945
Mean Time to Output Token (ns) : 97079862
50.00 percentile time to output token (ns) : 96877641
90.00 percentile time to output token (ns) : 98147562
95.00 percentile time to output token (ns) : 98809447
97.00 percentile time to output token (ns) : 99386795
99.00 percentile time to output token (ns) : 100146259
99.90 percentile time to output token (ns) : 100753945
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 62000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 1000
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 15000
No warnings encountered during test.
No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...
['^', 'byname', 'generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd']
real 353m21.616s
user 249m30.290s
sys 9m41.276s
mlperf.conf:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/mlperf.conf | grep mixtral
mixtral-8x7B.*.sample_concatenate_permutation = 1
mixtral-8x7b.*.use_token_latencies = 1
# Only ttft and tpot are tracked for the llama2-70b & mixtral-8x7B benchmark therefore target_latency = 0
mixtral-8x7b.Server.target_latency = 0
mixtral-8x7b.Server.ttft_latency = 62000
mixtral-8x7b.Server.tpot_latency = 200
mixtral-8x7b.Offline.min_query_count = 15000
user.conf:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/user.conf
mixtral-8x7b.Server.min_query_count = 1000
mixtral-8x7b.Server.performance_sample_count_override = 15000
mixtral-8x7b.Server.target_qps = 0.05
mixtral-8x7b.Server.coalesce_queries = 1
Useful updates:
maria@chai ~/work_collection/axs2mlperf (mixtral-dev *=)$ ls -la tokenizer_dir/
total 2252
drwxr-xr-x 2 maria krai 4096 Jul 17 16:18 .
drwxr-xr-x 55 maria krai 4096 Jul 17 16:31 ..
-rw-r--r-- 1 maria krai 2103 Jul 17 16:17 tokenizer_config.json
-rw-r--r-- 1 maria krai 1795188 Jul 17 16:17 tokenizer.json
-rw-r--r-- 1 maria krai 493443 Jul 17 16:17 tokenizer.model
maria@chai ~/work_collection/axs2mlperf (mixtral-dev *=)$ ls -la empty_dir/
total 8
drwxr-xr-x 2 maria krai 4096 Jul 17 16:17 .
drwxr-xr-x 55 maria krai 4096 Jul 17 16:31 ..
maria@chai ~/work_collection/axs2mlperf (mixtral-dev *=)$ cd empty_dir/
maria@chai ~/work_collection/axs2mlperf/empty_dir (mixtral-dev *=)$ diff -ruNa . ../tokenizer_dir > ../model_mixtral_checkpoint_recipe/tokenizer.patch
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls model_pytorch_mixtral_recipe/
data_axs.json tokenizer.patch
The data_axs.json file looks like this:
{
    "_producer_rules": [
        [ [ "downloaded", "pytorch_model", "model_name=mixtral-8x7b", "source?=via_rclone" ],
          [ ["get_kernel"], ["byname","downloader"], ["download"] ],
          {
            "downloading_tool_query": "shell_tool,can_download_url_from_rclone",
            "file_name": [ "mixtral-8x7b-instruct-v0.1" ],
            "url": "mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1",
            "patch": "tokenizer.patch",
            "abs_patch_path": [ "^^", "substitute", "#{this_entry_path}#/tokenizer.patch" ]
          },
          [ "this_entry_path" ] ]
    ],
    "this_entry_path": [ "^^", "get_path" ]
}
Added the "patch" and "abs_patch_path" keys.
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,pytorch_model,model_name=mixtral-8x7b
...
"/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" -P
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transferred: 173.982 GiB / 173.982 GiB, 100%, 30.664 MiB/s, ETA 0s
Transferred: 42 / 42, 100%
Elapsed time: 36m6.9s
WARNING:root:The resolved patch_tool_entry 'patch_tool' located at '/local/mnt/workspace/mmirkina/work_collection/patch_tool' uses the shell tool '/usr/bin/patch'
WARNING:root:shell.run() about to execute (with env=None, in_dir=None, capture_output=False, errorize_output=False, capture_stderr=False, split_to_lines=False):
"/usr/bin/patch" -p1 -d "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" -i "/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/model_pytorch_mixtral_recipe/tokenizer.patch"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
patching file tokenizer_config.json
patching file tokenizer.json
patching file tokenizer.model
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.
['^', 'byname', 'downloaded_mixtral-8x7b-instruct-v0.1']
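The tokenizer.patch round trip above (diff against an empty dir, then `patch -p1` into the model dir) can be reproduced in miniature. Everything below uses throwaway temp dirs and an illustrative one-line file, not the real tokenizer files:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkdir empty_dir tokenizer_dir model_dir
echo '{"model_max_length": 32768}' > tokenizer_dir/tokenizer_config.json
# diff exits with status 1 when the trees differ; that's expected here
diff -ruNa empty_dir tokenizer_dir > tokenizer.patch || true
# -p1 strips the leading directory component, so the patch recreates
# tokenizer_config.json directly inside model_dir
patch -p1 -d model_dir -i "$tmp/tokenizer.patch"
cat model_dir/tokenizer_config.json
```

The same mechanism is what the downloader entry runs after rclone finishes, with the model checkpoint directory in place of model_dir.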
Then
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls -la /local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1
total 182435448
drwxr-xr-x 2 mmirkina users 4096 Jul 18 07:55 .
drwxr-xr-x 3 mmirkina users 4096 Jul 18 07:19 ..
-rw-r--r-- 1 mmirkina users 803 Jun 24 17:04 config.json
-rw-r--r-- 1 mmirkina users 111 Jun 24 17:04 generation_config.json
-rw-r--r-- 1 mmirkina users 4920052720 Jun 24 17:04 model-00001-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00002-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00003-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00004-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00005-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504264 Jun 24 17:05 model-00006-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559912 Jun 24 17:05 model-00007-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00008-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00009-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00010-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00011-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646240 Jun 24 17:06 model-00012-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:06 model-00013-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00014-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00015-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00016-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00017-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00018-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:08 model-00019-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00020-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00021-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00022-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00023-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00024-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:09 model-00025-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00026-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00027-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00028-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00029-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00030-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:11 model-00031-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00032-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00033-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00034-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00035-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00036-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646264 Jun 24 17:12 model-00037-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:13 model-00038-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 1463862216 Jun 24 17:13 model-00039-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 92659 Jun 24 17:13 model.safetensors.index.json
-rw-r--r-- 1 mmirkina users 2103 Jul 18 07:55 tokenizer_config.json
-rw-r--r-- 1 mmirkina users 1795188 Jul 18 07:55 tokenizer.json
-rw-r--r-- 1 mmirkina users 493443 Jul 18 07:55 tokenizer.model
mxeval (package for accuracy calculation)
mxeval cannot be installed as a Python dependency from within the program while running an experiment.
"mxeval_query": ["python_package", "package_name=mxeval", "installable=git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86" ],
We have this issue:
ERROR: For req: mxeval==1.0. Invalid script entry point: <ExportEntry evaluate_functional_correctness
We hit the same issue when installing it locally:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip uninstall mxeval
Found existing installation: mxeval 1.0
Uninstalling mxeval-1.0:
Would remove:
/local/mnt/workspace/mmirkina/.local/bin/evaluate_functional_correctness
/local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages/mxeval.egg-link
Proceed (Y/n)? Y
Successfully uninstalled mxeval-1.0
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip show mxeval
WARNING: Package(s) not found: mxeval
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip install git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
Cloning https://github.com/amazon-science/mxeval.git (to revision e09974f990eeaf0c0e8f2b5eaff4be66effb2c86) to /tmp/pip-req-build-eppd08hl
Running command git clone --filter=blob:none --quiet https://github.com/amazon-science/mxeval.git /tmp/pip-req-build-eppd08hl
Running command git rev-parse -q --verify 'sha^e09974f990eeaf0c0e8f2b5eaff4be66effb2c86'
Running command git fetch -q https://github.com/amazon-science/mxeval.git e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
Resolved https://github.com/amazon-science/mxeval.git to commit e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
Preparing metadata (setup.py) ... done
Requirement already satisfied: fire in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from mxeval==1.0) (0.6.0)
Requirement already satisfied: numpy in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from mxeval==1.0) (1.24.1)
Requirement already satisfied: tqdm in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from mxeval==1.0) (4.66.4)
Requirement already satisfied: termcolor in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from fire->mxeval==1.0) (2.4.0)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from fire->mxeval==1.0) (1.16.0)
Building wheels for collected packages: mxeval
Building wheel for mxeval (setup.py) ... done
Created wheel for mxeval: filename=mxeval-1.0-py3-none-any.whl size=14797 sha256=28ea70cfd9686f1474eaf3d24eaae1980e8f1219d81fb2ed03662976606fd4d2
Stored in directory: /local/mnt/workspace/mmirkina/.cache/pip/wheels/6a/40/82/769569691d13c70cf87822cb923c87c2a856382763754b47bb
Successfully built mxeval
Installing collected packages: mxeval
ERROR: For req: mxeval==1.0. Invalid script entry point: <ExportEntry evaluate_functional_correctness = mxeval.evaluate_functional_correctness:None []> - A callable suffix is required. Cf https://packaging.python.org/specifications/entry-points/#use-for-scripts for more information.
But the package is installed anyway:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip show mxeval
Name: mxeval
Version: 1.0
Summary: UNKNOWN
Home-page: UNKNOWN
Author: AWS AI Labs
Author-email:
License: UNKNOWN
Location: /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages
Requires: fire, numpy, tqdm
Required-by:
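The pip error above complains about an entry point with no callable suffix: a console-script entry must have the form `name = module:attr`, while mxeval's setup.py only named the module, hence the `:None` in the error message. A minimal sketch with the stdlib parser (the `:main` suffix below is an assumed, corrected form, not what mxeval actually ships):

```python
# Parse a console-script entry point with the stdlib; a valid value needs
# both a module part and a callable ("attr") part after the colon.
from importlib.metadata import EntryPoint

ep = EntryPoint(
    name="evaluate_functional_correctness",
    value="mxeval.evaluate_functional_correctness:main",  # ":main" is hypothetical
    group="console_scripts",
)
print(ep.module)  # the importable module pip resolves
print(ep.attr)    # the callable pip wraps into the generated script
```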
Solution: changed the link
installable=git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
to
installable=git+https://github.com/shubhamugare/mxeval.git
Commits:
Added python packages for accuracy calculation for mixtral
Added packages for moe (mixtral) accuracy calculation
According to the latest updates in https://github.com/mlcommons/inference/issues/1782#issuecomment-2237093081, we don't need the patch for the downloaded checkpoint model.
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery downloaded,pytorch_model,model_name=mixtral-8x7b
"/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7
b-instruct-v0.1" -P
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^
Transferred: 173.984 GiB / 173.984 GiB, 100%, 23.538 MiB/s, ETA 0s
Transferred: 46 / 46, 100%
Elapsed time: 36m5.1s
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.
['^', 'byname', 'downloaded_mixtral-8x7b-instruct-v0.1']
real 36m5.382s
user 12m17.626s
sys 11m56.073s
/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1
real 0m0.081s
user 0m0.064s
sys 0m0.017s
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls -la /local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1/
total 182435452
drwxr-xr-x 2 mmirkina users 4096 Jul 22 05:05 .
drwxr-xr-x 3 mmirkina users 4096 Jul 22 04:29 ..
-rw-r--r-- 1 mmirkina users 803 Jun 24 17:04 config.json
-rw-r--r-- 1 mmirkina users 111 Jun 24 17:04 generation_config.json
-rw-r--r-- 1 mmirkina users 4920052720 Jun 24 17:04 model-00001-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00002-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00003-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00004-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00005-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504264 Jun 24 17:05 model-00006-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559912 Jun 24 17:05 model-00007-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00008-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00009-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00010-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00011-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646240 Jun 24 17:06 model-00012-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:06 model-00013-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00014-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00015-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00016-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00017-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00018-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:08 model-00019-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00020-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00021-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00022-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00023-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00024-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:09 model-00025-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00026-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00027-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00028-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00029-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00030-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:11 model-00031-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00032-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00033-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00034-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00035-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00036-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646264 Jun 24 17:12 model-00037-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:13 model-00038-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 1463862216 Jun 24 17:13 model-00039-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 92659 Jun 24 17:13 model.safetensors.index.json
-rw-r--r-- 1 mmirkina users 72 Jul 18 12:00 special_tokens_map.json
-rw-r--r-- 1 mmirkina users 1466 Jul 18 11:56 tokenizer_config.json
-rw-r--r-- 1 mmirkina users 1795303 Jul 18 11:56 tokenizer.json
-rw-r--r-- 1 mmirkina users 493443 Jul 18 11:57 tokenizer.model
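As a sanity check on the listing above, the 39 safetensors shards should sum to roughly the 173.984 GiB that rclone reported transferring. A small helper for that (the function name and path argument are illustrative, not part of axs):

```python
# Sum the sizes of the downloaded model shards to cross-check against
# rclone's reported transfer total.
from pathlib import Path

def total_size_bytes(model_dir: str, pattern: str = "*.safetensors") -> int:
    """Return the combined size in bytes of files matching `pattern`."""
    return sum(p.stat().st_size for p in Path(model_dir).glob(pattern))

# Usage (path taken from the listing above):
# gib = total_size_bytes(
#     "/local/mnt/workspace/mmirkina/work_collection/"
#     "downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"
# ) / 2**30
# print(f"{gib:.3f} GiB")
```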
So we remove the patch support:
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ git diff
diff --git a/model_pytorch_mixtral_recipe/data_axs.json b/model_pytorch_mixtral_recipe/data_axs.json
index 7aa9cb0..26b25bf 100644
--- a/model_pytorch_mixtral_recipe/data_axs.json
+++ b/model_pytorch_mixtral_recipe/data_axs.json
@@ -3,10 +3,7 @@
[ [ "downloaded", "pytorch_model", "model_name=mixtral-8x7b", "source?=via_rclone" ], [["get_kernel"],["byname","downloader"],["download"]], {
"downloading_tool_query": "shell_tool,can_download_url_from_rclone",
"file_name": [ "mixtral-8x7b-instruct-v0.1" ],
- "url": "mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1",
- "patch": "tokenizer.patch",
- "abs_patch_path": [ "^^", "substitute", "#{this_entry_path}#/tokenizer.patch" ]
- }, [ "this_entry_path" ] ]
- ],
- "this_entry_path": [ "^^", "get_path" ]
+ "url": "mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1"
+ }, [] ]
+
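For readability, the download rule after this diff (reconstructed from the hunk above; surrounding data_axs.json keys omitted, whitespace approximate) reduces to:

```json
[ [ "downloaded", "pytorch_model", "model_name=mixtral-8x7b", "source?=via_rclone" ],
  [ ["get_kernel"], ["byname","downloader"], ["download"] ],
  {
      "downloading_tool_query": "shell_tool,can_download_url_from_rclone",
      "file_name": [ "mixtral-8x7b-instruct-v0.1" ],
      "url": "mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1"
  }, [] ]
```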
Latest version of the useful commands:
time axs byquery downloaded,pytorch_model,model_name=mixtral-8x7b
axs byquery downloaded,preprocessed,dataset_name=mixtral
axs byquery loadgen_output,task=moe,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=10 , get accuracy_dict
axs byquery loadgen_output,task=moe,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline , get accuracy_dict
axs byquery loadgen_output,task=moe,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,loadgen_min_query_count=1000,loadgen_target_qps=0.05
All commits are in the mixtral-dev branch of the axs2mlperf repository: https://github.com/krai/axs2mlperf/tree/mixtral-dev.
This adds the reference code for mixtral-8x7b (https://github.com/mlcommons/inference/tree/master/language/mixtral-8x7b) to axs. To reproduce the steps above, use the mixtral-dev branch of axs2mlperf.