krai / axs2mlperf

Automated KRAI X workflows for reproducing MLPerf Inference submissions
https://krai.ai
MIT License

Add reference code for `mixtral-8x7b` in `axs` #49

Open maria-18-git opened 2 months ago

maria-18-git commented 2 months ago

Add reference code for mixtral-8x7b (https://github.com/mlcommons/inference/tree/master/language/mixtral-8x7b) in axs, following the steps below.

Use the mixtral-dev branch in axs2mlperf.

maria-18-git commented 1 month ago

1. Download dataset recipe

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,preprocessed,dataset_name=mixtral
...
        "/usr/bin/wget" -O "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06_mixtral_15k_v4.pkl
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--2024-07-11 06:18:28--  https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06_mixtral_15k_v4.pkl
Resolving inference.mlcommons-storage.org (inference.mlcommons-storage.org)... 172.67.167.47, 104.21.16.91, 2606:4700:3037::6815:105b, ...
Connecting to inference.mlcommons-storage.org (inference.mlcommons-storage.org)|172.67.167.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 71763360 (68M) [application/octet-stream]
Saving to: ‘/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl’

/local/mnt/workspace/mmirkina/work_collection/downlo 100%[=====================================================================================================================>]  68.44M  38.7MB/s    in 1.8s

2024-07-11 06:18:30 (38.7 MB/s) - ‘/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl’ saved [71763360/71763360]

INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.

['^', 'byname', 'downloaded_2024.06.06_mixtral_15k_v4.pkl']
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,preprocessed,dataset_name=mixtral , get_path
/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls /local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/
2024.06.06_mixtral_15k_v4.pkl  data_axs.json
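The wget log above reports `Length: 71763360` bytes for the dataset pickle. A minimal sketch for sanity-checking the download against that size (the helper name and the check itself are ours, not part of axs):

```python
import os

# Length reported by wget for 2024.06.06_mixtral_15k_v4.pkl
EXPECTED_SIZE = 71763360

def download_looks_complete(path: str, expected_size: int = EXPECTED_SIZE) -> bool:
    """Return True if the file exists and matches the size wget reported."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_size
```

Run it against the path returned by `axs byquery downloaded,preprocessed,dataset_name=mixtral , get_path`.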
maria-18-git commented 1 month ago

Commit: Added dataset_mixtral_preprocessed_recipe

maria-18-git commented 1 month ago

2. Download checkpoint model recipe

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery extracted,checkpoint,model_name=mixtral
...
        "/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_extracted_mixtral-8x7b-instruct-v0.1/extracted/mixtral-8x7b-instruct-v0.1" -P
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transferred:      173.982 GiB / 173.982 GiB, 100%, 28.538 MiB/s, ETA 0s
Transferred:           42 / 42, 100%
Elapsed time:     34m21.2s
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.

['^', 'byname', 'downloaded_extracted_mixtral-8x7b-instruct-v0.1']
maria-18-git commented 1 month ago

Commit: Added model_mixtral_checkpoint_recipe

maria-18-git commented 1 month ago

3. Run the recipe for copying tokenizer files:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byname model_mixtral_recipe , run
...
WARNING:root:shell.run() about to execute (with env=None, in_dir=None, capture_output=False, errorize_output=False, capture_stderr=False, split_to_lines=False):
        "/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" -P
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transferred:      173.982 GiB / 173.982 GiB, 100%, 12.036 MiB/s, ETA 0s
Transferred:           42 / 42, 100%
Elapsed time:     35m37.2s
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.

WARNING:root:shell.run() about to execute (with env=None, in_dir=None, capture_output=False, errorize_output=False, capture_stderr=False, split_to_lines=False):
        cp /local/mnt/workspace/mmirkina/work_collection/axs2mlperf/model_mixtral_recipe/tokenizer* /local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0

In the checkpoint model directory:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls -la ../downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1/
total 182435448
drwxr-xr-x 2 mmirkina users       4096 Jul 12 13:13 .
drwxr-xr-x 3 mmirkina users       4096 Jul 12 12:38 ..
-rw-r--r-- 1 mmirkina users        803 Jun 24 17:04 config.json
-rw-r--r-- 1 mmirkina users        111 Jun 24 17:04 generation_config.json
-rw-r--r-- 1 mmirkina users 4920052720 Jun 24 17:04 model-00001-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00002-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00003-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00004-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00005-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504264 Jun 24 17:05 model-00006-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559912 Jun 24 17:05 model-00007-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00008-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00009-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00010-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00011-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646240 Jun 24 17:06 model-00012-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:06 model-00013-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00014-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00015-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00016-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00017-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00018-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:08 model-00019-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00020-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00021-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00022-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00023-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00024-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:09 model-00025-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00026-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00027-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00028-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00029-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00030-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:11 model-00031-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00032-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00033-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00034-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00035-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00036-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646264 Jun 24 17:12 model-00037-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:13 model-00038-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 1463862216 Jun 24 17:13 model-00039-of-00039.safetensors
-rw-r--r-- 1 mmirkina users      92659 Jun 24 17:13 model.safetensors.index.json
-rw-r--r-- 1 mmirkina users       1466 Jul 12 13:13 tokenizer_config.json
-rw-r--r-- 1 mmirkina users    1795303 Jul 12 13:13 tokenizer.json
-rw-r--r-- 1 mmirkina users     493443 Jul 12 13:13 tokenizer.model
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,checkpoint,model_name=mixtral --- , get_path
/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1
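Per the ls listing above, a complete checkpoint directory holds the config files, the index, the tokenizer files copied in step 3, and 39 safetensors shards. A hedged sketch of a completeness check (the helper name is ours, not part of axs):

```python
from pathlib import Path

def missing_checkpoint_files(checkpoint_dir: str, num_shards: int = 39) -> list:
    """List expected Mixtral checkpoint files that are absent, based on the ls output above."""
    root = Path(checkpoint_dir)
    expected = [
        "config.json",
        "generation_config.json",
        "model.safetensors.index.json",
        "tokenizer.json",
        "tokenizer.model",
        "tokenizer_config.json",
    ]
    # Shard names follow the model-XXXXX-of-00039.safetensors pattern seen above.
    expected += [
        f"model-{i:05d}-of-{num_shards:05d}.safetensors" for i in range(1, num_shards + 1)
    ]
    return [name for name in expected if not (root / name).exists()]
```

An empty return list means the rclone copy and the tokenizer-copy step both finished.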
maria-18-git commented 1 month ago

4. Run Accuracy (short run; dataset and model not downloaded by axs)

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,dataset_path=/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/download_dataset_15_samples/mixtral_15.pkl,total_sample_count=15,loadgen_dataset_size=15
...
  "model_name": "mixtral_8x7b",
    "mlperf_model_name": "mixtral_8x7b",
    "model_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/",
    "dataset_name": "mixtral",
    "sut_name": "aus655-apollo-0",
    "program_name": "mixtral_reference_code",
    "loadgen_buffer_size": 8,
    "loadgen_compliance_test": null,
    "device": "cuda",
    "dtype": "float16"
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5/data_axs.json'
WARNING:root:shell.run() about to execute (with env={'PATH': '/usr2/mmirkina/.local/bin:/usr2/mmirkina/.local/bin:/usr2/mmirkina/.local/bin:/usr2/mmirkina/.local/bin:/usr2/mmirkina/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr2/mmirkina/bin:/usr2/mmirkina/bin:/usr2/mmirkina/bin:/usr2/mmirkina/bin:/local/mnt/workspace/mmirkina/axs:/usr2/mmirkina/bin:/local/mnt/workspace/mmirkina/axs:/local/mnt/workspace/mmirkina//bin:/snap/bin:/local/mnt/workspace/mmirkina/work_collection/numpy_1.24.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/pybind11_2.10.4_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/pandas_2.2.2_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/nltk_3.8.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/evaluate_0.4.0_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/absl-py_1.4.0_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/rouge-score_0.1.2_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/sentencepiece_0.1.99_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/accelerate_0.21.0_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/torch_2.3.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/tokenizers_0.19.1_package_for_python3.9/install/bin:/local/mnt/workspace/mmirkina/work_collection/mlperf_loadgen_package_for_python3.9/install/bin', 'HOME': '/local/mnt/workspace/mmirkina/', 'AXS_WORK_COLLECTION': '/local/mnt/workspace/mmirkina/work_collection', 'PYTHONPATH': 
'/local/mnt/workspace/mmirkina/work_collection/numpy_1.24.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/pybind11_2.10.4_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/pandas_2.2.2_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/nltk_3.8.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/evaluate_0.4.0_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/absl-py_1.4.0_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/rouge-score_0.1.2_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/sentencepiece_0.1.99_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/accelerate_0.21.0_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/torch_2.3.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/tokenizers_0.19.1_package_for_python3.9/install/lib/python3.9/site-packages:/local/mnt/workspace/mmirkina/work_collection/mlperf_loadgen_package_for_python3.9/install/lib/python3.9/site-packages'}, in_dir=/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5, capture_output=False, errorize_output=True, capture_stderr=False, split_to_lines=False):
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/mixtral_reference_code/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/download_dataset_15_samples/mixtral_15.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5" --device "cuda" --dtype "float16"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
WARNING:Mixtral-8x7B-Instruct-v0.1-MAIN:Accuracy run will generate the accuracy logs, but the evaluation of the log is not completed yet
Loading dataset...
Finished loading dataset.
...
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 15 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
Saving outputs to run_outputs/q9.pkl
Samples run: 1
        BatchMaker time: 0.0008153915405273438
        Inference time: 12.435491561889648
        Postprocess time: 0.0005464553833007812
        ==== Total time: 12.436853408813477
...
Samples run: 15
        BatchMaker time: 0.00018739700317382812
        Inference time: 47.541648864746094
        Postprocess time: 0.000560760498046875
        ==== Total time: 47.542397022247314

No warnings encountered during test.

No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...
    "_parent_entries": [
        [
            "^",
            "byname",
            "base_mixtral_loadgen_experiment"
        ]
    ],
    "with_power": null,
    "model_name": "mixtral_8x7b",
    "mlperf_model_name": "mixtral_8x7b",
    "model_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/",
    "dataset_name": "mixtral",
    "sut_name": "aus655-apollo-0",
    "program_name": "mixtral_reference_code",
    "loadgen_buffer_size": 8,
    "loadgen_compliance_test": null,
    "device": "cuda",
    "dtype": "float16",
    "experiment_end_timestamp": "2024.07.07T09:30:59"
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.

['^', 'byname', 'generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5']

real    4m27.111s
user    23m49.428s
sys     7m30.129s

Accuracy (evaluated without axs):

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/  --mlperf-accuracy-file /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_42ae0fb993ee4ec69990f25c7d6857a5/mlperf_log_accuracy.json --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/download_dataset_15_samples/mixtral_15.pkl --dtype int32
...
{'rouge1': 51.8093, 'rouge2': 23.1958, 'rougeL': 31.7219, 'rougeLsum': 48.2656, 'gsm8k': 80.0, 'mbxp': 20.0, 'gen_len': 4271, 'gen_num': 15, 'gen_tok_len': 4560, 'tokens_per_sample': 304.0}
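As a consistency check on the accuracy dict (our own arithmetic, not part of evaluate-accuracy.py), tokens_per_sample is simply gen_tok_len divided by gen_num:

```python
def tokens_per_sample(accuracy: dict) -> float:
    """Recompute tokens_per_sample from the token totals in an accuracy dict."""
    return round(accuracy["gen_tok_len"] / accuracy["gen_num"], 1)

# Values from the short accuracy run above: 4560 tokens over 15 samples.
short_run = {"gen_tok_len": 4560, "gen_num": 15}
print(tokens_per_sample(short_run))  # 304.0
```

The same arithmetic reproduces the 284.3 reported for the full 15000-sample run below.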
maria-18-git commented 1 month ago

5. Run Accuracy (full run; dataset and model not downloaded by axs)

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=15000                                                                                                                                                                                                           
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/mixtral_reference_code/user.conf" --total-sample-count 15000 --dataset-path "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8" --device "cuda" --dtype "float16"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
WARNING:Mixtral-8x7B-Instruct-v0.1-MAIN:Accuracy run will generate the accuracy logs, but the evaluation of the log is not completed yet
Loading dataset...
...
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 15000 samples
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
IssueQuery done
Saving outputs to run_outputs/q4118.pkl
Samples run: 1
        BatchMaker time: 0.03139448165893555
        Inference time: 19.479421377182007
        Postprocess time: 0.0007452964782714844
        ==== Total time: 19.511561155319214
Saving outputs to run_outputs/q3770.pkl
Samples run: 2
        BatchMaker time: 0.0001633167266845703
        Inference time: 6.8499672412872314
        Postprocess time: 0.0005311965942382812
        ==== Total time: 6.850661754608154
Saving outputs to run_outputs/q12091.pkl
Samples run: 3
        BatchMaker time: 0.00019741058349609375
        Inference time: 2.904864549636841
        Postprocess time: 0.0002982616424560547
        ==== Total time: 2.905360221862793
...
    "__cumulative_param_names": [
        "__query",
        "task",
        "framework",
        "loadgen_mode",
        "loadgen_scenario",
        "total_sample_count",
        "tags"
    ],
    "loadgen_scenario": "Offline",
    "loadgen_mode": "AccuracyOnly",
    "total_sample_count": 15000,
    "task": "mixtral",
    "framework": "torch",
    "__query": "loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=15000",
    "_replay": [
        "^^",
        "execute",
        [
            [
                [
                    "get_kernel"
                ],
                [
                    "byname",
                    "mixtral_reference_code"
                ],
                [
                    "get"
                ]
            ]
        ]
    ],
    "_parent_entries": [
        [
            "^",
            "byname",
            "base_mixtral_loadgen_experiment"
        ]
    ],
    "with_power": null,
    "model_name": "mixtral_8x7b",
    "mlperf_model_name": "mixtral_8x7b",
    "model_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/",
    "dataset_name": "mixtral",
    "sut_name": "aus655-apollo-0",
    "program_name": "mixtral_reference_code",
    "loadgen_dataset_size": 15000,
    "loadgen_buffer_size": 8,
    "loadgen_compliance_test": null,
    "dataset_path": "/local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl",
    "device": "cuda",
    "dtype": "float16",
    "experiment_end_timestamp": "2024.07.09T20:27:30"
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.

['^', 'byname', 'generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8']

real    3528m25.489s
user    3548m5.997s
sys     9m49.413s
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ wc -l /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8/mlperf_log_accuracy.json
15002 /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_4b32ad3fb1d64c379f70fcbe244527a8/mlperf_log_accuracy.json
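The 15002 lines are the 15000 per-sample records plus the opening and closing brackets of the JSON array. Each record stores the generated output as a hex-encoded string, which evaluate-accuracy.py decodes with `--dtype int32`. A hedged sketch of decoding one record (field names follow the usual LoadGen accuracy-log layout; treat the details as assumptions):

```python
import json
import struct

def decode_tokens(record: dict) -> list:
    """Decode a LoadGen accuracy-log record's hex 'data' field as little-endian int32 token IDs."""
    raw = bytes.fromhex(record["data"])
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))

# A minimal fabricated record in the LoadGen accuracy-log shape:
sample = json.loads('{"seq_id": 0, "qsl_idx": 9, "data": "010000000200000003000000"}')
print(decode_tokens(sample))  # [1, 2, 3]
```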

Accuracy:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/  --mlperf-accuracy-file /local/mnt/workspace/mmirkina/accuracy_temp/mlperf_log_accuracy_full_15000.json  --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [10:12<00:00,  8.17it/s]
Processed 5000 in 612.0956107849488s
 25.90% pass@1
{'ruby': 414, 'cpp': 387, 'php': 0, 'typescript': 0, 'python': 494, 'javascript': 0} {'ruby': 846, 'cpp': 743, 'php': 846, 'typescript': 868, 'python': 863, 'javascript': 834}

Results

{'rouge1': 44.9319, 'rouge2': 22.869, 'rougeL': 30.3357, 'rougeLsum': 42.0434, 'gsm8k': 74.04, 'mbxp': 25.9, 'gen_len': 4020445, 'gen_num': 15000, 'gen_tok_len': 4264758, 'tokens_per_sample': 284.3}

During the full accuracy run we also evaluated parts of the accuracy log. First 5000 samples:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/  --mlperf-accuracy-file /local/mnt/workspace/mmirkina/accuracy_temp/mlperf_log_accuracy_first_5000.json  --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
...
Results

{'rouge1': 44.8438, 'rouge2': 22.9349, 'rougeL': 30.0678, 'rougeLsum': 41.934, 'gsm8k': 75.25464349910126, 'mbxp': 26.89738919247116, 'gen_len': 1370417, 'gen_num': 5000, 'gen_tok_len': 1424716, 'tokens_per_sample': 284.9}

Second 5000 samples:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/  --mlperf-accuracy-file /local/mnt/workspace/mmirkina/accuracy_temp/mlperf_log_accuracy_second_5000.json  --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
...
Results

{'rouge1': 45.3139, 'rouge2': 22.6935, 'rougeL': 30.6544, 'rougeLsum': 42.3755, 'gsm8k': 74.71333735666867, 'mbxp': 26.498237367802584, 'gen_len': 1313461, 'gen_num': 5000, 'gen_tok_len': 1417708, 'tokens_per_sample': 283.5}
maria-18-git commented 1 month ago

6. Short runs

After the update in the MLCommons inference repo (https://github.com/mlcommons/inference/pull/1754):

Short run (10 samples with the full dataset):
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=10
...
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_5f197824137f42c9a0dbedee5e0e670f/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.

['^', 'byname', 'generated_by_mixtral_reference_code_on_get_5f197824137f42c9a0dbedee5e0e670f']

real    3m20.567s
user    20m38.746s
sys     8m16.688s

Accuracy script:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --checkpoint-path /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/downloaded_model_checkpoint_270624/mixtral-8x7b-instruct-v0.1/  --mlperf-accuracy-file /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_5f197824137f42c9a0dbedee5e0e670f/mlperf_log_accuracy.json --dataset-file /local/mnt/workspace/mmirkina/mixtral_8x7b_reference/dataset/2024.06.06_mixtral_15k_v4.pkl --dtype int32
[nltk_data] Downloading package punkt to
[nltk_data]     /local/mnt/workspace/mmirkina/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

Results

{'gsm8k': 70.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 10, 'gen_tok_len': 3104, 'tokens_per_sample': 310.4}
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Offline,total_sample_count=10 , get accuracy_dict
INFO:root:[base_loadgen_experiment] touch _BEFORE_CODE_LOADING=/local/mnt/workspace/mmirkina/work_collection/pint_package_for_python3.9/install/lib/python3.9/site-packages
{'gsm8k': 70.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 10, 'gen_tok_len': 3104, 'tokens_per_sample': 310.4}
maria-18-git commented 1 month ago

7. Performance run, Offline (short run)

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_query_count=10
...
       /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_6606e4361e8f407ea4bbebac471330eb/user.conf" --total-sample-count 10 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_6606e4361e8f407ea4bbebac471330eb" --device "cuda:0" --dtype "float16"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00,  1.10it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
Saving outputs to run_outputs/q8.pkl
Samples run: 1
        BatchMaker time: 0.0007636547088623047
        Inference time: 40.94887709617615
        Postprocess time: 0.0011742115020751953
        ==== Total time: 40.950814962387085
Saving outputs to run_outputs/q5.pkl
Samples run: 2
        BatchMaker time: 0.00020742416381835938
        Inference time: 8.971089363098145
        Postprocess time: 0.0005135536193847656
        ==== Total time: 8.971810340881348
Saving outputs to run_outputs/q0.pkl
Samples run: 3
        BatchMaker time: 0.000186920166015625
        Inference time: 16.994237661361694
        Postprocess time: 0.0004570484161376953
        ==== Total time: 16.994881629943848
Saving outputs to run_outputs/q3.pkl
Samples run: 4
        BatchMaker time: 0.00015687942504882812
        Inference time: 14.726018190383911
        Postprocess time: 0.0003829002380371094
        ==== Total time: 14.726557970046997
...
Saving outputs to run_outputs/q5.pkl
Samples run: 10
        BatchMaker time: 0.00021958351135253906
        Inference time: 8.971242904663086
        Postprocess time: 0.00045800209045410156
        ==== Total time: 8.971920490264893
Saving outputs to run_outputs/q0.pkl
Samples run: 11
        BatchMaker time: 0.00015234947204589844
        Inference time: 17.012208938598633
        Postprocess time: 0.000461578369140625
        ==== Total time: 17.01282286643982
Saving outputs to run_outputs/q3.pkl
Samples run: 12
        BatchMaker time: 0.00013637542724609375
        Inference time: 14.714705467224121
        Postprocess time: 0.0005829334259033203
        ==== Total time: 14.71542477607727
Saving outputs to run_outputs/q6.pkl

user.conf:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_6606e4361e8f407ea4bbebac471330eb/user.conf
mixtral-8x7b.Offline.min_query_count = 10
mixtral-8x7b.Offline.max_query_count = 10
mixtral-8x7b.Offline.performance_sample_count_override = 8
mixtral-8x7b.Offline.coalesce_queries = 1

total_sample_count=10 is not applied in Performance mode.
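For reference, the mapping from the query parameters to the generated user.conf keys can be sketched as follows. This is a simplified illustration, not the actual axs implementation; the function name `render_user_conf` is hypothetical, and the key names follow the user.conf shown above:

```python
def render_user_conf(model, scenario, query_count=None, sample_count_override=None):
    """Render a minimal user.conf, mimicking what the recipe generates (sketch)."""
    lines = []
    if query_count is not None:
        # loadgen_query_count sets both bounds, pinning the number of queries
        lines.append(f"{model}.{scenario}.min_query_count = {query_count}")
        lines.append(f"{model}.{scenario}.max_query_count = {query_count}")
    if sample_count_override is not None:
        lines.append(f"{model}.{scenario}.performance_sample_count_override = {sample_count_override}")
    lines.append(f"{model}.{scenario}.coalesce_queries = 1")
    return "\n".join(lines)

print(render_user_conf("mixtral-8x7b", "Offline",
                       query_count=10, sample_count_override=8))
```

This reproduces the four lines of the user.conf above, which makes the mismatch visible: total_sample_count feeds performance_sample_count_override, but the number of samples LoadGen actually issues in Performance mode is determined elsewhere.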

maria-18-git commented 1 month ago

Set performance_sample_count_override = total_sample_count

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_min_query_count=10
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c/user.conf" --total-sample-count 10 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c" --device "cuda:0" --dtype "float16"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00,  1.09it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
Saving outputs to run_outputs/q8.pkl
Samples run: 1
        BatchMaker time: 0.0005807876586914062
        Inference time: 41.58476805686951
        Postprocess time: 0.0008158683776855469
        ==== Total time: 41.586164712905884
Saving outputs to run_outputs/q5.pkl
Samples run: 2
        BatchMaker time: 0.00020503997802734375
        Inference time: 9.112966537475586
        Postprocess time: 0.0005428791046142578
        ==== Total time: 9.113714456558228
Saving outputs to run_outputs/q0.pkl
Samples run: 3
        BatchMaker time: 0.0002009868621826172
        Inference time: 17.251879692077637
        Postprocess time: 0.0003573894500732
...
Samples run: 10
        BatchMaker time: 0.00020194053649902344
        Inference time: 9.094660758972168
        Postprocess time: 0.00040793418884277344
        ==== Total time: 9.09527063369751
Saving outputs to run_outputs/q0.pkl
Samples run: 11
        BatchMaker time: 0.0001404285430908203
        Inference time: 17.270347833633423
        Postprocess time: 0.00042557716369628906
        ==== Total time: 17.27091383934021
Saving outputs to run_outputs/q3.pkl
Samples run: 12
        BatchMaker time: 0.00013399124145507812
        Inference time: 14.922792911529541
        Postprocess time: 0.0005435943603515625
        ==== Total time: 14.923470497131348
Saving outputs to run_outputs/q6.pkl
...
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_min_query_count=10 , get_path
INFO:root:[base_loadgen_experiment] touch _BEFORE_CODE_LOADING=/local/mnt/workspace/mmirkina/work_collection/pint_package_for_python3.9/install/lib/python3.9/site-packages
/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_8cfacb20de2244288c366aa9c5ec492c/user.conf
mixtral-8x7b.Offline.min_query_count = 10
mixtral-8x7b.Offline.performance_sample_count_override = 8
mixtral-8x7b.Offline.coalesce_queries = 1

Also, IssueQuery still started with 660 samples.

If we set

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_644b0ae5227941f0b8c678e658b5069e/user.conf
mixtral-8x7b.Offline.min_query_count = 10
mixtral-8x7b.Offline.performance_sample_count_override = 10
mixtral-8x7b.Offline.coalesce_queries = 1

Then

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=10,loadgen_min_query_count=10
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_644b0ae5227941f0b8c678e658b5069e/user.conf" --total-sample-count 10 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_644b0ae5227941f0b8c678e658b5069e" --device "cuda:0" --dtype "float16"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:36<00:00,  1.08it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
Saving outputs to run_outputs/q8.pkl
Samples run: 1
        BatchMaker time: 0.0006940364837646484
        Inference time: 41.28359842300415
        Postprocess time: 0.0008296966552734375
        ==== Total time: 41.28512215614319
Saving outputs to run_outputs/q5.pkl
Samples run: 2
        BatchMaker time: 0.00020360946655273438
        Inference time: 9.022686243057251
        Postprocess time: 0.00037741661071777344
        ==== Total time: 9.023267269134521
...
Samples run: 10
        BatchMaker time: 0.0001964569091796875
        Inference time: 8.056349992752075
        Postprocess time: 0.0004391670227050781
        ==== Total time: 8.05698561668396
Saving outputs to run_outputs/q8.pkl
Samples run: 11
        BatchMaker time: 0.00014781951904296875
        Inference time: 40.30126667022705
        Postprocess time: 0.0006682872772216797
        ==== Total time: 40.302082777023315
...
maria-18-git commented 1 month ago

If we set

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_9ab6243dfed440b1b67169ccbac1d51b/user.conf
mixtral-8x7b.Offline.min_query_count = 5
mixtral-8x7b.Offline.performance_sample_count_override = 5
mixtral-8x7b.Offline.coalesce_queries = 1

Then

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Offline,total_sample_count=5,loadgen_min_query_count=5
... 
       /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Offline" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_9ab6243dfed440b1b67169ccbac1d51b/user.conf" --total-sample-count 5 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_9ab6243dfed440b1b67169ccbac1d51b" --device "cuda:0" --dtype "float16"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:33<00:00,  1.16it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
IssueQuery started with 660 samples
IssueQuery done
...
Samples run: 1
        BatchMaker time: 0.0006489753723144531
        Inference time: 9.025307893753052
        Postprocess time: 0.0006871223449707031
        ==== Total time: 9.026643991470337
Saving outputs to run_outputs/q3.pkl
Samples run: 2
        BatchMaker time: 0.0001709461212158203
        Inference time: 14.81588864326477
        Postprocess time: 0.0004661083221435547
        ==== Total time: 14.81652569770813
...
Samples run: 5
        BatchMaker time: 0.0001342296600341797
        Inference time: 19.760494232177734
        Postprocess time: 0.0005040168762207031
        ==== Total time: 19.76113247871399
Saving outputs to run_outputs/q4.pkl
Samples run: 6
        BatchMaker time: 0.0002002716064453125
        Inference time: 8.046847820281982
        Postprocess time: 0.0005133152008056641
        ==== Total time: 8.047561407089233
Saving outputs to run_outputs/q3.pkl
Samples run: 7
        BatchMaker time: 0.000148773193359375
        Inference time: 14.783483743667603
        Postprocess time: 0.00044274330139160156
        ==== Total time: 14.784075260162354
Saving outputs to run_outputs/q0.pkl
...

So we cannot do a short run for the Offline scenario in Performance mode.
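A plausible explanation for the constant "IssueQuery started with 660 samples": in the Offline scenario, LoadGen sizes its single query from target_qps and min_duration (padded by 10%), not from total_sample_count or min_query_count. A sketch of that calculation, assuming the 1.1 padding factor (integer arithmetic is used to avoid float rounding):

```python
def offline_samples_per_query(target_qps, min_duration_ms):
    """Approximate number of samples LoadGen issues in one Offline query:
    target_qps * min_duration_seconds * 1.1 (assumed padding factor)."""
    return target_qps * min_duration_ms * 11 // 10000

# With the defaults seen in the logs (target_qps=1, min_duration=600000 ms):
print(offline_samples_per_query(1, 600000))  # -> 660
```

This matches the log: with target_qps left at 1 and min_duration at 600000 ms, every run issues 660 samples regardless of the query-count settings in user.conf.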

maria-18-git commented 1 month ago

8. Accuracy run for Server scenario

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/mixtral_reference_code$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Server,total_sample_count=15
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_75cfd9afe3aa45d791f4f9eb1a9d4b2d/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_75cfd9afe3aa45d791f4f9eb1a9d4b2d" --device "cuda:0" --dtype "float16" --batch-size 1
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Traceback (most recent call last):
  File "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py", line 166, in <module>
    main()
  File "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py", line 133, in main
    sut = sut_cls(
TypeError: __init__() got an unexpected keyword argument 'batch_size'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.

['^', 'byname', 'generated_by_mixtral_reference_code_on_get_75cfd9afe3aa45d791f4f9eb1a9d4b2d']

real    0m2.719s
user    0m4.947s
sys     0m11.550s

We need to fix this issue in https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/main.py#L105. In this case sut_cls = SUTServer (https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/main.py#L131). This code works: https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/SUT.py#L347. But SUTServer does not accept batch_size as an input parameter, while main.py passes it: https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/main.py#L136. Add fix:

mirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   SUT.py
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b$ git diff
diff --git a/language/mixtral-8x7b/SUT.py b/language/mixtral-8x7b/SUT.py
index b4a2dc7..724d7b8 100644
--- a/language/mixtral-8x7b/SUT.py
+++ b/language/mixtral-8x7b/SUT.py
@@ -345,13 +345,14 @@ class SUT():

 class SUTServer(SUT):
-    def __init__(self, model_path=None, dtype="bfloat16", device="cpu",
+    def __init__(self, model_path=None, dtype="bfloat16", device="cpu", batch_size=1,
                  total_sample_count=24576, dataset_path=None, workers=1):

         super().__init__(
             model_path=model_path,
             dtype=dtype,
             device=device,
+            batch_size=batch_size,
             total_sample_count=total_sample_count,
             dataset_path=dataset_path,
             workers=workers)

Then Accuracy, short run (15 samples):


mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Server,total_sample_count=15
...
      /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --accuracy --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550" --device "cuda:0" --dtype "float16" --batch-size 1
...
} saved to '/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550/data_axs.json'
INFO:root:Matched Rule #1/1 produced an entry, which matches the original query.

['^', 'byname', 'generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550']

real    5m9.109s
user    18m19.012s
sys     9m23.009s
maria-18-git commented 1 month ago

Accuracy:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$  axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=AccuracyOnly,loadgen_scenario=Server,total_sample_count=15 , get accuracy_dict
...
/usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/evaluate-accuracy.py --mlperf-accuracy-file "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_858b72e06ce1404abbcc3954b414b550/mlperf_log_accuracy.json" --checkpoint-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" --dataset-file "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --dtype "int32"
...
INFO:root:[base_loadgen_experiment] touch _BEFORE_CODE_LOADING=/local/mnt/workspace/mmirkina/work_collection/pint_package_for_python3.9/install/lib/python3.9/site-packages
{'gsm8k': 80.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 15, 'gen_tok_len': 2701, 'tokens_per_sample': 180.1}
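The tokens_per_sample field is consistent with the other counters in the dictionary: it is the total generated token count divided by the number of generated samples, rounded to one decimal place. A quick cross-check:

```python
# Accuracy dictionary as reported by the run above
accuracy_dict = {'gsm8k': 80.0, 'mbxp': 0, 'gen_len': 0, 'gen_num': 15,
                 'gen_tok_len': 2701, 'tokens_per_sample': 180.1}

# tokens_per_sample = gen_tok_len / gen_num
derived = round(accuracy_dict['gen_tok_len'] / accuracy_dict['gen_num'], 1)
print(derived)  # -> 180.1
```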
maria-18-git commented 1 month ago

9. Performance run for Server scenario

- short run

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,total_sample_count=15,loadgen_query_count=15
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6/user.conf" --total-sample-count 15 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6" --device "cuda:0" --dtype "float16" --batch-size 1
...
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00,  1.11it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode     : PerformanceOnly
Completed samples per second    : 0.05
Completed tokens per second: 10.15
Result is : INVALID
  Performance constraints satisfied : NO
  Min duration satisfied : NO
  Min queries satisfied : Yes
  Early stopping satisfied: NO
Recommendations:
 * TTFT constrain not met: Reduce target QPS to improve latency.
 * Increase the target QPS so the loadgen pre-generates more queries.
TTFT Early Stopping Result:

TPOT Early Stopping Result:
 * Run unsuccessful.
 * Processed 15 queries.
 * Would need to run at least 444 more queries,
 with the run being successful if every additional
 query were under latency.

================================================
Additional Stats
================================================
Scheduled samples per second : 0.82
Min latency (ns)                : 13064653379
Max latency (ns)                : 267793204858
Mean latency (ns)               : 143221598769
50.00 percentile latency (ns)   : 150929072160
90.00 percentile latency (ns)   : 254726221440
95.00 percentile latency (ns)   : 267793204858
97.00 percentile latency (ns)   : 267793204858
99.00 percentile latency (ns)   : 267793204858
99.90 percentile latency (ns)   : 267793204858

Completed tokens per second                 : 10.15
Min First Token latency (ns)                : 1009652297
Max First Token latency (ns)                : 251018039938
Mean First Token latency (ns)               : 124472043771
50.00 percentile first token latency (ns)   : 141028154321
90.00 percentile first token latency (ns)   : 235301181383
95.00 percentile first token latency (ns)   : 251018039938
97.00 percentile first token latency (ns)   : 251018039938
99.00 percentile first token latency (ns)   : 251018039938
99.90 percentile first token latency (ns)   : 251018039938

Min Time to Output Token (ns)                : 97460180
Max Time to Output Token (ns)                : 98811484
Mean Time to Output Token (ns)               : 97999451
50.00 percentile time to output token (ns)   : 97908450
90.00 percentile time to output token (ns)   : 98751642
95.00 percentile time to output token (ns)   : 98811484
97.00 percentile time to output token (ns)   : 98811484
99.00 percentile time to output token (ns)   : 98811484
99.90 percentile time to output token (ns)   : 98811484

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1
ttft_latency (ns): 2000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 15
max_query_count : 15
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 15

No warnings encountered during test.

1 ERROR encountered. See detailed log.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
INFO:root:[generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6] parameters {
...
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_725f03da97054c5c9a7534b7e21b98f6']

real    5m28.203s
user    18m20.823s
sys     9m5.310s
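The INVALID verdict above can be reproduced from the reported latencies: the 99th-percentile first-token latency far exceeds the 2 s TTFT limit, while time-per-output-token stays well under its 200 ms limit. A rough check using the numbers from the summary above (this mirrors only the percentile comparison, not LoadGen's full early-stopping statistics):

```python
# Limits from "Test Parameters Used" (ns)
TTFT_LIMIT_NS = 2_000_000_000   # 2 s time-to-first-token
TPOT_LIMIT_NS = 200_000_000     # 200 ms time-per-output-token

# 99th-percentile values reported in the summary above (ns)
p99_ttft = 251_018_039_938
p99_tpot = 98_811_484

print("TTFT ok:", p99_ttft <= TTFT_LIMIT_NS)  # -> False (constraint not met)
print("TPOT ok:", p99_tpot <= TPOT_LIMIT_NS)  # -> True
```

This agrees with the recommendation in the log ("TTFT constrain not met"): with batch-size 1 on a single GPU, first-token latencies pile up far beyond the 2 s target, so the target QPS must be reduced.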
maria-18-git commented 1 month ago
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,total_sample_count=662,loadgen_target_qps=0.05
       /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_c1ddb273906f457abe0630b98b395a8e/user.conf" --total-sample-count 662 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_c1ddb273906f457abe0630b98b395a8e" --device "cuda:0" --dtype "float16" --batch-size 1
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:34<00:00,  1.12it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode     : PerformanceOnly
Completed samples per second    : 0.05
Completed tokens per second: 6.23
Result is : INVALID
  Performance constraints satisfied : NO
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: NO
Recommendations:
 * TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:

TPOT Early Stopping Result:
 * Run unsuccessful.
 * Processed 100 queries.
 * Would need to run at least 359 more queries,
 with the run being successful if every additional
 query were under latency.

================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns)                : 5830023019
Max latency (ns)                : 68595234786
Mean latency (ns)               : 23372322742
50.00 percentile latency (ns)   : 18108096926
90.00 percentile latency (ns)   : 52467416019
95.00 percentile latency (ns)   : 60556450072
97.00 percentile latency (ns)   : 64991970195
99.00 percentile latency (ns)   : 68595234786
99.90 percentile latency (ns)   : 68595234786

Completed tokens per second                 : 6.23
Min First Token latency (ns)                : 185737515
Max First Token latency (ns)                : 60490290109
Mean First Token latency (ns)               : 10809999725
50.00 percentile first token latency (ns)   : 4415499799
90.00 percentile first token latency (ns)   : 36296355041
95.00 percentile first token latency (ns)   : 42008621796
97.00 percentile first token latency (ns)   : 50461911783
99.00 percentile first token latency (ns)   : 60490290109
99.90 percentile first token latency (ns)   : 60490290109

Min Time to Output Token (ns)                : 97666897
Max Time to Output Token (ns)                : 99740760
Mean Time to Output Token (ns)               : 98312017
50.00 percentile time to output token (ns)   : 98235140
90.00 percentile time to output token (ns)   : 98852169
95.00 percentile time to output token (ns)   : 98921387
97.00 percentile time to output token (ns)   : 98990886
99.00 percentile time to output token (ns)   : 99740760
99.90 percentile time to output token (ns)   : 99740760

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 2000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 662

No warnings encountered during test.

No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_c1ddb273906f457abe0630b98b395a8e']

real    35m14.624s
user    34m59.382s
sys     9m17.484s
maria-18-git commented 1 month ago

If we change in mlperf.conf:
mixtral-8x7b.Server.ttft_latency = 2000 -> mixtral-8x7b.Server.ttft_latency = 65000
mixtral-8x7b.Server.tpot_latency = 200 -> mixtral-8x7b.Server.tpot_latency = 150
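Note that the ttft_latency / tpot_latency values in the conf files are specified in milliseconds, while the loadgen summary ("Test Parameters Used") prints them in nanoseconds. A minimal sketch of the correspondence (the helper name is hypothetical, not part of the reference code):

```python
# Conf files specify latency constraints in milliseconds;
# the loadgen summary prints them in nanoseconds.
MS_TO_NS = 1_000_000

def conf_ms_to_summary_ns(value_ms: int) -> int:
    """Convert a *.Server.ttft_latency / tpot_latency conf value (ms)
    to the figure shown in the MLPerf summary (ns)."""
    return value_ms * MS_TO_NS

# mixtral-8x7b.Server.ttft_latency = 2000  ->  ttft_latency (ns): 2000000000
assert conf_ms_to_summary_ns(2000) == 2_000_000_000
# mixtral-8x7b.Server.tpot_latency = 200   ->  tpot_latency (ns): 200000000
assert conf_ms_to_summary_ns(200) == 200_000_000
```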

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,total_sample_count=662,loadgen_target_qps=0.05,loadgen_ttft_latency=65000,loadgen_tpot_latency=150
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd/user.conf" --total-sample-count 662 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd" --device "cuda:0" --dtype "float16" --batch-size 1
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:34<00:00,  1.14it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode     : PerformanceOnly
Completed samples per second    : 0.05
Completed tokens per second: 6.23
Result is : INVALID
  Performance constraints satisfied : NO
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: NO
Recommendations:
 * TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:

TPOT Early Stopping Result:
 * Run unsuccessful.
 * Processed 100 queries.
 * Would need to run at least 359 more queries,
 with the run being successful if every additional
 query were under latency.

================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns)                : 5718265818
Max latency (ns)                : 66419296347
Mean latency (ns)               : 22677037097
50.00 percentile latency (ns)   : 17471114779
90.00 percentile latency (ns)   : 50096406261
95.00 percentile latency (ns)   : 58467919575
97.00 percentile latency (ns)   : 62976406531
99.00 percentile latency (ns)   : 66419296347
99.90 percentile latency (ns)   : 66419296347

Completed tokens per second                 : 6.23
Min First Token latency (ns)                : 185477181
Max First Token latency (ns)                : 58474430840
Mean First Token latency (ns)               : 10341132750
50.00 percentile first token latency (ns)   : 4303981659
90.00 percentile first token latency (ns)   : 34374459816
95.00 percentile first token latency (ns)   : 40702664334
97.00 percentile first token latency (ns)   : 48617643560
99.00 percentile first token latency (ns)   : 58474430840
99.90 percentile first token latency (ns)   : 58474430840

Min Time to Output Token (ns)                : 95681850
Max Time to Output Token (ns)                : 97959003
Mean Time to Output Token (ns)               : 96554188
50.00 percentile time to output token (ns)   : 96587430
90.00 percentile time to output token (ns)   : 97079977
95.00 percentile time to output token (ns)   : 97193354
97.00 percentile time to output token (ns)   : 97411686
99.00 percentile time to output token (ns)   : 97959003
99.90 percentile time to output token (ns)   : 97959003

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 2000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 662

No warnings encountered during test.

No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...
['^', 'byname', 'generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd']

real    35m13.781s
user    34m55.483s
sys     8m59.240s
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd/user.conf
mixtral-8x7b.Server.performance_sample_count_override = 662
mixtral-8x7b.Server.target_qps = 0.05
mixtral-8x7b.Server.coalesce_queries = 1
mixtral-8x7b.Server.ttft_latency = 65000
mixtral-8x7b.Server.tpot_latency = 150
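Note that although user.conf above requests ttft_latency = 65000 and tpot_latency = 150, the summary still printed ttft_latency (ns): 2000000000 and tpot_latency (ns): 200000000. This is consistent with loadgen applying mlperf.conf after user.conf, so mlperf.conf values shadow the user settings. A minimal sketch of that assumed merge order (simplified, not the real loadgen parser):

```python
def merge_settings(user_conf: dict, mlperf_conf: dict) -> dict:
    """Sketch of the assumed precedence: user.conf is read first,
    then mlperf.conf is applied on top, so any key present in
    mlperf.conf shadows the user.conf value."""
    effective = dict(user_conf)
    effective.update(mlperf_conf)
    return effective

user_conf = {"ttft_latency": 65000, "tpot_latency": 150}
mlperf_conf = {"ttft_latency": 2000, "tpot_latency": 200}
effective = merge_settings(user_conf, mlperf_conf)
assert effective["ttft_latency"] == 2000  # user.conf override is ignored
```

This would explain why editing mlperf.conf itself (as done in the next comment) is needed to change the latency constraints.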
maria-18-git commented 1 month ago

Changed mixtral-8x7b.Server.ttft_latency to 62000 in mlperf.conf.

Now in mlperf.conf:
mixtral-8x7b.Server.target_latency = 0
mixtral-8x7b.Server.ttft_latency = 62000
mixtral-8x7b.Server.tpot_latency = 200

Then

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=mixtral,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,loadgen_min_query_count=662,loadgen_target_qps=0.05
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545/user.conf" --total-sample-count 15000 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545" --device "cuda:0" --dtype "float16" --batch-size 1
...
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:34<00:00,  1.14it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode     : PerformanceOnly
Completed samples per second    : 0.05
Completed tokens per second: 6.55
Result is : INVALID
  Performance constraints satisfied : NO
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: NO
Recommendations:
 * TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:
 * Run unsuccessful.
 * Processed 662 queries.
 * Would need to run at least 7046 more queries,
 with the run being successful if every additional
 query were under latency.
TPOT Early Stopping Result:
 * Run successful.

================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns)                : 2486717028
Max latency (ns)                : 159035247554
Mean latency (ns)               : 33772426088
50.00 percentile latency (ns)   : 23502796314
90.00 percentile latency (ns)   : 75285437852
95.00 percentile latency (ns)   : 118225505597
97.00 percentile latency (ns)   : 128366769948
99.00 percentile latency (ns)   : 151858693459
99.90 percentile latency (ns)   : 159035247554

Completed tokens per second                 : 6.55
Min First Token latency (ns)                : 140220154
Max First Token latency (ns)                : 150555503626
Mean First Token latency (ns)               : 19762347996
50.00 percentile first token latency (ns)   : 7281915958
90.00 percentile first token latency (ns)   : 56872547290
95.00 percentile first token latency (ns)   : 101315290042
97.00 percentile first token latency (ns)   : 115607824512
99.00 percentile first token latency (ns)   : 132908847055
99.90 percentile first token latency (ns)   : 150555503626

Min Time to Output Token (ns)                : 97579488
Max Time to Output Token (ns)                : 102603516
Mean Time to Output Token (ns)               : 98661860
50.00 percentile time to output token (ns)   : 98459173
90.00 percentile time to output token (ns)   : 99694911
95.00 percentile time to output token (ns)   : 100262397
97.00 percentile time to output token (ns)   : 100780724
99.00 percentile time to output token (ns)   : 101512282
99.90 percentile time to output token (ns)   : 102603516

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 62000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 662
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 15000

No warnings encountered during test.

No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...

['^', 'byname', 'generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545']

real    242m29.181s
user    170m21.273s
sys     9m32.119s

But the generated user.conf does not contain the latency settings:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_81ae6357d3f24c52a759f909b27a0545/user.conf
mixtral-8x7b.Server.min_query_count = 662
mixtral-8x7b.Server.performance_sample_count_override = 15000
mixtral-8x7b.Server.target_qps = 0.05
mixtral-8x7b.Server.coalesce_queries = 1
maria-18-git commented 1 month ago

If mlperf.conf is set as follows:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf | grep mixtral
mixtral-8x7B.*.sample_concatenate_permutation = 1
mixtral-8x7b.*.use_token_latencies = 1
# Only ttft and tpot are tracked for the llama2-70b & mixtral-8x7B benchmark therefore target_latency = 0
mixtral-8x7b.Server.target_latency = 0
mixtral-8x7b.Server.ttft_latency = 62000
mixtral-8x7b.Server.tpot_latency = 200
mixtral-8x7b.Offline.min_query_count = 15000
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd/user.conf" --total-sample-count 662 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_mixtral_reference_code_on_get_74ac3f511d91479a96af742cda7728dd" --device "cuda:0" --dtype "float16" --batch-size 1
maria-18-git commented 1 month ago

mlperf.conf is now copied into the created entry.

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery loadgen_output,task=moe,framework=torch,loadgen_mode=PerformanceOnly,loadgen_scenario=Server,loadgen_min_query_count=1000,loadgen_target_qps=0.05
...
        /usr/bin/python3 /local/mnt/workspace/mmirkina/work_collection/mlperf_inference_git_master/language/mixtral-8x7b/main.py --scenario "Server" --model-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1"  --mlperf-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/mlperf.conf" --user-conf "/local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/user.conf" --total-sample-count 15000 --dataset-path "/local/mnt/workspace/mmirkina/work_collection/downloaded_2024.06.06_mixtral_15k_v4.pkl/2024.06.06_mixtral_15k_v4.pkl" --output-log-dir "/local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd" --device "cuda:0" --dtype "float16" --batch-size 1
Loading dataset...
Finished loading dataset.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:35<00:00,  1.09it/s]
Loaded model
Loaded tokenizer
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Starting Benchmark run
/local/mnt/workspace/mmirkina/work_collection/transformers_4.41.2_package_for_python3.9/install/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:563: UserWarning: `num_beams` is set to
1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode     : PerformanceOnly
Completed samples per second    : 0.05
Completed tokens per second: 6.87
Result is : INVALID
  Performance constraints satisfied : NO
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: NO
Recommendations:
 * TTFT constrain not met: Reduce target QPS to improve latency.
TTFT Early Stopping Result:
 * Run unsuccessful.
 * Processed 1000 queries.
 * Would need to run at least 12684 more queries,
 with the run being successful if every additional
 query were under latency.
TPOT Early Stopping Result:
 * Run successful.
================================================
Additional Stats
================================================
Scheduled samples per second : 0.05
Min latency (ns)                : 2254937408
Max latency (ns)                : 203136802769
Mean latency (ns)               : 36474350157
50.00 percentile latency (ns)   : 24622631339
90.00 percentile latency (ns)   : 88052729073
95.00 percentile latency (ns)   : 114536578525
97.00 percentile latency (ns)   : 132427403215
99.00 percentile latency (ns)   : 169032020175
99.90 percentile latency (ns)   : 203136802769

Completed tokens per second                 : 6.87
Min First Token latency (ns)                : 138495953
Max First Token latency (ns)                : 186722721161
Mean First Token latency (ns)               : 22513031640
50.00 percentile first token latency (ns)   : 8509059644
90.00 percentile first token latency (ns)   : 68997991288
95.00 percentile first token latency (ns)   : 99699060394
97.00 percentile first token latency (ns)   : 115203356595
99.00 percentile first token latency (ns)   : 156668903970
99.90 percentile first token latency (ns)   : 186722721161

Min Time to Output Token (ns)                : 95872056
Max Time to Output Token (ns)                : 100753945
Mean Time to Output Token (ns)               : 97079862
50.00 percentile time to output token (ns)   : 96877641
90.00 percentile time to output token (ns)   : 98147562
95.00 percentile time to output token (ns)   : 98809447
97.00 percentile time to output token (ns)   : 99386795
99.00 percentile time to output token (ns)   : 100146259
99.90 percentile time to output token (ns)   : 100753945

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 0.05
ttft_latency (ns): 62000000000
tpot_latency (ns): 200000000
max_async_queries : 0
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 1000
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 15000

No warnings encountered during test.

No errors encountered during test.
INFO:Mixtral-8x7B-Instruct-v0.1:Exiting First token response thread
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Run Completed!
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying SUT...
INFO:Mixtral-8x7B-Instruct-v0.1-MAIN:Destroying QSL...
...
['^', 'byname', 'generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd']

real    353m21.616s
user    249m30.290s
sys     9m41.276s

mlperf.conf:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/mlperf.conf | grep mixtral
mixtral-8x7B.*.sample_concatenate_permutation = 1
mixtral-8x7b.*.use_token_latencies = 1
# Only ttft and tpot are tracked for the llama2-70b & mixtral-8x7B benchmark therefore target_latency = 0
mixtral-8x7b.Server.target_latency = 0
mixtral-8x7b.Server.ttft_latency = 62000
mixtral-8x7b.Server.tpot_latency = 200
mixtral-8x7b.Offline.min_query_count = 15000

user.conf:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ cat /local/mnt/workspace/mmirkina/work_collection/generated_by_moe_reference_using_torch_loadgen_on_get_ce145a450ec44106bc5d4c43a8d52dbd/user.conf
mixtral-8x7b.Server.min_query_count = 1000
mixtral-8x7b.Server.performance_sample_count_override = 15000
mixtral-8x7b.Server.target_qps = 0.05
mixtral-8x7b.Server.coalesce_queries = 1
maria-18-git commented 1 month ago

Useful updates:

maria-18-git commented 1 month ago

Recipe for downloading the model checkpoint with a patch.

  1. Create patch:
    • Create one directory with the tokenizer files and one empty directory:
      maria@chai ~/work_collection/axs2mlperf (mixtral-dev *=)$ ls -la tokenizer_dir/
      total 2252
      drwxr-xr-x  2 maria krai    4096 Jul 17 16:18 .
      drwxr-xr-x 55 maria krai    4096 Jul 17 16:31 ..
      -rw-r--r--  1 maria krai    2103 Jul 17 16:17 tokenizer_config.json
      -rw-r--r--  1 maria krai 1795188 Jul 17 16:17 tokenizer.json
      -rw-r--r--  1 maria krai  493443 Jul 17 16:17 tokenizer.model
      maria@chai ~/work_collection/axs2mlperf (mixtral-dev *=)$ ls -la empty_dir/
      total 8
      drwxr-xr-x  2 maria krai 4096 Jul 17 16:17 .
      drwxr-xr-x 55 maria krai 4096 Jul 17 16:31 ..
    • Go to the empty directory:
      maria@chai ~/work_collection/axs2mlperf (mixtral-dev *=)$ cd empty_dir/
    • Run diff to create the patch. It is important to add the -a option so that binary files (e.g. tokenizer.model) are compared as text:
      maria@chai ~/work_collection/axs2mlperf/empty_dir (mixtral-dev *=)$ diff -ruNa . ../tokenizer_dir > ../model_mixtral_checkpoint_recipe/tokenizer.patch
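The steps above can be sketched end-to-end as a self-contained script (paths and file contents are illustrative, not the real tokenizer files):

```shell
#!/bin/sh
# Sketch of the patch-creation and patch-application workflow above,
# run in a temporary location with a dummy tokenizer_config.json.
set -e
work=$(mktemp -d)
mkdir "$work/tokenizer_dir" "$work/empty_dir"
printf '{"model_max_length": 32768}\n' > "$work/tokenizer_dir/tokenizer_config.json"

# -r recurse, -u unified, -N treat absent files as empty, -a treat all files as text
# (-a is what lets diff handle binary files such as tokenizer.model).
# diff exits 1 when differences are found, hence the || true under set -e.
cd "$work/empty_dir"
diff -ruNa . ../tokenizer_dir > ../tokenizer.patch || true

# Applying the patch to an empty directory recreates the files,
# mirroring how the recipe applies tokenizer.patch to the downloaded model:
mkdir "$work/apply_dir"
patch -p1 -d "$work/apply_dir" -i "$work/tokenizer.patch"
ls "$work/apply_dir"
```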
maria-18-git commented 1 month ago
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ axs byquery downloaded,pytorch_model,model_name=mixtral-8x7b
...
        "/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" -P
Transferred:      173.982 GiB / 173.982 GiB, 100%, 30.664 MiB/s, ETA 0s
Transferred:           42 / 42, 100%
Elapsed time:      36m6.9s
WARNING:root:The resolved patch_tool_entry 'patch_tool' located at '/local/mnt/workspace/mmirkina/work_collection/patch_tool' uses the shell tool '/usr/bin/patch'
WARNING:root:shell.run() about to execute (with env=None, in_dir=None, capture_output=False, errorize_output=False, capture_stderr=False, split_to_lines=False):
        "/usr/bin/patch" -p1 -d "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1" -i "/local/mnt/workspace/mmirkina/work_collection/axs2mlperf/model_pytorch_mixtral_recipe/tokenizer.patch"
patching file tokenizer_config.json
patching file tokenizer.json
patching file tokenizer.model
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.

['^', 'byname', 'downloaded_mixtral-8x7b-instruct-v0.1']
maria-18-git commented 1 month ago

Then

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls -la /local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1
total 182435448
drwxr-xr-x 2 mmirkina users       4096 Jul 18 07:55 .
drwxr-xr-x 3 mmirkina users       4096 Jul 18 07:19 ..
-rw-r--r-- 1 mmirkina users        803 Jun 24 17:04 config.json
-rw-r--r-- 1 mmirkina users        111 Jun 24 17:04 generation_config.json
-rw-r--r-- 1 mmirkina users 4920052720 Jun 24 17:04 model-00001-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00002-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00003-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00004-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00005-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504264 Jun 24 17:05 model-00006-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559912 Jun 24 17:05 model-00007-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00008-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00009-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00010-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00011-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646240 Jun 24 17:06 model-00012-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:06 model-00013-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00014-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00015-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00016-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00017-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00018-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:08 model-00019-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00020-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00021-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00022-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00023-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00024-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:09 model-00025-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00026-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00027-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00028-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00029-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00030-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:11 model-00031-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00032-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00033-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00034-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00035-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00036-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646264 Jun 24 17:12 model-00037-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:13 model-00038-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 1463862216 Jun 24 17:13 model-00039-of-00039.safetensors
-rw-r--r-- 1 mmirkina users      92659 Jun 24 17:13 model.safetensors.index.json
-rw-r--r-- 1 mmirkina users       2103 Jul 18 07:55 tokenizer_config.json
-rw-r--r-- 1 mmirkina users    1795188 Jul 18 07:55 tokenizer.json
-rw-r--r-- 1 mmirkina users     493443 Jul 18 07:55 tokenizer.model
maria-18-git commented 1 month ago

The mxeval package needed for accuracy calculation cannot be installed as a Python dependency by the program while running an experiment. The dependency is declared as:

"mxeval_query": ["python_package", "package_name=mxeval", "installable=git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86" ],

We have this issue:

ERROR: For req: mxeval==1.0. Invalid script entry point: <ExportEntry evaluate_functional_correctness

We hit the same issue when installing the package locally:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip uninstall mxeval
Found existing installation: mxeval 1.0
Uninstalling mxeval-1.0:
  Would remove:
    /local/mnt/workspace/mmirkina/.local/bin/evaluate_functional_correctness
    /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages/mxeval.egg-link
Proceed (Y/n)? Y
  Successfully uninstalled mxeval-1.0
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip show mxeval
WARNING: Package(s) not found: mxeval
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip install git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
  Cloning https://github.com/amazon-science/mxeval.git (to revision e09974f990eeaf0c0e8f2b5eaff4be66effb2c86) to /tmp/pip-req-build-eppd08hl
  Running command git clone --filter=blob:none --quiet https://github.com/amazon-science/mxeval.git /tmp/pip-req-build-eppd08hl
  Running command git rev-parse -q --verify 'sha^e09974f990eeaf0c0e8f2b5eaff4be66effb2c86'
  Running command git fetch -q https://github.com/amazon-science/mxeval.git e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
  Resolved https://github.com/amazon-science/mxeval.git to commit e09974f990eeaf0c0e8f2b5eaff4be66effb2c86
  Preparing metadata (setup.py) ... done
Requirement already satisfied: fire in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from mxeval==1.0) (0.6.0)
Requirement already satisfied: numpy in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from mxeval==1.0) (1.24.1)
Requirement already satisfied: tqdm in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from mxeval==1.0) (4.66.4)
Requirement already satisfied: termcolor in /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages (from fire->mxeval==1.0) (2.4.0)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from fire->mxeval==1.0) (1.16.0)
Building wheels for collected packages: mxeval
  Building wheel for mxeval (setup.py) ... done
  Created wheel for mxeval: filename=mxeval-1.0-py3-none-any.whl size=14797 sha256=28ea70cfd9686f1474eaf3d24eaae1980e8f1219d81fb2ed03662976606fd4d2
  Stored in directory: /local/mnt/workspace/mmirkina/.cache/pip/wheels/6a/40/82/769569691d13c70cf87822cb923c87c2a856382763754b47bb
Successfully built mxeval
Installing collected packages: mxeval
ERROR: For req: mxeval==1.0. Invalid script entry point: <ExportEntry evaluate_functional_correctness = mxeval.evaluate_functional_correctness:None []> - A callable suffix is required. Cf https://packaging.python.org/specifications/entry-points/#use-for-scripts for more information.

Nevertheless, the package ends up installed:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/mxeval_git$ python3 -m pip show mxeval
Name: mxeval
Version: 1.0
Summary: UNKNOWN
Home-page: UNKNOWN
Author: AWS AI Labs
Author-email:
License: UNKNOWN
Location: /local/mnt/workspace/mmirkina/.local/lib/python3.9/site-packages
Requires: fire, numpy, tqdm
Required-by:

Solution: changed the link from installable=git+https://github.com/amazon-science/mxeval.git@e09974f990eeaf0c0e8f2b5eaff4be66effb2c86 to installable=git+https://github.com/shubhamugare/mxeval.git. Commit: Added python packages for accuracy calculation for mixtral
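For context, pip's "Invalid script entry point ... A callable suffix is required" error means the upstream setup.py declared a console script as just "name = module" instead of "name = module:callable" (distlib reports the missing callable as ":None"). A minimal sketch of the difference, where the ":main" suffix is a hypothetical callable name (the fork's actual fix may point at a different function):

```python
# A console_scripts spec must have the form "name = module:callable".
# The upstream mxeval setup.py apparently omitted the ":callable" part.

def has_callable_suffix(spec: str) -> bool:
    """Return True if a console_scripts spec names a callable (module:func)."""
    _, _, target = spec.partition("=")
    module, sep, func = target.strip().partition(":")
    return bool(module) and sep == ":" and bool(func)

# Spec from the error log above (rejected by pip's script writer):
broken = "evaluate_functional_correctness = mxeval.evaluate_functional_correctness"
# Hypothetical corrected spec with an explicit callable:
fixed = "evaluate_functional_correctness = mxeval.evaluate_functional_correctness:main"

print(has_callable_suffix(broken))  # False
print(has_callable_suffix(fixed))   # True
```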

maria-18-git commented 1 month ago

All Python packages for accuracy calculation were added in commit: Added packages for moe (mixtral) accuracy calculation

maria-18-git commented 1 month ago

According to the latest updates in https://github.com/mlcommons/inference/issues/1782#issuecomment-2237093081, we no longer need the tokenizer patch for the downloaded model checkpoint.

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ time axs byquery downloaded,pytorch_model,model_name=mixtral-8x7b
        "/usr/bin/rclone" copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1 "/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7
b-instruct-v0.1" -P
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^
Transferred:      173.984 GiB / 173.984 GiB, 100%, 23.538 MiB/s, ETA 0s
Transferred:           46 / 46, 100%
Elapsed time:      36m5.1s
INFO:root:Matched Rule #1/2 produced an entry, which matches the original query.

['^', 'byname', 'downloaded_mixtral-8x7b-instruct-v0.1']

real    36m5.382s
user    12m17.626s
sys     11m56.073s
/local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1

real    0m0.081s
user    0m0.064s
sys     0m0.017s
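As a side note, the 23.538 MiB/s shown by rclone is only the final instantaneous rate; the elapsed time implies a much higher average throughput, which can be checked with a quick calculation:

```python
# Average throughput of the rclone transfer, from the summary above.
total_gib = 173.984            # "Transferred: 173.984 GiB"
elapsed_s = 36 * 60 + 5.1      # "Elapsed time: 36m5.1s"
avg_mib_s = total_gib * 1024 / elapsed_s
print(f"{avg_mib_s:.1f} MiB/s")  # roughly 82 MiB/s on average
```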
mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ ls -la /local/mnt/workspace/mmirkina/work_collection/downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1/
total 182435452
drwxr-xr-x 2 mmirkina users       4096 Jul 22 05:05 .
drwxr-xr-x 3 mmirkina users       4096 Jul 22 04:29 ..
-rw-r--r-- 1 mmirkina users        803 Jun 24 17:04 config.json
-rw-r--r-- 1 mmirkina users        111 Jun 24 17:04 generation_config.json
-rw-r--r-- 1 mmirkina users 4920052720 Jun 24 17:04 model-00001-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00002-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00003-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00004-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:04 model-00005-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504264 Jun 24 17:05 model-00006-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559912 Jun 24 17:05 model-00007-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00008-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:05 model-00009-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00010-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559920 Jun 24 17:06 model-00011-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646240 Jun 24 17:06 model-00012-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:06 model-00013-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00014-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00015-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00016-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:07 model-00017-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00018-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:08 model-00019-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00020-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:08 model-00021-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00022-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00023-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:09 model-00024-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:09 model-00025-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00026-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00027-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00028-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:10 model-00029-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00030-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4932504280 Jun 24 17:11 model-00031-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00032-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:11 model-00033-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00034-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00035-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4865559944 Jun 24 17:12 model-00036-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4999646264 Jun 24 17:12 model-00037-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 4798417968 Jun 24 17:13 model-00038-of-00039.safetensors
-rw-r--r-- 1 mmirkina users 1463862216 Jun 24 17:13 model-00039-of-00039.safetensors
-rw-r--r-- 1 mmirkina users      92659 Jun 24 17:13 model.safetensors.index.json
-rw-r--r-- 1 mmirkina users         72 Jul 18 12:00 special_tokens_map.json
-rw-r--r-- 1 mmirkina users       1466 Jul 18 11:56 tokenizer_config.json
-rw-r--r-- 1 mmirkina users    1795303 Jul 18 11:56 tokenizer.json
-rw-r--r-- 1 mmirkina users     493443 Jul 18 11:57 tokenizer.model

So we remove the patch support:

mmirkina@aus655-apollo-0:/local/mnt/workspace/mmirkina/work_collection/axs2mlperf$ git diff
diff --git a/model_pytorch_mixtral_recipe/data_axs.json b/model_pytorch_mixtral_recipe/data_axs.json
index 7aa9cb0..26b25bf 100644
--- a/model_pytorch_mixtral_recipe/data_axs.json
+++ b/model_pytorch_mixtral_recipe/data_axs.json
@@ -3,10 +3,7 @@
         [ [ "downloaded", "pytorch_model", "model_name=mixtral-8x7b", "source?=via_rclone" ], [["get_kernel"],["byname","downloader"],["download"]], {
             "downloading_tool_query": "shell_tool,can_download_url_from_rclone",
             "file_name": [ "mixtral-8x7b-instruct-v0.1" ],
-            "url": "mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1",
-            "patch": "tokenizer.patch",
-            "abs_patch_path": [ "^^", "substitute", "#{this_entry_path}#/tokenizer.patch" ]
-        }, [ "this_entry_path" ] ]
-    ],
-    "this_entry_path": [ "^^", "get_path" ]
+            "url": "mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7b-instruct-v0.1"
+        }, [] ]
+  
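After such a large transfer it is worth verifying that every shard referenced by model.safetensors.index.json is actually present on disk. A sketch, assuming the standard Hugging Face sharded-checkpoint index format (a "weight_map" of tensor name to shard file):

```python
import json
import os

def missing_shards(model_dir: str) -> list:
    """Return shard files referenced by the safetensors index but absent on disk."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as f:
        index = json.load(f)
    shards = set(index["weight_map"].values())  # tensor name -> shard file name
    return sorted(s for s in shards
                  if not os.path.exists(os.path.join(model_dir, s)))

# Usage with the path from the listing above; an empty list means all 39
# shards are present:
# missing_shards("/local/mnt/workspace/mmirkina/work_collection/"
#                "downloaded_mixtral-8x7b-instruct-v0.1/mixtral-8x7b-instruct-v0.1")
```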
maria-18-git commented 1 month ago

Last version of useful commands:

maria-18-git commented 1 month ago

All commits are in the mixtral-dev branch of the axs2mlperf repository: https://github.com/krai/axs2mlperf/tree/mixtral-dev.