krai / axs2mlperf

Automated KRAI X workflows for reproducing MLPerf Inference submissions
https://krai.ai
MIT License

Remove links in power experiments #27

Closed maria-18-git closed 1 year ago

maria-18-git commented 1 year ago

Currently a power experiment contains symlinks to other experiments. This makes it difficult to move power experiments to another machine, where the links will be wrong. We need to switch to a JSON file with the names of the experiments. Example:

maria@eb6 ~/work_collection/axs2mlperf/base_loadgen_program (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.68,sut_name=eb6-kilt-qaic
...
['^', 'byname', 'generated_by_power_measurement_on_run_fb6c03e09eb54951a3c79345c1398afb']
maria@eb6 ~/work_collection/axs2mlperf/base_loadgen_program (master *=)$ ll ~/work_collection/generated_by_power_measurement_on_run_fb6c03e09eb54951a3c79345c1398afb
total 36
drwxr-xr-x  3 maria krai 4096 Oct 18 16:02 ./
drwxr-xr-x 99 maria krai 8192 Oct 18 15:52 ../
-rw-r--r--  1 maria krai 2698 Oct 18 16:02 data_axs.json
lrwxrwxrwx  1 maria krai  122 Oct 18 16:02 last_mlperf_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_883c03616cf64d7b96e67306a1c37338/
drwxr-xr-x  5 maria krai 4096 Oct 18 16:02 power_logs/
-rw-r--r--  1 maria krai  251 Oct 18 16:02 program_output.json
lrwxrwxrwx  1 maria krai  122 Oct 18 15:52 ranging_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_f3da5fb349de43179d7b637b3b09f46d/
lrwxrwxrwx  1 maria krai  122 Oct 18 16:02 testing_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_883c03616cf64d7b96e67306a1c37338/
maria@eb6 ~/work_collection/axs2mlperf/base_loadgen_program (master *=)$ cat ~/work_collection/generated_by_power_measurement_on_run_fb6c03e09eb54951a3c79345c1398afb/program_output.json
{
    "ranging_entry_name": "generated_by_image_classification_using_onnxrt_loadgen_on_get_f3da5fb349de43179d7b637b3b09f46d",
    "testing_entry_name": "generated_by_image_classification_using_onnxrt_loadgen_on_get_883c03616cf64d7b96e67306a1c37338"
}
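
To make the portability problem concrete, here is a minimal Python sketch that lists the symlinks in an experiment directory whose targets are absolute paths, i.e. the ones that stop resolving once the directory is moved to another machine. The function name find_absolute_symlinks is hypothetical and is not part of axs or axs2mlperf.

```python
import os


def find_absolute_symlinks(entry_dir):
    """Return {link_name: target} for symlinks directly inside entry_dir
    whose target is an absolute path (hypothetical helper, not axs code)."""
    found = {}
    for name in sorted(os.listdir(entry_dir)):
        path = os.path.join(entry_dir, name)
        if os.path.islink(path):
            target = os.readlink(path)  # raw link text, not resolved
            if os.path.isabs(target):
                found[name] = target
    return found
```

Run against the listing above, a helper like this would be expected to report last_mlperf_logs, ranging_logs and testing_logs, since all three point under /data/maria/work_collection.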
maria-18-git commented 1 year ago

As an example of a potential solution: we need to update the condition if os.path.exists( symlink_to ): at https://github.com/krai/axs2mlperf/blob/master/base_loadgen_program/code_axs.py#L55

It should use power_client_entrydic_path instead of symlink_to.
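
A rough sketch of what recording a name instead of creating a link could look like. This is not the actual code in code_axs.py, which is not shown in this thread: record_entry_name and its key argument are hypothetical, and only the two key names seen in program_output.json above are taken from the issue.

```python
import json
import os


def record_entry_name(power_client_entrydic_path, key, entry_path):
    """Store the bare entry name (not its absolute path) under `key`
    in the JSON file at power_client_entrydic_path.
    Hypothetical helper; the real axs2mlperf logic may differ."""
    entry_name = os.path.basename(os.path.normpath(entry_path))

    recorded = {}
    if os.path.exists(power_client_entrydic_path):
        # Keep what an earlier run (e.g. the ranging run) already wrote.
        with open(power_client_entrydic_path) as f:
            recorded = json.load(f)

    recorded[key] = entry_name
    with open(power_client_entrydic_path, "w") as f:
        json.dump(recorded, f, indent=4)
    return recorded
```

Calling it once with key "ranging_entry_name" and once with "testing_entry_name" would produce a file shaped like the program_output.json shown earlier, with no machine-specific paths in it.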

maria-18-git commented 1 year ago

We need to keep last_mlperf_logs. When we run the power command:

time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.66,sut_name=eb6-kilt-qaic

the following client command is created:

        /usr/bin/python3 /data/maria/work_collection/mlperf_power_git_master/ptd_client_server/client.py --run-workload "axs byquery loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=AccuracyOnly,model_name=resnet50,loadgen_dataset_size=150,loadgen_buffer_size=8,sut_name=eb6-kilt-qaic,_with_power+,symlink_to=/data/maria/work_collection/generated_by_power_measurement_on_run_a98af0817b7d488185fc48d70389989e/last_mlperf_logs,power_client_entrydic_path=/data/maria/work_collection/generated_by_power_measurement_on_run_a98af0817b7d488185fc48d70389989e/program_output.json,effective_no_ranging-" --loadgen-logs "/data/maria/work_collection/generated_by_power_measurement_on_run_a98af0817b7d488185fc48d70389989e/last_mlperf_logs" --output "/data/maria/work_collection/generated_by_power_measurement_on_run_a98af0817b7d488185fc48d70389989e/power_logs" --addr 192.168.4.3 --port 4949 --ntp time.google.com --no-timestamp-path

So we need to set the input directory with the loadgen logs:

--loadgen-logs "/data/maria/work_collection/generated_by_power_measurement_on_run_a98af0817b7d488185fc48d70389989e/last_mlperf_logs"

At this point we can't use the experiment name from program_output.json (the file does not exist yet). So we have two possible solutions:

Solution 1: create tmp directory and copy input files:

maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.67,sut_name=eb6-kilt-qaic,no_ranging+ 
['^', 'byname', 'generated_by_power_measurement_on_run_aece90df425c42649b176a442e5295e3']
real    10m44.616s
user    79m20.386s
sys     0m3.840s
maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ ll ~/work_collection/generated_by_power_measurement_on_run_aece90df425c42649b176a442e5295e3
total 28
drwxr-xr-x   4 maria krai 4096 Oct 23 21:48 ./
drwxr-xr-x 110 maria krai 8192 Oct 23 21:38 ../
-rw-r--r--   1 maria krai 2760 Oct 23 21:48 data_axs.json
drwxr-xr-x   4 maria krai 4096 Oct 23 21:48 power_logs/
-rw-r--r--   1 maria krai  127 Oct 23 21:48 program_output.json
drwxr-xr-x   2 maria krai 4096 Oct 23 21:38 tmp/
maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ cat ~/work_collection/generated_by_power_measurement_on_run_aece90df425c42649b176a442e5295e3/program_output.json
{
    "testing_entry_name": "generated_by_image_classification_using_onnxrt_loadgen_on_get_1457c2f3f0ae4613ac67609b2f72d6f9"
}

Performance:

maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.67,sut_name=eb6-kilt-qaic,no_ranging+ , get performance
VALID : _Early_stopping_90th_percentile_estimate=136.828 (milliseconds)

Power:

maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.67,sut_name=eb6-kilt-qaic,no_ranging+ , avg_power
16.18421666666664
maria-18-git commented 1 year ago
maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.66,sut_name=eb6-kilt-qaic 
...
['^', 'byname', 'generated_by_power_measurement_on_run_09cc474c97ca435cb3fb469d4feb9dc2']

real    21m8.759s
user    158m44.115s
sys     0m6.650s
maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ ll ~/work_collection/generated_by_power_measurement_on_run_09cc474c97ca435cb3fb469d4feb9dc2
total 28
drwxr-xr-x   4 maria krai 4096 Oct 23 22:15 ./
drwxr-xr-x 110 maria krai 8192 Oct 23 22:05 ../
-rw-r--r--   1 maria krai 2683 Oct 23 22:16 data_axs.json
drwxr-xr-x   5 maria krai 4096 Oct 23 22:16 power_logs/
-rw-r--r--   1 maria krai  251 Oct 23 22:15 program_output.json
drwxr-xr-x   2 maria krai 4096 Oct 23 22:05 tmp/

Performance:

maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.66,sut_name=eb6-kilt-qaic , get performance
VALID : _Early_stopping_90th_percentile_estimate=127.267 (milliseconds)

Power:

maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.66,sut_name=eb6-kilt-qaic , avg_power
16.595569999999995
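
For illustration, Solution 1 could be sketched roughly as below: the loadgen log files are copied into a tmp directory inside the power experiment, so that nothing in it points outside. The thread does not show what tmp/ actually contains or how it is filled, so the function copy_logs_to_tmp and the choice to copy every regular file are assumptions.

```python
import os
import shutil


def copy_logs_to_tmp(loadgen_entry_dir, power_entry_dir, tmp_name="tmp"):
    """Copy regular files from a loadgen entry into <power_entry_dir>/tmp.
    Hypothetical sketch of Solution 1, not the axs2mlperf implementation."""
    tmp_dir = os.path.join(power_entry_dir, tmp_name)
    os.makedirs(tmp_dir, exist_ok=True)

    copied = []
    for name in sorted(os.listdir(loadgen_entry_dir)):
        src = os.path.join(loadgen_entry_dir, name)
        if os.path.isfile(src):  # skip subdirectories
            shutil.copy2(src, os.path.join(tmp_dir, name))  # keeps timestamps
            copied.append(name)
    return tmp_dir, copied
```

The trade-off is duplicated log data in every power experiment, in exchange for a directory that has no links at all.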
maria-18-git commented 1 year ago

Solution 2: use 1 symlink for last_mlperf_logs:

Accuracy:

maria@eb6 ~/work_collection/axs2mlperf (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=AccuracyOnly,model_name=resnet50,loadgen_dataset_size=250,loadgen_buffer_size=8,sut_name=eb6-kilt-qaic
...
['^', 'byname', 'generated_by_power_measurement_on_run_8a772e749efe45e495be0d1c70cfb75b']

real    2m7.708s
user    7m58.012s
sys     0m1.642s
maria@eb6 ~/work_collection/axs2mlperf (master *=)$ ll ~/work_collection/generated_by_power_measurement_on_run_8a772e749efe45e495be0d1c70cfb75b
total 32
drwxr-xr-x   3 maria krai  4096 Oct 24 10:09 ./
drwxr-xr-x 116 maria krai 12288 Oct 24 10:09 ../
-rw-r--r--   1 maria krai  2594 Oct 24 10:09 data_axs.json
lrwxrwxrwx   1 maria krai   122 Oct 24 10:09 last_mlperf_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_e231f38ac80643cc8bfb7b875f84b05f/
drwxr-xr-x   5 maria krai  4096 Oct 24 10:09 power_logs/
-rw-r--r--   1 maria krai   251 Oct 24 10:09 program_output.json
maria-18-git commented 1 year ago
maria@eb6 ~/work_collection/axs2mlperf (master *=)$ cat ~/work_collection/generated_by_power_measurement_on_run_8a772e749efe45e495be0d1c70cfb75b/program_output.json
{
    "ranging_entry_name": "generated_by_image_classification_using_onnxrt_loadgen_on_get_726e50f4f4a3448caafac37cb3a0c786",
    "testing_entry_name": "generated_by_image_classification_using_onnxrt_loadgen_on_get_e231f38ac80643cc8bfb7b875f84b05f"
}
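
Once only names are stored, a consumer on another machine has to turn them back into paths. A minimal sketch, assuming the referenced entries sit in the same collection directory as the power experiment (as in the listings above); resolve_entries is a hypothetical helper, and in practice axs would presumably resolve the names itself via byname.

```python
import json
import os


def resolve_entries(power_entry_dir, collection_dir=None):
    """Map each key in program_output.json to a path in the local collection.
    Assumes sibling entries share the power entry's parent directory unless
    collection_dir is given. Hypothetical helper, not part of axs."""
    if collection_dir is None:
        collection_dir = os.path.dirname(os.path.abspath(power_entry_dir))

    with open(os.path.join(power_entry_dir, "program_output.json")) as f:
        names = json.load(f)

    return {key: os.path.join(collection_dir, name) for key, name in names.items()}
```

Because the JSON holds only entry names, the same file resolves correctly under /data/maria/work_collection or under any other collection root.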
maria-18-git commented 1 year ago

Performance

no_ranging+

maria@eb6 ~/work_collection/axs2mlperf (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.64,sut_name=eb6-kilt-qaic,no_ranging+ 
...
        /usr/bin/python3 /data/maria/work_collection/mlperf_power_git_master/ptd_client_server/client.py --run-workload "axs byquery loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.64,sut_name=eb6-kilt-qaic,no_ranging+,_with_power+,symlink_to=/data/maria/work_collection/generated_by_power_measurement_on_run_81c441472ec1495d9010f2b4c6450b5f/last_mlperf_logs,power_client_entrydic_path=/data/maria/work_collection/generated_by_power_measurement_on_run_81c441472ec1495d9010f2b4c6450b5f/program_output.json,effective_no_ranging+" --loadgen-logs "/data/maria/work_collection/generated_by_power_measurement_on_run_81c441472ec1495d9010f2b4c6450b5f/last_mlperf_logs" --output "/data/maria/work_collection/generated_by_power_measurement_on_run_81c441472ec1495d9010f2b4c6450b5f/power_logs" --addr 192.168.4.3 --port 4949 --ntp time.google.com --no-timestamp-path --max-amps 0.5 --max-volts 300
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
client 2023-10-24 10:15:28,767 [INFO] Creating output directory '/data/maria/work_collection/generated_by_power_measurement_on_run_81c441472ec1495d9010f2b4c6450b5f/power_logs'
client 2023-10-24 10:15:28,768 [INFO] Sending command to the server: 'mlcommons/power client v3'
client 2023-10-24 10:15:28,769 [INFO] Got response: 'mlcommons/power server v3'
client 2023-10-24 10:15:28,769 [INFO] Synchronizing with the server and with time.google.com...
client 2023-10-24 10:15:28,791 [INFO] NTP:offset = 0.000 s, delay = 0.011 s
client 2023-10-24 10:15:28,791 [INFO] Sending command to the server: 'time'
client 2023-10-24 10:15:28,792 [INFO] Got response: '1698138928.7924335'
client 2023-10-24 10:15:28,792 [INFO] The time difference between the client and the server is within range -0.782 ms..0.115 ms
client 2023-10-24 10:15:28,792 [INFO] Sending command to the server: 'new,,94ba4393-a465-4c4e-9e90-e929dc8c1efd'
client 2023-10-24 10:15:28,793 [INFO] Got response: 'OK 2023-10-24_10-15-28,9cece5a5-2e85-44b2-b964-c9170b9a1151'
client 2023-10-24 10:15:28,793 [INFO] Session id is '2023-10-24_10-15-28'
client 2023-10-24 10:15:28,793 [INFO] Sources: {"sources": {"__init__.py": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "client.py": "33ca4f26368777ac06e01f9567b714a4b8063886", "lib/__init__.py": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "lib/client.py": "ac2aa093c8e8bbc9569b9e2a3471bc64e58a2258", "lib/common.py": "611d8b29633d331eb19c9455ea3b5fa3284ed6df", "lib/external/__init__.py": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "lib/external/ntplib.py": "4da8f970656505a40483206ef2b5d3dd5e81711d", "lib/server.py": "c7af63c31bb2fbedea4345f571f6e3507d268ada", "lib/source_hashes.py": "60a2e02193209e8d392803326208d5466342da18", "lib/summary.py": "aa92f0a3f975eecd44d3c0cd0236342ccc9f941d", "lib/time_sync.py": "80894ef2389e540781ff78de94db16aa4203a14e", "server.py": "c3f90f2f7eeb4db30727556d0c815ebc89b3d28b", "tests/unit/__init__.py": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "tests/unit/test_server.py": "948c1995d4008bc2aa6c4046a34ffa3858d6d671", "tests/unit/test_source_hashes.py": "00468a2907583c593e6574a1f6b404e4651c221a"}, "modules": {"ptd_client_server.lib.client": "lib/client.py", "ptd_client_server.lib.common": "lib/common.py", "ptd_client_server.lib.external.ntplib": "lib/external/ntplib.py", "ptd_client_server.lib.source_hashes": "lib/source_hashes.py", "ptd_client_server.lib.summary": "lib/summary.py", "ptd_client_server.lib.time_sync": "lib/time_sync.py"}}
client 2023-10-24 10:15:28,794 [WARNING] Providing manual ranges are only for experimental purpose and the produced results won't be valid for submission
client 2023-10-24 10:15:28,794 [INFO] Running workload in testing mode
client 2023-10-24 10:15:28,794 [INFO] Synchronizing with the server and with time.google.com...
client 2023-10-24 10:15:28,805 [INFO] NTP:offset = -0.000 s, delay = 0.011 s
client 2023-10-24 10:15:28,805 [INFO] Sending command to the server: 'time'
client 2023-10-24 10:15:28,806 [INFO] Got response: '1698138928.8062258'
client 2023-10-24 10:15:28,806 [INFO] The time difference between the client and the server is within range -0.773 ms..0.094 ms
client 2023-10-24 10:15:28,806 [INFO] Sending command to the server: 'session,2023-10-24_10-15-28,start,testing,300.0,0.5'
...
['^', 'byname', 'generated_by_power_measurement_on_run_81c441472ec1495d9010f2b4c6450b5f']
real    10m46.590s
user    79m20.510s
sys     0m3.600s
maria@eb6 ~/work_collection/axs2mlperf (master *=)$ ll ~/work_collection/generated_by_power_measurement_on_run_81c441472ec1495d9010f2b4c6450b5f
total 32
drwxr-xr-x   3 maria krai  4096 Oct 24 10:26 ./
drwxr-xr-x 118 maria krai 12288 Oct 24 10:15 ../
-rw-r--r--   1 maria krai  2775 Oct 24 10:26 data_axs.json
lrwxrwxrwx   1 maria krai   122 Oct 24 10:26 last_mlperf_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_a9334b8209234006a39c6ac1e4e725e2/
drwxr-xr-x   4 maria krai  4096 Oct 24 10:26 power_logs/
-rw-r--r--   1 maria krai   127 Oct 24 10:26 program_output.json
maria-18-git commented 1 year ago

no_ranging- (by default)

maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.64,sut_name=eb6-kilt-qaic 
...
['^', 'byname', 'generated_by_power_measurement_on_run_95ceb14b94e7437f9fa7b1d341a97ca2']
real    21m12.831s
user    158m36.430s
sys     0m6.439s
maria@eb6 ~/work_collection/axs2mlperf/power_measurement (master *=)$ ll ~/work_collection/generated_by_power_measurement_on_run_95ceb14b94e7437f9fa7b1d341a97ca2
total 32
drwxr-xr-x   3 maria krai  4096 Oct 24 11:53 ./
drwxr-xr-x 117 maria krai 12288 Oct 24 11:43 ../
-rw-r--r--   1 maria krai  2698 Oct 24 11:53 data_axs.json
lrwxrwxrwx   1 maria krai   122 Oct 24 11:53 last_mlperf_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_cfcaad2eb0be4b1386cf8509a38e1cbe/
drwxr-xr-x   5 maria krai  4096 Oct 24 11:53 power_logs/
-rw-r--r--   1 maria krai   251 Oct 24 11:53 program_output.json
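
With a single last_mlperf_logs link, the default run (ranging followed by testing) presumably has to re-point that link from one loadgen entry to the next; in the listing above it ends up on the testing entry. One way to do that in Python is sketched below. The function repoint_symlink is hypothetical and is not the code from the commit.

```python
import os


def repoint_symlink(symlink_to, target_dir):
    """Make symlink_to point at target_dir, replacing any previous link.
    Hypothetical sketch; a temporary link is renamed over the old one so
    that symlink_to never disappears in between."""
    tmp_link = symlink_to + ".tmp"
    if os.path.lexists(tmp_link):  # lexists also sees dangling links
        os.remove(tmp_link)
    os.symlink(target_dir, tmp_link)
    os.replace(tmp_link, symlink_to)  # renames the link itself, not its target
```

Note that os.path.exists returns False for a dangling symlink, so a check on the link itself is safer with os.path.lexists or os.path.islink.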
maria-18-git commented 1 year ago

Summary: Solution 2 (use a single symlink for last_mlperf_logs) was selected.

maria-18-git commented 1 year ago

Commit: Removed links in power experiments

maria-18-git commented 1 year ago

Status: Done.