error in request_metrics dictionary implementation

Team,

While running "token_benchmark_ray.py" against a model on Google Cloud Vertex AI, I noticed that the division on line 111 fails. The statement there is effectively:

request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens

or,

request_metrics[common_metrics.INTER_TOKEN_LAT] = request_metrics[common_metrics.INTER_TOKEN_LAT] / num_output_tokens

or,

request_metrics[common_metrics.INTER_TOKEN_LAT] = [] / 1

where common_metrics.INTER_TOKEN_LAT = inter_token_latency_s. As the example response below shows, inter_token_latency_s = [] and num_output_tokens = 1, so the code tries to divide a list by an integer.
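
The failure can be reproduced in isolation. The snippet below is a minimal sketch that does not import llmperf; it assumes the key is the plain string 'inter_token_latency_s' and uses the values from this report.

```python
# Minimal reproduction, independent of llmperf. The key name and the values
# come from this report; the rest is illustrative.
INTER_TOKEN_LAT = "inter_token_latency_s"

request_metrics = {INTER_TOKEN_LAT: []}
num_output_tokens = 1

try:
    # Same shape as the failing statement: list /= int
    request_metrics[INTER_TOKEN_LAT] /= num_output_tokens
    error = None
except TypeError as exc:
    error = exc

# Shows the exception type and message raised by the division.
print(type(error).__name__, error)
```

Python lists do not support division, so the statement raises a TypeError before anything is written back to the dictionary.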

For example, below is a sample of "request_metrics" obtained during the API call:

{'error_code': 200, 'error_msg': "'dict' object has no attribute 'split'", 'inter_token_latency_s': [], 'ttft_s': 0, 'end_to_end_latency_s': 1.4629004680000435, 'request_output_throughput_token_per_s': 0, 'number_total_tokens': 538, 'number_output_tokens': 0, 'number_input_tokens': 538}
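
One possible way to guard against this is sketched below. It is only a suggestion under stated assumptions: the helper name mean_inter_token_latency is hypothetical and not part of llmperf, it assumes the value is either a list of per-token latencies or an already-summed number, and returning 0 for an empty or failed request is a design choice the maintainers may not share.

```python
from numbers import Real
from typing import Sequence, Union


def mean_inter_token_latency(
    latencies: Union[Real, Sequence[float]], num_output_tokens: int
) -> float:
    """Hypothetical helper: average inter-token latency that tolerates failed requests."""
    # A failed request (as in the sample above) may carry an empty list
    # instead of a number, so list-like values are summed first.
    total = latencies if isinstance(latencies, Real) else sum(latencies)
    # Avoid dividing by zero when no output tokens were produced.
    if num_output_tokens <= 0:
        return 0.0
    return total / num_output_tokens


# Values taken from the sample response in this report.
sample = {"inter_token_latency_s": [], "number_output_tokens": 0}
result = mean_inter_token_latency(sample["inter_token_latency_s"], 1)
print(result)
```

With a helper like this, a request whose metrics contain an empty list yields a latency of 0 instead of aborting the whole benchmark run.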

File: token_benchmark_ray.py

Thanks

ray-project / llmperf

error in request_metrics dictionary implementation #49