Team,
While running "token_benchmark_ray.py" against a model on Google Cloud Vertex AI, I noticed that the division on line 111 fails. The statement is

request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens

which is equivalent to

request_metrics[common_metrics.INTER_TOKEN_LAT] = request_metrics[common_metrics.INTER_TOKEN_LAT] / num_output_tokens

Here common_metrics.INTER_TOKEN_LAT is the "inter_token_latency_s" key. As the example response below shows, inter_token_latency_s = [] and num_output_tokens = 1, so the statement evaluates to [] / 1, which raises a TypeError because Python cannot divide a list by an integer.
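A minimal sketch of the failure and one possible guard follows. It assumes the client is meant to store the summed per-token gaps under "inter_token_latency_s", so dividing by the token count yields a mean; the helper normalize_inter_token_latency is hypothetical and is not the repository's actual fix.

```python
from typing import Any, Dict, List, Union

# Key name taken from the issue; in the benchmark this is common_metrics.INTER_TOKEN_LAT.
INTER_TOKEN_LAT = "inter_token_latency_s"


def normalize_inter_token_latency(metrics: Dict[str, Any], num_output_tokens: int) -> float:
    """Hypothetical guarded replacement for the in-place division on line 111.

    Assumption: the stored value is either a scalar total latency or a list of
    per-token gaps (possibly empty, as in the Vertex AI sample).
    """
    value: Union[float, List[float]] = metrics.get(INTER_TOKEN_LAT, 0)
    # Collapse a list of gaps to its total; an empty list sums to 0.
    total = sum(value) if isinstance(value, list) else float(value)
    # Avoid ZeroDivisionError when no output tokens were produced.
    metrics[INTER_TOKEN_LAT] = total / num_output_tokens if num_output_tokens else 0.0
    return metrics[INTER_TOKEN_LAT]


# Reproduce the reported failure: dividing a list by an int raises TypeError.
metrics = {INTER_TOKEN_LAT: []}
try:
    metrics[INTER_TOKEN_LAT] /= 1
except TypeError as exc:
    print("TypeError:", exc)

print(normalize_inter_token_latency({INTER_TOKEN_LAT: []}, 1))          # 0.0
print(normalize_inter_token_latency({INTER_TOKEN_LAT: [0.5, 1.5]}, 2))  # 1.0
```

Summing before dividing keeps the guard working whether a client returns a scalar or the raw list, but the cleaner fix may be for the Vertex AI client to always return a number.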
For example, here is a sample of "request_metrics" obtained during the API call:
{'error_code': 200, 'error_msg': "'dict' object has no attribute 'split'", 'inter_token_latency_s': [], 'ttft_s': 0, 'end_to_end_latency_s': 1.4629004680000435, 'request_output_throughput_token_per_s': 0, 'number_total_tokens': 538, 'number_output_tokens': 0, 'number_input_tokens': 538}
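The sample carries error_code 200 together with a non-empty error_msg ("'dict' object has no attribute 'split'"), which suggests the client hit an exception while parsing the response and returned the unreduced list. A minimal sketch of detecting such records before the division runs, assuming a non-empty error_msg or a list-valued latency should mark the request as failed; is_failed_request is a hypothetical helper, not part of the benchmark:

```python
from typing import Any, Dict

# The sample "request_metrics" from the Vertex AI run, copied from above.
sample = {
    'error_code': 200, 'error_msg': "'dict' object has no attribute 'split'",
    'inter_token_latency_s': [], 'ttft_s': 0,
    'end_to_end_latency_s': 1.4629004680000435,
    'request_output_throughput_token_per_s': 0, 'number_total_tokens': 538,
    'number_output_tokens': 0, 'number_input_tokens': 538,
}


def is_failed_request(metrics: Dict[str, Any]) -> bool:
    """Hypothetical check: a request is treated as failed if any error message
    is set (even with error_code 200) or if the latency was never reduced to a number."""
    if metrics.get("error_msg"):
        return True
    return isinstance(metrics.get("inter_token_latency_s"), list)


print(is_failed_request(sample))  # True
```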
File: token_benchmark_ray.py
Thanks