triton-inference-server / triton_cli

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

Add profile subcommand to run perf analyzer #13

Closed matthewkotila closed 10 months ago

matthewkotila commented 10 months ago

Example output:

$ triton model profile -m llama
pull_engine()
run_server()
profile()
Warming up...
Warmed up, profiling now...
[ BENCHMARK SUMMARY ]
Prompt size: --
  * Max first token latency: -- ms
  * Min first token latency: -- ms
  * Avg first token latency: -- ms
  * p50 first token latency: -- ms
  * p90 first token latency: -- ms
  * p95 first token latency: -- ms
  * p99 first token latency: -- ms
  * Max generation latency: -- ms
  * Min generation latency: -- ms
  * Avg generation latency: -- ms
  * p50 generation latency: -- ms
  * p90 generation latency: -- ms
  * p95 generation latency: -- ms
  * p99 generation latency: -- ms
  * Avg output token latency: -- ms/output token
  * Avg total token-to-token latency: -- ms
  * Max end-to-end latency: -- ms
  * Min end-to-end latency: -- ms
  * Avg end-to-end latency: -- ms
  * p50 end-to-end latency: -- ms
  * p90 end-to-end latency: -- ms
  * p95 end-to-end latency: -- ms
  * p99 end-to-end latency: -- ms
  * Max end-to-end throughput: -- tokens/s
  * Min end-to-end throughput: -- tokens/s
  * Avg end-to-end throughput: -- tokens/s
  * p50 end-to-end throughput: -- tokens/s
  * p90 end-to-end throughput: -- tokens/s
  * p95 end-to-end throughput: -- tokens/s
  * p99 end-to-end throughput: -- tokens/s
  * Max generation throughput: -- output tokens/s
  * Min generation throughput: -- output tokens/s
  * Avg generation throughput: -- output tokens/s
  * p50 generation throughput: -- output tokens/s
  * p90 generation throughput: -- output tokens/s
  * p95 generation throughput: -- output tokens/s
  * p99 generation throughput: -- output tokens/s
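
To make the proposal concrete, here is a minimal sketch (not the actual triton_cli implementation) of how a `profile` subcommand could shell out to `perf_analyzer` and print the warm-up messages shown above. Everything beyond `perf_analyzer -m <model>` (the function names, the pass-through arguments, and assuming a server is already running) is an illustrative assumption:

```python
# Hypothetical sketch of a `profile` subcommand wrapping perf_analyzer.
# Not the triton_cli code; flag and function names are assumptions.
import argparse
import shutil
import subprocess
import sys


def profile(model: str, extra_args: list[str]) -> int:
    """Run perf_analyzer against an already-running Triton server."""
    if shutil.which("perf_analyzer") is None:
        print("perf_analyzer not found on PATH", file=sys.stderr)
        return 1

    cmd = ["perf_analyzer", "-m", model, *extra_args]
    print("Warming up...")                 # mirrors the example output above
    print("Warmed up, profiling now...")
    # Let perf_analyzer stream its own output; a real implementation would
    # capture and reduce it into the [ BENCHMARK SUMMARY ] block shown above.
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog="triton model profile")
    parser.add_argument("-m", "--model", required=True)
    args, passthrough = parser.parse_known_args()
    sys.exit(profile(args.model, passthrough))
```

In the actual CLI flow suggested by the log lines above, the engine pull and server launch (`pull_engine()`, `run_server()`) would happen before the profiling step rather than being assumed as here.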

cc @nv-hwoo