Closed by msaroufim 2 years ago
For example, if you just run:
(ray) ubuntu@ip-172-31-63-237:~/serve/ts/metrics$ python3 metric_collector.py --gpu 0
You get the following logs. The last field is a Unix timestamp, which a dashboard provider should be able to handle easily:
CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-63-237,1648517292
DiskAvailable.Gigabytes:110.83368682861328|#Level:Host|#hostname:ip-172-31-63-237,1648517292
DiskUsage.Gigabytes:82.9699821472168|#Level:Host|#hostname:ip-172-31-63-237,1648517292
DiskUtilization.Percent:42.8|#Level:Host|#hostname:ip-172-31-63-237,1648517292
MemoryAvailable.Megabytes:38658.7265625|#Level:Host|#hostname:ip-172-31-63-237,1648517292
MemoryUsed.Megabytes:1951.671875|#Level:Host|#hostname:ip-172-31-63-237,1648517292
MemoryUtilization.Percent:6.0|#Level:Host|#hostname:ip-172-31-63-237,1648517292
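To illustrate how a dashboard could consume these lines, here is a minimal parsing sketch. The layout is inferred only from the sample output above (`Name.Unit:value`, then `|#key:value` dimensions, then `,unix_time`), and `Metric` and `parse_metric_line` are hypothetical names, not part of TorchServe.

```python
from datetime import datetime, timezone
from typing import Dict, NamedTuple


class Metric(NamedTuple):
    """Hypothetical container for one parsed log line (not a TorchServe type)."""
    name: str
    unit: str
    value: float
    dimensions: Dict[str, str]
    timestamp: datetime


def parse_metric_line(line: str) -> Metric:
    """Parse one line, assuming the layout seen in the sample logs above."""
    # The unix timestamp is the last comma-separated field.
    body, _, ts = line.strip().rpartition(",")
    # Remaining fields are separated by "|#"; the first is "Name.Unit:value".
    head, *dims = body.split("|#")
    name_unit, _, value = head.partition(":")
    name, _, unit = name_unit.partition(".")
    # Each dimension looks like "Level:Host" or "hostname:ip-...".
    dimensions = dict(d.split(":", 1) for d in dims)
    return Metric(
        name=name,
        unit=unit,
        value=float(value),
        dimensions=dimensions,
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
    )


if __name__ == "__main__":
    sample = "MemoryUtilization.Percent:6.0|#Level:Host|#hostname:ip-172-31-63-237,1648517292"
    print(parse_metric_line(sample))
```

This assumes dimension values never contain `|#` and that the metric value itself contains no comma. If the real format differs, the split logic would need adjusting.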
The metric collector is an interesting project in its own right, independent of TorchServe: anyone can use it to get system metrics for PyTorch inference.
We can consider packaging it independently so people can run it as a utility profiler: https://github.com/pytorch/serve/blob/master/ts/metrics/metric_collector.py
Perhaps we can roll this into our existing #1457 efforts and combine it with ideas from https://github.com/pytorch/benchmark including