parca-dev / parca

Continuous profiling for analysis of CPU and memory usage, down to the line number and throughout time. Saving infrastructure cost, improving performance, and increasing reliability.
https://parca.dev/
Apache License 2.0
4.05k stars 210 forks

NVIDIA GPU profiling support #2241

Open elgalu opened 1 year ago

elgalu commented 1 year ago

Feature Request

Add support for GPU profiling, similar to what is currently offered for CPU profiling.

Alternative solutions

  1. Sentry's (OSS) OpenTelemetry Collector collects GPU metrics
  2. Framework-specific profilers, such as PyTorch Profiler or TensorFlow Profiler, which could be incorporated into this product and offered as a service out of the box.

Is there a use case or business reason for this request?

The CPU market is growing at a compound annual growth rate (CAGR) of 4.36%, while the GPU market is growing at a CAGR of 33.4%. NVIDIA also has the largest market share, at 80%.

brancz commented 1 year ago

I love this idea! No clue if we can use eBPF for this, but pprof is a generic format, so as long as we can get it into that format we can make it work!
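
To illustrate the point about pprof being generic, here is a minimal sketch of how GPU kernel timings might be folded into a pprof-shaped structure: one sample type, a table of functions, and samples keyed by a leaf-first stack of location IDs. Everything here is an assumption made for illustration: `KernelRecord`, `PprofLikeProfile`, `build_profile`, and the `gpu_time` sample type are hypothetical names, the input records are made up, and nothing is serialized to the real pprof protobuf format. It does not describe how Parca or any NVIDIA tooling actually works.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KernelRecord:
    """Hypothetical GPU activity record (not a real NVIDIA or Parca type)."""
    host_stack: tuple   # host-side call frames, root first
    kernel: str         # name of the GPU kernel that was launched
    duration_ns: int    # time the kernel spent executing


@dataclass
class PprofLikeProfile:
    """Simplified stand-in for the pprof data model (assumption, no protobuf)."""
    sample_type: tuple                              # (type, unit)
    functions: list = field(default_factory=list)   # location ID = index + 1
    samples: dict = field(default_factory=dict)     # leaf-first ID stack -> value


def build_profile(records):
    """Aggregate kernel durations per unique stack, pprof style."""
    profile = PprofLikeProfile(sample_type=("gpu_time", "nanoseconds"))
    ids = {}

    def location_id(name):
        # Assign IDs starting at 1; pprof requires location IDs to be nonzero.
        if name not in ids:
            profile.functions.append(name)
            ids[name] = len(profile.functions)
        return ids[name]

    totals = defaultdict(int)
    for rec in records:
        # Treat the kernel as the leaf frame beneath the host call stack.
        frames = rec.host_stack + (rec.kernel,)
        # pprof lists the leaf first, so reverse the root-first frames.
        stack = tuple(location_id(name) for name in reversed(frames))
        totals[stack] += rec.duration_ns

    profile.samples = dict(totals)
    return profile


# Made-up input data, for illustration only.
records = [
    KernelRecord(("main", "train_step"), "matmul_kernel", 1500),
    KernelRecord(("main", "train_step"), "matmul_kernel", 500),
    KernelRecord(("main", "train_step"), "softmax_kernel", 300),
]
profile = build_profile(records)

for stack, total_ns in profile.samples.items():
    names = [profile.functions[i - 1] for i in stack]
    print(";".join(reversed(names)), total_ns)
```

A real integration would still need a way to collect such records from the GPU and to write them out as an actual pprof profile; the sketch only shows that the stack-plus-value shape carries over.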