[WIP] Profiling Client - Githubissues

TackAdam commented 2 years ago

Overview

This will introduce the ability in Observability to visualize the trend of a user’s cluster’s CPU/memory usage in a line chart. The user will also be able to view performance metrics and search through them using keywords. Using parca-agent will enable continuous sampling using eBPF at a low overhead cost.

Problem statement

As companies continue to optimize for cloud-native applications, it has become increasingly important to understand performance metrics at the most granular level possible. Existing tools show performance issues exist such as latency and memory leaks; but continuously collecting profiles will allow us to drill down and see why a particular system is experience such problems.

The profiling client will address these needs by creating an easily digestible and searchable display. By always collecting data background process that are indirectly impacting a user’s request flow will not be overlooked. The user will be able to quickly diagnose problems using this tool

Requirements

Profiling agent that publishes CPU/memory performance data to OpenSearch
- https://github.com/parca-dev
- https://github.com/TackAdam/parca-agent/tree/opensearch
Utilize Dashboards visualizations to display the profiled metrics.

User Stories

As a user, I want to visit the Profiling Agent page to see the trend of my cluster’s CPU/memory usage in a line chart visualization
As a user, I want to be able to view and paginate through the performance metrics
As a user, I want to be able to search for keywords in my cluster’s performance metrics
As a user, I want to be able to sort metrics
As a user, I want to be able to customize the time range of my visualization
As a user, I want to be able to add different metrics to my profiling agent visualization

Functional requirements

The profiling should be continuous and generate metrics at a low overhead cost with the appropriate tags to allow sorting of the data. This data should be easily viewable to enhance the user’s experience and allow quick debugging and understanding of operations that are occurring.

Operational requirements

The following use cases are examples of what functionalities we want to enable with the introduction of profiling:

Continuous Profiling
- Sampling profiling has extremely low overhead
  - User’s can expect statistically significant reports from its constant monitoring
Searchable Metrics
- The profiled data will have pertinent tags
  - User’s can expect data that is easy to navigate and understand when troubleshooting
    What is out of scope
    
    ....

Architecture Diagram

Dataflow diagram

image(1)

Metadata from Parca-Agent

image (1) Definitions from https://github.com/google/pprof/tree/main/proto

sample: A profile sample, with the values measured and the associated call stack as a list of location ids. Samples with identical call stacks can be merged by adding their respective values, element by element.

location: A unique place in the program, commonly mapped to a single instruction address. It has a unique nonzero id, to be referenced from the samples. It contains source information in the form of lines, and a mapping id that points to a binary.

mapping: A binary that is part of the program during the profile collection. It has a unique nonzero id, referenced from the locations. It includes details on how the binary was mapped during program execution. By convention the main program binary is the first mapping, followed by any shared libraries.

function: A program function as defined in the program source. It has a unique nonzero id, referenced from the location lines. It contains a human-readable name for the function (eg a C++ demangled name), a system name (eg a C++ mangled name), the name of the corresponding source file, and other function attributes.

Sending Profiles with OpenSearch's GoClient

Creating the GoClient

https://opensearch.org/docs/latest/clients/go/ First you need to create a .env file to support secure username and password(cmd line now supported).

Next you create the GoClient and initialize it with the desired index mapping.

Call the function while passing in the store address provided on the cmd line.

Sending the pprof data

First make a struct to hold the information in a json format.

Then adjust the function to take in the pprof information needed for profiling.

Fill the struct with the information and then marshal it and it to the index using bytes.New Reader().

Then call the function where you want the data to be sent from the parca-agent to Opensearch.

OpenSearch Ingestion

On the cmd line specify the OpenSearch back-end address to store the information and the the service to profile. For my example I am profiling the OpenSearch client I have running on my EC2 instance by using "--systemd-units=opensearch.service". What is being profiled can be set with Kubernetes or systemd.

We can than verify the index was created with the correct mapping use the dev tool.

Using the dev tool we can also verify the profiling data was received.

Next Steps

Visualization

Step1 - Create a tab for profiling.

Step 2 - Display the profiled information in a line graph and flame graph.

Solution

Solution overview

To implement the desired continuous profiling the first step is to get data ingestion from Parca to OpenSearch. The next step is to format the ingested data to a usable format. The final step will be having the data be viewable through OpenSearch Dashboards.

Proposed solution

The plan is to fork parca-agent and create a version that sends the data from the parca-agent in a digestible form to OpenSearch’s back end. Once the data is received adjusting OpenSearch Dashboard to be able to display the profiled data in a user friendly interface.

Alternatives considered

https://pyroscope.io/

TackAdam commented 1 year ago

Added command line variable for using OpenSearch and entering environment variables for username and password.

This or a .env file can be created for authentication.

This tell the parca-agent to only send data using OpenSearch's goClient and prevents error from the regular batch client failing to send. The cmd line flag is added to main; and then stored in the package where it is used(agent).