webb-tools / gadget

A framework for building modular AVS and Tangle Blueprints: https://docs.tangle.tools/developers/blueprints.
https://tangle.tools
Apache License 2.0
11 stars · 1 fork

[TASK] Add benchmarking #213

Closed · tbraun96 closed this 6 days ago

tbraun96 commented 1 month ago

Overview

As a blueprint developer, I want writing a blueprint benchmark to feel much like writing unit tests. A benchmark needs to be carefully crafted to exercise a specific logical path in the blueprint code. While tests check for various success and failure conditions, benchmarks focus on the most computationally intensive path, also known as the "worst-case scenario."

#[benchmark] Macro

This macro annotates a function that runs during the blueprint's benchmark mode. It generates output, in a specific format, that the manager consumes. Here's an example:

#[benchmark(threads = 1)]
pub async fn keygen() {
  // Perform a full key generation with the worst case, for example, N = 32, t = 18.
}

This will generate a bench_keygen(&Bencher) function, which the manager will call during benchmark mode (details on this later):

pub fn bench_keygen(b: &Bencher) -> Result<(), BenchmarkError> {
  let runtime = b.create_tokio_runtime().with_threads(1);
  let out = runtime.block_on(async move {
     // Set up timers and watchers for RAM, disk, and network.
     keygen().await;
     // Stop timers and watchers, then collect the data.
     benchmark_data
  });
  b.record(0, "keygen", out);
  Ok(())
}

Running this function for each job produces the following output for the manager:

[
    {
        "job_id": 0,
        "job_name": "keygen",
        "duration": 10000, // in ms
        "vcpu": 1, // Same as threads.
        "mem": 5.00, // in MB
        "disk": 1, // in MB
        "net_in": 512, // in KB
        "net_out": 10 // in KB
    },
    {
        // ...
    }
]
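On the gadget side, each record above could map onto a plain Rust struct. A minimal sketch (the `BenchmarkRecord` name, field units, and hand-rolled serialization are assumptions for illustration, not the actual gadget API):

```rust
/// One benchmark record, mirroring the JSON emitted to the manager.
/// Field names and units follow the example output above.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchmarkRecord {
    pub job_id: u64,
    pub job_name: String,
    pub duration_ms: u64, // wall-clock time of one invocation
    pub vcpu: u32,        // same as `threads` in the macro
    pub mem_mb: f64,
    pub disk_mb: u64,
    pub net_in_kb: u64,
    pub net_out_kb: u64,
}

impl BenchmarkRecord {
    /// Serialize into the JSON object shape shown above (hand-rolled
    /// here to keep the sketch free of external crates).
    pub fn to_json(&self) -> String {
        format!(
            "{{\"job_id\":{},\"job_name\":\"{}\",\"duration\":{},\"vcpu\":{},\"mem\":{:.2},\"disk\":{},\"net_in\":{},\"net_out\":{}}}",
            self.job_id, self.job_name, self.duration_ms, self.vcpu,
            self.mem_mb, self.disk_mb, self.net_in_kb, self.net_out_kb
        )
    }
}
```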

Benchmark Mode

Similar to webb-tools/gadget#216 and the introduction of the Registration mode, I propose adding a benchmark mode. This mode executes the gadget being benchmarked on the operator's machine and outputs the benchmark results so that the manager can update its on-chain pricing.

On-Chain Pricing Model

Each operator sets basic pricing for their compute units:

  1. vCPU: in terms of TNT/Block (translated to $/hr by the UI).
  2. RAM (Per MB): in terms of TNT/Block (translated to $/hr by the UI).
  3. Disk (Per GB): in terms of TNT/Block (translated to $/hr by the UI).
  4. Bandwidth (Per Gb): in terms of TNT/Block (translated to $/hr by the UI).

With these inputs on-chain as a global configuration for each operator profile, operators can have an automated way to set pricing for each blueprint based on these benchmarks. By pulling the global configuration for an operator, we can compute the cost to execute one job (e.g., one keygen invocation), making it visible to the user deploying this service on that specific operator.
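The per-job cost computation described above can be sketched in Rust. This is a hedged sketch, not the protocol's actual formula: the `OperatorPricing` struct, the 6-second block time, and the unit conversions are all assumptions layered on the benchmark output format shown earlier:

```rust
/// Per-operator unit prices, in TNT per block, as stored on-chain
/// in the operator's global configuration.
pub struct OperatorPricing {
    pub tnt_per_vcpu_block: f64,
    pub tnt_per_mb_ram_block: f64,
    pub tnt_per_gb_disk_block: f64,
    pub tnt_per_gb_bandwidth_block: f64,
}

/// Estimate the TNT cost of one job invocation (e.g., one keygen)
/// from a benchmark record. Assumes a 6-second block time; resource
/// figures use the units of the benchmark output (ms, MB, KB).
pub fn cost_per_job(
    p: &OperatorPricing,
    duration_ms: u64,
    vcpu: u32,
    mem_mb: f64,
    disk_mb: f64,
    net_total_kb: f64,
) -> f64 {
    let blocks = duration_ms as f64 / 6_000.0; // fraction of a block consumed
    let per_block = vcpu as f64 * p.tnt_per_vcpu_block
        + mem_mb * p.tnt_per_mb_ram_block
        + (disk_mb / 1024.0) * p.tnt_per_gb_disk_block
        + (net_total_kb / (1024.0 * 1024.0)) * p.tnt_per_gb_bandwidth_block;
    blocks * per_block
}
```

Pulling the operator's global configuration and evaluating this against the blueprint's benchmark records would make the per-job cost visible to the user before deployment.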

Checklist


drewstone commented 1 month ago

Deriving Base Price from Service Tests

To derive a base price function from service tests, we should follow these steps:

  1. Resource Profiling: Run comprehensive tests on the service across various configurations and workloads. Measure:

    • CPU usage (in core-seconds)
    • Memory usage (in GB-seconds)
    • Storage I/O (in GB read/written)
    • Network I/O (in GB transferred)
    • Duration of the test
  2. Cost Analysis:

    • Determine the cost of resources in a traditional cloud environment (e.g., AWS, GCP, Azure)
    • Calculate the cost of running the service in these environments based on the resource usage from step 1
  3. Performance Metrics:

    • Measure key performance indicators (KPIs) such as throughput, latency, and concurrency
  4. Opportunity Cost:

    • Estimate the opportunity cost for operators (i.e., what they could earn by using their resources for other purposes)
  5. Market Analysis:

    • Research prices of similar services in the market

Choices for the Base Price Function f

Based on these inputs, we can consider several forms for the base price function f. Here are some options:

  1. Linear Combination:

    f(CPU, Memory, Storage, Bandwidth) = a*CPU + b*Memory + c*Storage + d*Bandwidth + e

    Where a, b, c, and d are coefficients derived from cost analysis, and e is a constant representing fixed costs.

    Pros: Simple, easy to understand and implement.
    Cons: May not capture complex interactions between resources.

  2. Weighted Geometric Mean:

    f(CPU, Memory, Storage, Bandwidth) = (CPU^a * Memory^b * Storage^c * Bandwidth^d)^(1/(a+b+c+d)) * k

    Where a, b, c, and d are weights, and k is a scaling factor.

    Pros: Captures interdependencies between resources.
    Cons: More complex to tune and explain.

  3. Piecewise Function:

    f(CPU, Memory, Storage, Bandwidth) = 
     case1: f1(CPU, Memory, Storage, Bandwidth) if CPU < threshold1 and Memory < threshold2 ...
     case2: f2(CPU, Memory, Storage, Bandwidth) if CPU >= threshold1 and Memory < threshold2 ...
     ...

    Where each case represents a different pricing tier based on resource usage.

    Pros: Can accurately represent complex pricing structures.
    Cons: May be less predictable for users.

  4. Machine Learning Model: Train a model (e.g., random forest, neural network) on the data collected from tests and market analysis.

    Pros: Can capture complex, non-linear relationships.
    Cons: Black-box nature may reduce transparency.

  5. Cobb-Douglas Production Function:

    f(CPU, Memory, Storage, Bandwidth) = A * CPU^α * Memory^β * Storage^γ * Bandwidth^δ

    Where A is total factor productivity, and α, β, γ, δ are output elasticities.

    Pros: Well-established in economics, captures diminishing returns.
    Cons: Assumes perfect competition and constant returns to scale.
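As an illustration, option 5 (Cobb-Douglas) is a one-liner to evaluate. In this sketch, `big_a` and the elasticities are placeholder values, not parameters fitted to real cost data:

```rust
/// Cobb-Douglas base price: A * CPU^α * Memory^β * Storage^γ * Bandwidth^δ.
/// `big_a` is total factor productivity; alpha..delta are output
/// elasticities. All values here are illustrative placeholders.
pub fn cobb_douglas_price(cpu: f64, mem: f64, storage: f64, bandwidth: f64) -> f64 {
    let (big_a, alpha, beta, gamma, delta) = (0.05, 0.5, 0.25, 0.15, 0.1);
    big_a * cpu.powf(alpha) * mem.powf(beta) * storage.powf(gamma) * bandwidth.powf(delta)
}
```

Note the diminishing returns: quadrupling CPU with α = 0.5 only doubles the price.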

Recommended Approach

Given the nature of cloud services and the need for transparency in a decentralized marketplace, I would recommend starting with a hybrid approach:

  1. Use a Linear Combination as the base model:

    f(CPU, Memory, Storage, Bandwidth) = a*CPU + b*Memory + c*Storage + d*Bandwidth + e
  2. Add a Dynamic Adjustment Factor based on market conditions:

    Final Price = f(CPU, Memory, Storage, Bandwidth) * (1 + market_adjustment)

    Where market_adjustment is derived from current supply/demand in the Tangle marketplace.

  3. Implement Tiered Pricing for high-usage scenarios:

    • Apply multipliers to the base price when usage exceeds certain thresholds.
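The three steps above can be sketched as a single Rust function. The coefficients, the CPU threshold, and the 25% tier multiplier are illustrative placeholders to be replaced by the regression-fitted values:

```rust
/// Hybrid price: linear base model, a market adjustment from current
/// supply/demand, and a tiered surcharge for heavy usage. All
/// coefficients and thresholds here are illustrative placeholders.
pub fn final_price(
    cpu: f64,
    mem: f64,
    storage: f64,
    bandwidth: f64,
    market_adjustment: f64, // e.g. 0.10 for +10% under high demand
) -> f64 {
    // Step 1: linear base model f = a*CPU + b*Memory + c*Storage + d*Bandwidth + e.
    let (a, b, c, d, e) = (0.04, 0.01, 0.002, 0.005, 0.1);
    let base = a * cpu + b * mem + c * storage + d * bandwidth + e;
    // Step 3: tiered pricing, a surcharge once CPU usage crosses a threshold.
    let tier = if cpu > 100.0 { 1.25 } else { 1.0 };
    // Step 2: scale by market conditions.
    base * (1.0 + market_adjustment) * tier
}
```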

This approach keeps the simplicity and transparency of the linear model while staying responsive to market conditions and covering high-usage scenarios through the tiers.

To determine the coefficients (a, b, c, d, e):

  1. Use regression analysis on the data from service tests and cost analysis.
  2. Adjust based on market analysis and opportunity cost calculations.
  3. Regularly update these coefficients based on ongoing market data and operator feedback.

Finally, it's crucial to keep the pricing model transparent to users and to revisit the coefficients regularly as market data and operator feedback accumulate.

shekohex commented 3 weeks ago

Why This Model Won’t Work

Firstly, the benchmark will be conducted on a machine that is likely to be very different from those that will actually run the service. Developers might benchmark on their MacBook or their work machine, which are not representative of the machines most operators will use. Even if a server is rented specifically for benchmarking, it would still differ in terms of CPU and RAM usage. As a result, these benchmarks will be largely useless, or at best, they will provide only an indication, not an accurate pricing model for everyone.

My Proposed Approach

The developer (or blueprint developer) should include metadata about the deployment, specifying the amount of RAM and the number of cores required for the blueprint to function correctly. This specification should be divided into two main categories:

  1. Recommended Requirements: This should be the default, including vCPU, RAM, Disk, etc.
  2. Minimum Requirements: Users on a budget can choose this option.
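The two-tier metadata could look like the following sketch; the struct names, fields, and units are hypothetical, not the actual gadget manifest format:

```rust
/// Deployment resource metadata a blueprint developer would ship
/// alongside the blueprint. Names and units are illustrative.
pub struct ResourceSpec {
    pub vcpu: u32,
    pub ram_mb: u64,
    pub disk_gb: u64,
}

pub struct BlueprintRequirements {
    /// Recommended tier: the default.
    pub recommended: ResourceSpec,
    /// Minimum tier: for users on a budget.
    pub minimum: ResourceSpec,
}

impl BlueprintRequirements {
    /// Sanity-check that the minimum tier really is the smaller one.
    pub fn is_consistent(&self) -> bool {
        self.minimum.vcpu <= self.recommended.vcpu
            && self.minimum.ram_mb <= self.recommended.ram_mb
            && self.minimum.disk_gb <= self.recommended.disk_gb
    }
}
```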

Operators should define their pricing based on two factors:

  1. The computational power required to run the service instance, primarily bidding on CPU, RAM, Disk, GPU, etc.
  2. The nature of the service, particularly how much maintenance it requires, which can vary between operators.

This would create a market where operators compete to offer the best price based on these two factors.

Comparison with Other Models

It’s important to discuss this in light of what I’ve observed with the Akash network and similar DePIN projects, which offer compute as a service. They’ve created a market where operators compete to provide the best price for CPU, RAM, Disk, and GPU. Operators simply need to set their price per thread, per GB of RAM, and per GB of Disk, and then run the manager (essentially a K8s server that manages services) with these parameters. Users then select the best operators for their needs when deploying services, effectively bidding to get their services running on the chosen hardware (Akash Calculator).

Our design shares some similarities with these models but also has differences. For example, while we both offer Compute as a Service, we primarily offer Function as a Service (FaaS), where billing is based on invocations. We also offer continuous jobs, running services 24/7, similar to Akash’s main model. However, the pricing models for these two approaches are different, and we can’t make one pricing model work for both.

In Web2 (e.g., GCP, AWS, Azure), "serverless" infrastructure means that your service isn’t running 24/7. Instead, when an invocation occurs (i.e., creating a job call), a server is spawned (or an existing one is allocated) to run the service. After the invocation, the server is shut down, though it might remain available briefly in case of subsequent invocations. In this model, billing is based on CPU and RAM usage during the invocation (you can see their pricing here: AWS Lambda Pricing).

Choosing the Right Model

Before we consider benchmarking, we need to think carefully about which model best suits our use case. I believe the best approach would be for each operator to offer a base compute price, which can be adjusted for individual blueprints. This could result in the best pricing for your service. For FaaS, the AWS model wouldn’t work—a fixed price per millisecond of CPU across different CPUs wouldn’t be fair. Instead, pricing should be defined by the protocol, using a tool that gathers diagnostic data about the operator’s hardware to determine the fairest price based on the current operator market. This information would be configured in the manager and recorded on the operator’s on-chain profile. When work is performed, operators would be paid per millisecond or per block, etc. For instance, if a job takes 200ms to complete but is submitted in the next block, the user would only pay for the 200ms, not the full block time of 6 seconds.
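The 200ms-vs-6s example above can be made concrete. A minimal sketch, assuming a 6-second block time and a per-millisecond price set by the protocol (both are assumptions for illustration):

```rust
const BLOCK_MS: u64 = 6_000; // Tangle block time assumed to be 6 seconds

/// Pay for measured execution time only: a 200ms job costs 200ms of
/// compute, even if the result lands in the next block.
pub fn per_ms_charge(price_per_ms: f64, exec_ms: u64) -> f64 {
    price_per_ms * exec_ms as f64
}

/// For comparison: what a naive per-block model would charge for the
/// same job, since it occupies at least one full block.
pub fn per_block_charge(price_per_ms: f64, exec_ms: u64) -> f64 {
    let blocks = ((exec_ms + BLOCK_MS - 1) / BLOCK_MS).max(1); // round up
    price_per_ms * (blocks * BLOCK_MS) as f64
}
```

For the 200ms job, the per-millisecond model charges 1/30th of the per-block model, which is the fairness gap the paragraph above describes.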

Verification of Job Execution Time

This raises the question of how to verify job execution time. Could operators slow down execution to increase the time and thus earn more? While their price would be lower in this case, it’s still a concern.

Another model could involve paying per job execution, regardless of how long it takes on different hardware. However, this wouldn’t be fair since older, cheaper hardware might take longer to complete the job than modern equipment, yet both would receive the same payment. A potential solution could involve the following:

  1. When the blueprint is downloaded to the operator's machine, the manager runs the blueprint in benchmark mode, conducting basic tests added by the developer.
  2. Based on the operator’s set prices for their hardware ($/hr, TNT/hr), we could calculate a cost per execution ($/execution, TNT/execution).
  3. The result would be updated on the operator's on-chain profile for this specific blueprint.

When a user requests a service from multiple operators and executes Job X on this service, the final TNT payment to the operators would be the sum of their TNT costs for executing Job X.
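Steps 2 and 3 and the summation above can be sketched as follows; the hourly-price input and function names are hypothetical:

```rust
/// Step 2: convert an operator's hourly hardware price ($/hr or
/// TNT/hr) into a per-execution cost using the benchmarked duration.
pub fn cost_per_execution(price_per_hour: f64, bench_duration_ms: u64) -> f64 {
    price_per_hour * bench_duration_ms as f64 / 3_600_000.0 // ms per hour
}

/// The user's final payment for one execution of Job X is the sum of
/// each participating operator's per-execution cost.
pub fn total_job_cost(per_operator_cost: &[f64]) -> f64 {
    per_operator_cost.iter().sum()
}
```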

These proposals are based on research I’ve done on Akash, GCP, and AWS.

shekohex commented 6 days ago

I'm closing this task for now; we may need to revisit it when we have full blueprints.