Closed: tbraun96 closed this issue 6 days ago
To derive a base price function from service tests, we should follow these steps:

1. **Resource Profiling:** Run comprehensive tests on the service across various configurations and workloads, measuring resource usage (e.g., CPU, memory, storage, bandwidth).
2. **Cost Analysis:**
3. **Performance Metrics:**
4. **Opportunity Cost:**
5. **Market Analysis:**
Based on these inputs, we can consider several forms for the base price function `f`. Here are some options:
**Linear Combination:**

```
f(CPU, Memory, Storage, Bandwidth) = a*CPU + b*Memory + c*Storage + d*Bandwidth + e
```

Where `a`, `b`, `c`, and `d` are coefficients derived from cost analysis, and `e` is a constant representing fixed costs.

- Pros: Simple, easy to understand and implement
- Cons: May not capture complex interactions between resources
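As a concrete illustration, here is a minimal Rust sketch of the linear model. The coefficient values are hypothetical placeholders, not the result of any real cost analysis:

```rust
/// Linear base price: a*CPU + b*Memory + c*Storage + d*Bandwidth + e.
/// All coefficients below are hypothetical placeholders.
fn linear_base_price(cpu: f64, memory_gb: f64, storage_gb: f64, bandwidth_gb: f64) -> f64 {
    // (a, b, c, d) would come from cost analysis; e represents fixed costs.
    let (a, b, c, d, e) = (0.05, 0.01, 0.002, 0.008, 0.10);
    a * cpu + b * memory_gb + c * storage_gb + d * bandwidth_gb + e
}

fn main() {
    // e.g. 4 vCPUs, 8 GB RAM, 100 GB disk, 10 GB egress
    println!("base price: {:.2}", linear_base_price(4.0, 8.0, 100.0, 10.0));
}
```

The appeal is that each coefficient is directly auditable: a user can recompute their quote from the published numbers.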
**Weighted Geometric Mean:**

```
f(CPU, Memory, Storage, Bandwidth) = (CPU^a * Memory^b * Storage^c * Bandwidth^d)^(1/(a+b+c+d)) * k
```

Where `a`, `b`, `c`, and `d` are weights, and `k` is a scaling factor.

- Pros: Captures interdependencies between resources
- Cons: More complex to tune and explain
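A sketch of the geometric variant, again with hypothetical weights. One useful property of the normalized form is that doubling every resource exactly doubles the price:

```rust
/// Weighted geometric mean price. Weights and scaling factor are
/// hypothetical placeholders, not tuned values.
fn geometric_base_price(cpu: f64, mem: f64, sto: f64, bw: f64) -> f64 {
    let (a, b, c, d, k) = (2.0, 1.0, 0.5, 0.5, 0.02);
    // The 1/(a+b+c+d) exponent normalizes the product, making the
    // function homogeneous of degree 1 in the inputs.
    (cpu.powf(a) * mem.powf(b) * sto.powf(c) * bw.powf(d)).powf(1.0 / (a + b + c + d)) * k
}

fn main() {
    println!("base price: {:.4}", geometric_base_price(4.0, 8.0, 100.0, 10.0));
}
```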
**Piecewise Function:**

```
f(CPU, Memory, Storage, Bandwidth) =
  case1: f1(CPU, Memory, Storage, Bandwidth) if CPU < threshold1 and Memory < threshold2 ...
  case2: f2(CPU, Memory, Storage, Bandwidth) if CPU >= threshold1 and Memory < threshold2 ...
  ...
```

Where each case represents a different pricing tier based on resource usage.

- Pros: Can accurately represent complex pricing structures
- Cons: May be less predictable for users
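A minimal sketch of the piecewise idea, with hypothetical tier thresholds and rates:

```rust
/// Piecewise/tiered base price. Thresholds and per-tier rates are
/// hypothetical placeholders.
fn piecewise_base_price(cpu: f64, mem: f64, sto: f64, bw: f64) -> f64 {
    let linear = |a: f64, b: f64, c: f64, d: f64| a * cpu + b * mem + c * sto + d * bw;
    if cpu < 8.0 && mem < 16.0 {
        linear(0.05, 0.01, 0.002, 0.008) // small tier
    } else if cpu < 32.0 {
        linear(0.04, 0.009, 0.0018, 0.007) + 0.5 // medium tier: lower unit rates, fixed premium
    } else {
        linear(0.035, 0.008, 0.0016, 0.006) + 2.0 // large tier
    }
}

fn main() {
    // 4 vCPU / 8 GB falls in the small tier: 0.56 with these placeholder rates
    println!("small tier: {:.2}", piecewise_base_price(4.0, 8.0, 100.0, 10.0));
}
```

The discontinuities at tier boundaries are exactly where the "less predictable for users" concern shows up.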
**Machine Learning Model:** Train a model (e.g., random forest, neural network) on the data collected from tests and market analysis.

- Pros: Can capture complex, non-linear relationships
- Cons: Black-box nature may reduce transparency
**Cobb-Douglas Production Function:**

```
f(CPU, Memory, Storage, Bandwidth) = A * CPU^α * Memory^β * Storage^γ * Bandwidth^δ
```

Where `A` is total factor productivity, and α, β, γ, δ are output elasticities.

- Pros: Well-established in economics, captures diminishing returns
- Cons: Assumes perfect competition and constant returns to scale
Given the nature of cloud services and the need for transparency in a decentralized marketplace, I would recommend starting with a hybrid approach:
1. Use a Linear Combination as the base model:

   ```
   f(CPU, Memory, Storage, Bandwidth) = a*CPU + b*Memory + c*Storage + d*Bandwidth + e
   ```

2. Add a Dynamic Adjustment Factor based on market conditions:

   ```
   Final Price = f(CPU, Memory, Storage, Bandwidth) * (1 + market_adjustment)
   ```

   Where `market_adjustment` is derived from current supply/demand in the Tangle marketplace.

3. Implement Tiered Pricing for high-usage scenarios:
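The base-plus-adjustment composition above amounts to a one-line function. This is only a sketch; the adjustment values shown are hypothetical and would in practice be read from marketplace supply/demand data:

```rust
/// Final price = base price scaled by a market adjustment factor.
/// `market_adjustment` is hypothetical here: e.g. +0.15 when demand
/// exceeds supply, -0.10 when capacity sits idle.
fn adjusted_price(base: f64, market_adjustment: f64) -> f64 {
    base * (1.0 + market_adjustment)
}

fn main() {
    let base = 0.66; // output of the linear base model, placeholder value
    println!("hot market:  {:.3}", adjusted_price(base, 0.15));
    println!("idle market: {:.3}", adjusted_price(base, -0.10));
}
```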
This approach offers several advantages:
To determine the coefficients (a, b, c, d, e):
Finally, it's crucial to:
Why This Model Won’t Work
Firstly, the benchmark will be conducted on a machine that is likely very different from those that will actually run the service. Developers might benchmark on their MacBook or their work machine, which are not representative of the machines most operators will use. Even a server rented specifically for benchmarking would still differ from operators' machines in CPU and RAM. As a result, these benchmarks will be largely useless; at best they provide a rough indication, not an accurate pricing model for everyone.
My Proposed Approach
The developer (or blueprint developer) should include metadata about the deployment, specifying the amount of RAM and the number of cores required for the blueprint to function correctly. This specification should be divided into two main categories:
Operators should define their pricing based on two factors:
This would create a market where operators compete to offer the best price based on these two factors.
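A minimal sketch of how the blueprint metadata and operator-defined rates could combine into a quote. The type and field names (`ResourceSpec`, `OperatorRates`, `quote`) are hypothetical, invented here for illustration:

```rust
/// Hypothetical resource metadata a blueprint developer ships with the deployment.
struct ResourceSpec {
    min_cores: u32,
    min_ram_gb: u32,
}

/// Hypothetical per-unit rates an operator advertises.
struct OperatorRates {
    price_per_core: f64,
    price_per_gb_ram: f64,
}

/// Price an operator quotes for running a given blueprint.
fn quote(spec: &ResourceSpec, rates: &OperatorRates) -> f64 {
    spec.min_cores as f64 * rates.price_per_core
        + spec.min_ram_gb as f64 * rates.price_per_gb_ram
}

fn main() {
    let spec = ResourceSpec { min_cores: 2, min_ram_gb: 4 };
    let cheap = OperatorRates { price_per_core: 0.03, price_per_gb_ram: 0.01 };
    let pricey = OperatorRates { price_per_core: 0.05, price_per_gb_ram: 0.02 };
    // Users comparing quotes is exactly the market competition described above.
    println!("cheap: {:.2}, pricey: {:.2}", quote(&spec, &cheap), quote(&spec, &pricey));
}
```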
Comparison with Other Models
It’s important to discuss this in light of what I’ve observed with the Akash network and similar DePIN projects, which offer compute as a service. They’ve created a market where operators compete to provide the best price for CPU, RAM, Disk, and GPU. Operators simply need to set their price per thread, per GB of RAM, and per GB of Disk, and then run the manager (essentially a K8s server that manages services) with these parameters. Users then select the best operators for their needs when deploying services, effectively bidding to get their services running on the chosen hardware (Akash Calculator).
Our design shares some similarities with these models but also has differences. For example, while we both offer Compute as a Service, we primarily offer Function as a Service (FaaS), where billing is based on invocations. We also offer continuous jobs, running services 24/7, similar to Akash’s main model. However, the pricing models for these two approaches are different, and we can’t make one pricing model work for both.
In Web2 (e.g., GCP, AWS, Azure), "serverless" infrastructure means that your service isn’t running 24/7. Instead, when an invocation occurs (i.e., creating a job call), a server is spawned (or an existing one is allocated) to run the service. After the invocation, the server is shut down, though it might remain available briefly in case of subsequent invocations. In this model, billing is based on CPU and RAM usage during the invocation (you can see their pricing here: AWS Lambda Pricing).
Choosing the Right Model
Before we consider benchmarking, we need to think carefully about which model best suits our use case. I believe the best approach would be for each operator to offer a base compute price, which can be adjusted for individual blueprints; this could yield the best pricing for your service. For FaaS, the AWS model wouldn't work: a fixed price per millisecond of CPU across different CPUs wouldn't be fair. Instead, pricing should be defined by the protocol, using a tool that gathers diagnostic data about the operator's hardware to determine the fairest price given the current operator market. This information would be configured in the manager and recorded on the operator's on-chain profile. When work is performed, operators would be paid per millisecond, per block, etc. For instance, if a job takes 200 ms to complete but is submitted in the next block, the user would only pay for the 200 ms, not the full block time of 6 seconds.
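The 200 ms example works out as follows. The per-millisecond rate here is a hypothetical placeholder; the real rate would come from the operator's on-chain profile:

```rust
/// Bill for actual execution time, not block time.
fn job_cost(execution_ms: u64, price_per_ms: f64) -> f64 {
    execution_ms as f64 * price_per_ms
}

fn main() {
    let per_ms = 0.0001; // hypothetical rate from the operator's profile
    // A 200 ms job submitted in the next block pays for 200 ms,
    // not the full 6000 ms block time.
    println!("pay {:.4}, not {:.4}", job_cost(200, per_ms), job_cost(6000, per_ms));
}
```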
Verification of Job Execution Time
This raises the question of how to verify job execution time. Could operators slow down execution to increase the time and thus earn more? While their price would be lower in this case, it’s still a concern.
Another model could involve paying per job execution, regardless of how long it takes on different hardware. However, this wouldn’t be fair since older, cheaper hardware might take longer to complete the job than modern equipment, yet both would receive the same payment. A potential solution could involve the following:
When a user requests a service from multiple operators and executes Job X on this service, the final TNT payment to the operators would be the sum of their TNT costs for executing Job X.
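In code, the settlement rule above is just a sum over the per-operator costs (each of which reflects that operator's own hardware and rates):

```rust
/// Total TNT the user pays for Job X: the sum of each selected
/// operator's individual cost for executing it.
fn total_payment(operator_costs_tnt: &[f64]) -> f64 {
    operator_costs_tnt.iter().sum()
}

fn main() {
    // Hypothetical per-operator costs for one execution of Job X.
    let costs = [1.5, 2.0, 0.5];
    println!("user pays {:.1} TNT", total_payment(&costs));
}
```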
These proposals are based on research I’ve done on Akash, GCP, and AWS.
I'm closing this task for now; we may need to revisit it when we have full blueprints.
Overview
As a blueprint developer, I want:
`#[benchmark]` Macro

This macro will be added to a function that runs during the blueprint's benchmark mode. It will generate an output consumed by the manager in a specific format. Here's an example:
This will generate a `bench_keygen(&Bencher)` function, which the manager will call during benchmark mode (details on this later). Calling this function multiple times returns the following output to the manager:
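Since the original example snippet is not reproduced here, the following is only a hypothetical sketch of what the generated `bench_keygen` function might do. The `Bencher` struct, its fields, and the sample-collection shape are all assumptions, not the actual macro output:

```rust
use std::time::Instant;

/// Hypothetical stand-in for the Bencher the generated function receives.
struct Bencher {
    samples_ms: Vec<f64>,
}

impl Bencher {
    /// Time one execution of the job body and record the sample.
    fn iter<F: FnMut()>(&mut self, mut f: F) {
        let start = Instant::now();
        f();
        self.samples_ms.push(start.elapsed().as_secs_f64() * 1000.0);
    }

    /// Average runtime across samples; the figure the manager could consume.
    fn average_ms(&self) -> f64 {
        self.samples_ms.iter().sum::<f64>() / self.samples_ms.len() as f64
    }
}

/// A sketch of what the macro-generated bench_keygen might expand to.
fn bench_keygen(b: &mut Bencher) {
    b.iter(|| {
        // Placeholder workload standing in for the real keygen job body.
        let _sum: u64 = (0u64..10_000).sum();
    });
}

fn main() {
    let mut bencher = Bencher { samples_ms: Vec::new() };
    // The manager calls the benchmark multiple times to collect samples.
    for _ in 0..5 {
        bench_keygen(&mut bencher);
    }
    println!("average keygen time: {:.3} ms", bencher.average_ms());
}
```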
Benchmark Mode
Similar to webb-tools/gadget#216 and the introduction of the Registration mode, I propose adding a benchmark mode. This mode will execute the gadget to be benchmarked on the operator machine, outputting the benchmarks for the manager to update its on-chain pricing.
On-Chain Pricing Model
Each operator sets basic pricing for their compute units:
With these inputs on-chain as a global configuration for each operator profile, operators can have an automated way to set pricing for each blueprint based on these benchmarks. By pulling the global configuration for an operator, we can compute the cost to execute one job (e.g., one keygen invocation), making it visible to the user deploying this service on that specific operator.
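Combining the operator's global configuration with a blueprint's benchmark output, the per-invocation cost could be computed roughly as follows. The unit choices (TNT per CPU-millisecond, TNT per MB-millisecond of resident memory) and all struct names are assumptions for illustration:

```rust
/// Hypothetical global pricing config from an operator's on-chain profile.
struct OperatorPricing {
    tnt_per_cpu_ms: f64,
    tnt_per_mb_ms: f64,
}

/// Benchmark output for one job (e.g. one keygen invocation).
struct JobBenchmark {
    cpu_ms: f64,
    peak_mem_mb: f64,
    wall_ms: f64,
}

/// Cost of a single invocation on this operator, shown to the user
/// before they deploy the service there.
fn job_cost_tnt(p: &OperatorPricing, b: &JobBenchmark) -> f64 {
    b.cpu_ms * p.tnt_per_cpu_ms + b.peak_mem_mb * b.wall_ms * p.tnt_per_mb_ms
}

fn main() {
    let pricing = OperatorPricing { tnt_per_cpu_ms: 0.001, tnt_per_mb_ms: 0.000001 };
    let keygen = JobBenchmark { cpu_ms: 150.0, peak_mem_mb: 64.0, wall_ms: 200.0 };
    println!("one keygen costs {:.4} TNT on this operator", job_cost_tnt(&pricing, &keygen));
}
```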
Checklist