Closed: alexcos20 closed this issue 2 years ago
For some UI context: we tried to make resource usage clearer by exposing the hardcoded defaults we set. This is what we have in Market v4
within the publish form, so that would be the use case to extend. In the end, multiple services with multiple prices could be created in the UI based on different resource types:
The desired compute environment is going to be selected by the consumer before ordering the service. It doesn't make any sense to have this enforced by the publisher.
It's up to the consumer. If I'm short on money, I will choose an environment with 1 CPU, and by paying less I accept a longer job duration. And vice versa: if I'm in a hurry and the algorithm can use multiple CPUs, I will pay for an environment with 128 CPUs.
There can be big problems if the consumer chooses the compute environment instead of the provider, especially for more complex machine learning algorithms.
In the case of deep learning, an algorithm can be run on CPUs, but it can then take 45x longer. So it is possible, but instead of running for a day it runs for 45 days, and a CPU blocked for that long is not really cheap either. Other deep learning algorithms need a certain amount of GPU memory or they won't work properly. A configuration chosen by the consumer may simply not be feasible. As there is no way to revoke a job after the consumer has paid, this would create problems that can only be solved manually, e.g. by the provider sending the funds back to the consumer.
We could define in the DDO a list of minimum requirements and display only the environments that fulfill those minimum requirements.
In the current v4 setup, C2D resources, prices, and flow are very unclear. How do you define compute resources? How can you have multiple environments (CPU, RAM, disk setups)?
Proposal:
C2D should have multiple environments (technically speaking, namespaces) which are exported by op-service in the root endpoint. Each environment has its own characteristics. The environments object should look like:
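As a sketch of what such an environments object could contain (the field names and values below are illustrative assumptions, not a final schema), each entry would describe one namespace with its own resource limits and pricing:

```python
# Hypothetical environments object exported by op-service's root endpoint.
# All field names here are assumptions for illustration.
environments = [
    {
        "id": "env-basic",        # environment id (namespace)
        "cpu": 1,                 # number of CPUs
        "ram": "1Gi",             # memory limit
        "disk": "5Gi",            # disk limit
        "gpu": 0,                 # number of GPUs
        "priceMin": 1.0,          # price per minute, in providerFeeToken
        "maxJobDuration": 3600,   # max job duration, seconds
    },
    {
        "id": "env-gpu",
        "cpu": 8,
        "ram": "32Gi",
        "disk": "100Gi",
        "gpu": 1,
        "priceMin": 10.0,
        "maxJobDuration": 86400,
    },
]
```

The consumer would then pick one environment id from this list when ordering a compute job.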
Provider will expose these envs in its root endpoint as well, adding providerFeeToken (defined in the Provider env, because each network will have its own providerFeeToken address, e.g. the USDT address on mainnet != the one on Polygon).
When publishing a compute dataset, we DO NOT specify cpu, ram, etc., only the serviceEndpoint.
When publishing an algorithm, specify minimum requirements (cpu, ram, etc.), so that only suitable environments can be used.
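Combining the two points above with the earlier suggestion to show only environments that fulfill the algorithm's minimum requirements, the filtering could be sketched like this (field names and units are assumptions):

```python
# Hypothetical provider-side environments list (units assumed: RAM in GiB).
envs = [
    {"id": "env-basic", "cpu": 1, "ram_gb": 1, "gpu": 0},
    {"id": "env-gpu", "cpu": 8, "ram_gb": 32, "gpu": 1},
]

def matching_envs(envs, min_req):
    """Keep only environments that meet every minimum requirement
    declared in the algorithm's DDO (missing keys default to 0)."""
    return [
        e for e in envs
        if e["cpu"] >= min_req.get("cpu", 0)
        and e["ram_gb"] >= min_req.get("ram_gb", 0)
        and e["gpu"] >= min_req.get("gpu", 0)
    ]

# An algorithm that needs at least 4 CPUs, 16 GiB RAM and one GPU:
selected = matching_envs(envs, {"cpu": 4, "ram_gb": 16, "gpu": 1})
```

The UI would then render only `selected` in the environment picker, so a consumer cannot order an infeasible configuration.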
Consume flow:
Downside: I will pay 1 DT and 10 USD to test my algo using a cheap CPU. Once my algo is tested, I will have to buy 1 DT again and pay another providerFee in order to run in a high-performance env.

In V4.1, we will separate the process in two, meaning that you would buy the DT to have access to the data for a specific period, and purchase a separate compute env for a different period. (I.e.: I buy access to the data for one month. I will buy 10 mins of the cheapest env to test my algo, then buy an expensive C2D env to run my algo several times. So I will pay once for the data, and multiple fees for compute.)

How to separate data access & provider resources:
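The separation described above could be sketched as two independent purchases, each carrying its own validity period (function and field names are hypothetical, not an existing API):

```python
# Sketch: data access and compute resources are paid for separately,
# each with its own validUntil. All names are illustrative assumptions.

def order_data_access(now, period_seconds):
    # One datatoken order grants access to the data until validUntil.
    return {"type": "startOrder", "validUntil": now + period_seconds}

def buy_compute(now, env_id, minutes, price_per_min):
    # A separate provider-fee purchase per compute environment and period.
    return {
        "env": env_id,
        "fee": minutes * price_per_min,
        "validUntil": now + minutes * 60,
    }

now = 0
data = order_data_access(now, 30 * 24 * 3600)        # data for one month
test_run = buy_compute(now, "env-basic", 10, 1.0)    # 10 min on cheapest env
big_run = buy_compute(now, "env-gpu", 120, 10.0)     # expensive env later
```

The point is that `data` is bought once, while `buy_compute` can be called repeatedly against different environments without re-buying the datatoken.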
Notice the extra validUntil parameter in the ProviderFees event.
The logic for consume is the following:
Provider logic is the following, given a txId received from the consumer (can be a startOrder or a reuseOrder tx):
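A minimal sketch of that provider-side check, assuming the tx carries the ProviderFees data with its validUntil (all field names are assumptions):

```python
def validate_order(tx, now, expected_env):
    """Validate a txId received from the consumer. The tx can be a
    startOrder or a reuseOrder; its provider fees must still be valid
    and must have been paid for the requested environment."""
    if tx["type"] not in ("startOrder", "reuseOrder"):
        return False, "not an order tx"
    fees = tx["providerFees"]
    if fees["validUntil"] <= now:
        return False, "provider fees expired: a reuseOrder with new fees is needed"
    if fees["environment"] != expected_env:
        return False, "fees were paid for a different environment"
    return True, "ok"
```

On an expired validUntil the consumer keeps data access but must pay new provider fees via reuseOrder, which is exactly the data/compute split described earlier.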