Open sekulicd opened 9 months ago
w.r.t tokens
The best way to represent the atomic resource is to represent the underlying compute, rather than the length of the prompt of the request and response.
It should be a combination of time (minutes of billing) and the "weight" of the RAM relative to the total available, needed to run the cumulative transformer blocks that are served to consumers.
Think of Ethereum gas consumption as an example: you estimate the GPU compute in advance (with static analysis you can infer the cost of each operator, which maps to a collection of instructions for a model).
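A minimal sketch of the gas-style accounting described above. The operator names, their per-operator costs, and the `ResourceUsage` type are hypothetical placeholders for illustration, not values or APIs from any existing prem component.

```python
from dataclasses import dataclass

# Hypothetical per-operator costs in abstract "compute units",
# analogous to the gas cost of an EVM opcode. Values are made up.
OPERATOR_COST = {"matmul": 8, "attention": 20, "layernorm": 1}


def estimate_units(operators: list[str]) -> int:
    """Statically estimate the compute units for a sequence of operators."""
    return sum(OPERATOR_COST[op] for op in operators)


@dataclass
class ResourceUsage:
    minutes: float       # billed GPU time
    ram_used_gb: float   # RAM held by the served transformer blocks
    ram_total_gb: float  # total RAM available on the device

    def weight(self) -> float:
        """Fraction of the total available RAM that this workload occupies."""
        return self.ram_used_gb / self.ram_total_gb

    def units(self) -> float:
        """Billable resource: time multiplied by the RAM weight."""
        return self.minutes * self.weight()
```

For example, `ResourceUsage(minutes=10, ram_used_gb=20, ram_total_gb=80).units()` bills 10 minutes at a quarter of the device, and `estimate_units` gives an up-front estimate before the request runs.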
Ie.
w.r.t tokens
The best way to represent the atomic resource is to represent the underlying compute, rather than the length of the prompt of the request and response.
User cost (what a user pays for one interaction) is the sum of the compute cost and our service fee.
While our service fee remains static, the compute cost varies based on the 'weight' of the user's request. For instance:
For LLMs, the cost can be tied to the number of tokens. For image models, it relates to the resolution of the image. For audio models, it depends on the duration in minutes.
To make this practical, we could calculate the GPU cost (a combination of time and "weight" of the RAM) for each prem-service, and clearly display the pricing for each category, whether that's per 1K tokens, per specific image resolution, or per minute of audio.
My argument is this: GPU costs can be approximated with a user's 'quota', and we need to simplify the computational costs so that users can easily estimate their charges based on their intended requests. This clarity will improve the experience of both admin and regular users, and I think this abstraction can also help in developing these features.
For example, in Ethereum the price is determined by the 'gas used', which is an abstraction similar to 'number of tokens / image resolution / minutes of audio', and by the gas price (in our case this price is fixed and directly reflects the GPU computation cost).
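A rough sketch of the pricing model argued for above: a static service fee plus a compute cost that scales with the request's 'weight'. All prices, category names, and the `user_cost` function are assumptions made up for illustration.

```python
from decimal import Decimal

# Hypothetical prices; real values would come from the measured GPU cost
# of each prem-service.
SERVICE_FEE = Decimal("0.01")            # static fee per interaction
PRICE_PER_1K_TOKENS = Decimal("0.002")   # LLM services
PRICE_PER_IMAGE = {                      # image services, by resolution
    "512x512": Decimal("0.016"),
    "1024x1024": Decimal("0.020"),
}
PRICE_PER_AUDIO_MINUTE = Decimal("0.006")  # audio services


def user_cost(kind: str, quantity) -> Decimal:
    """Return service fee + compute cost for one interaction.

    quantity is a token count for "llm", a resolution string for "image",
    and a number of minutes for "audio".
    """
    if kind == "llm":
        compute = PRICE_PER_1K_TOKENS * Decimal(quantity) / 1000
    elif kind == "image":
        compute = PRICE_PER_IMAGE[quantity]
    elif kind == "audio":
        compute = PRICE_PER_AUDIO_MINUTE * Decimal(str(quantity))
    else:
        raise ValueError(f"unknown service kind: {kind}")
    return SERVICE_FEE + compute
```

Because the unit price is fixed per category, a user can estimate a charge from the request alone, e.g. `user_cost("llm", 1500)` or `user_cost("audio", 2)`.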
@tiero
Project Overview
The goal is to develop a platform that integrates with an existing system to safeguard running services by requiring users to provide a valid API key. The UI/UX should draw inspiration from the ChatGPT API Key Management platform, focusing on credit-based payments without a subscription option.
Target Users
prem-app as an admin dashboard. In this role, they should be able to create API keys (e.g., without constraints), view usage, etc.
Core Features
Identity Management
prem-app dashboard.
API Key Management
Billing
Usage/Analytics
Integration
prem-app) should be enhanced to enable the admin to create API keys and view Billing/Usage analytics.
Main Flow
prem-service based on API Key constraints which include:
prem-service using the API key. The platform checks if the API key exists, whether the related user has enough balance, and whether the rate and usage limits for the desired service path are adhered to.
@tiero @filopedraz
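The checks in the main flow could look roughly like the sketch below. The `ApiKey` fields, the in-memory `KEYS` and `BALANCES` stores, the one-minute rate window, and the `authorize` function are all hypothetical; a real implementation would sit in front of prem-service and use persistent storage.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ApiKey:
    user_id: str
    rate_limit_per_minute: int  # max requests per path in any 60 s window
    usage_limit: int            # max total requests per path
    usage: dict = field(default_factory=dict)   # path -> request count
    recent: dict = field(default_factory=dict)  # path -> recent timestamps


KEYS: dict[str, ApiKey] = {}   # API key string -> ApiKey (in-memory stand-in)
BALANCES: dict[str, int] = {}  # user id -> remaining credits


def authorize(key: str, path: str, cost: int, now=None) -> tuple[bool, str]:
    """Run the checks in order: key exists, balance, rate limit, usage limit."""
    now = time.time() if now is None else now
    api_key = KEYS.get(key)
    if api_key is None:
        return False, "unknown API key"
    if BALANCES.get(api_key.user_id, 0) < cost:
        return False, "insufficient balance"
    # Keep only the requests made in the last 60 seconds for this path.
    window = [t for t in api_key.recent.get(path, []) if now - t < 60]
    if len(window) >= api_key.rate_limit_per_minute:
        return False, "rate limit exceeded"
    if api_key.usage.get(path, 0) >= api_key.usage_limit:
        return False, "usage limit exceeded"
    # All checks passed: record the request and charge the user's credits.
    api_key.recent[path] = window + [now]
    api_key.usage[path] = api_key.usage.get(path, 0) + 1
    BALANCES[api_key.user_id] -= cost
    return True, "ok"
```

Credits are deducted only after every check passes, so a rejected request never changes the user's balance or usage counters.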