ibm-granite-community / pm

Granite Community Project Management

Model usage analytics feed from Replicate #20

Open adampingel opened 4 months ago

adampingel commented 4 months ago

Includes scoping metrics beyond basic inference count

Destination of this information is out of scope for this ticket

Depends on https://github.com/granite-cookbooks/pm/issues/16

lindsayentsminger commented 2 months ago

From Anthony:

To start: tokens consumed by model/family, plotted cumulatively and, more importantly, by day/week, plus the first derivative by day/week

Later: it would also be ideal to have first-order job complexity/depth metric(s), something like average prompt input size, model calls per session, and time spacing between calls. These may provide some indication of whether people are trying simple one-in, one-out apps or something more complicated (retrieval, iterative optimization, more complex agentic things...). That would probably require them to store a lot of per-session info and transfer it to us. Even if we don't technically implement this soon, it would be good to have this kind of data scoped in the contract (to be handled by Eda & Saleem)
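As a rough illustration of the first request, daily token totals, a cumulative series, and a day-over-day first derivative can be computed from per-prediction records along these lines (the records below are made-up sample data, not real usage):

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-prediction records: (model, day, total tokens consumed).
records = [
    ("granite-8b-code-instruct-128k", date(2024, 8, 28), 120),
    ("granite-8b-code-instruct-128k", date(2024, 8, 29), 300),
    ("granite-8b-code-instruct-128k", date(2024, 8, 30), 180),
]

# Aggregate tokens by day.
daily = defaultdict(int)
for _model, day, tokens in records:
    daily[day] += tokens

days = sorted(daily)

# Cumulative consumption over time.
cumulative, running = [], 0
for d in days:
    running += daily[d]
    cumulative.append(running)

# First derivative: change in daily consumption vs. the prior day.
deltas = [daily[days[i]] - daily[days[i - 1]] for i in range(1, len(days))]
```

The same aggregation generalizes to weekly buckets by keying on the ISO week instead of the date.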

And some screenshots:

[Screenshots attached: two dashboard captures from 2024-08-29 at 3:02 PM, plus one additional image]

jolson-ibm commented 2 months ago

Met with the Granite dashboarding team last week. Next step: get a feed going from Replicate to Airtable. Going to start with the Replicate Predict API, which returns the following information for a prediction:

id='a8xc87caxhrm00chmdgstjjhg4' 
model='ibm-granite/granite-8b-code-instruct-128k' 
version='797c070dc871d8fca417d7d188cf050778d7ce21a0318d26711a54207e9ee698' 
status='succeeded' 
input={} 
output=None 
logs=None 
error=None 

metrics={
        'batch_size': 31.874487392490966, 
        'input_token_count': 32, 
        'tokens_per_second': 78.56306752814271, 
        'output_token_count': 27, 
        'predict_time_share': 0.011748832079672044, 
        'predict_time': 0.374852423, 
        'total_time': 28213.854279, 
        'time_to_first_token': 28213.510606488
} 
created_at='2024-08-30T13:02:44.460000Z' 
started_at='2024-08-30T20:52:57.939427Z' 
completed_at='2024-08-30T20:52:58.314279Z' 
urls={
    'stream': 'https://stream-b.svc.sea.v.replicate.net/v1/streams/hf2wax7s4kxcduyhdgk4bjmbbc24icsxz7fw5dt5f2ada2xjfmuq', 
    'get': 'https://api.replicate.com/v1/predictions/a8xc87caxhrm00chmdgstjjhg4', 
    'cancel': 'https://api.replicate.com/v1/predictions/a8xc87caxhrm00chmdgstjjhg4/cancel'
}

Will check with Replicate tech support about the input, output, logs, and error fields.
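Pulling those fields out of the Replicate Python SDK and flattening them into a loadable row can be sketched roughly as follows. The helper name is hypothetical, the metric keys follow the dump above (they should not be treated as a stable contract), and the network call only runs when `REPLICATE_API_TOKEN` is set:

```python
import os

def prediction_to_row(prediction):
    """Flatten one Replicate prediction into a dict suitable for an Airtable row."""
    metrics = getattr(prediction, "metrics", None) or {}
    return {
        "id": getattr(prediction, "id", None),
        "model": getattr(prediction, "model", None),
        "status": getattr(prediction, "status", None),
        "input_tokens": metrics.get("input_token_count"),
        "output_tokens": metrics.get("output_token_count"),
        "predict_time": metrics.get("predict_time"),
    }

if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    # Fetch one prediction by id (id from the example above) and flatten it.
    prediction = client.predictions.get("a8xc87caxhrm00chmdgstjjhg4")
    print(prediction_to_row(prediction))
```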

jolson-ibm commented 2 months ago

Yes, we have a code repo to work from now. Will cut over to that ASAP. I think I've got the basic CRUD operations down using the Airtable SDK, and I can query the prediction metrics out of Replicate using the Replicate SDK (both Python), so I don't see a lot of hurdles here other than getting the compute to run a daily / hourly load job.
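The Airtable side of the load job might look like the sketch below, assuming the `pyairtable` SDK and a hypothetical base id and table name for the metrics feed. One real constraint worth encoding: Airtable's batch create/update endpoints accept at most 10 records per request.

```python
import os

def chunked(records, size=10):
    """Yield records in batches of `size` (Airtable's batch API cap is 10)."""
    for i in range(0, len(records), size):
        yield records[i : i + size]

if __name__ == "__main__" and os.environ.get("AIRTABLE_API_KEY"):
    from pyairtable import Api  # pip install pyairtable

    api = Api(os.environ["AIRTABLE_API_KEY"])
    # Hypothetical base id and table name for the Replicate metrics feed.
    table = api.table("appXXXXXXXXXXXXXX", "replicate_metrics")

    # Rows as produced by the Replicate-side flattening step.
    rows = [{"id": "a8xc87caxhrm00chmdgstjjhg4", "input_tokens": 32, "output_tokens": 27}]
    for batch in chunked(rows):
        table.batch_create(batch)
```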

jolson-ibm commented 2 months ago

Two issues I am working through:

  1. The Replicate SDK does not accept an argument for all users, only a single user.
  2. There is no argument for retrieving metrics between times T1 and T2.

Meeting with Replicate support for clarification.

Otherwise, all Replicate -> Airtable connectivity is in place.
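Until the SDK grows a T1/T2 parameter, one workaround is to page through `predictions.list()` (newest first) and filter client-side by `created_at`. A minimal sketch, assuming ISO-8601 timestamps like the ones in the dump above; for a full window you would also follow the pagination cursors rather than stop at the first page:

```python
import os
from datetime import datetime, timezone

def in_window(created_at, start, end):
    """True if an ISO-8601 timestamp (e.g. '2024-08-30T13:02:44.460000Z')
    falls inside the half-open interval [start, end)."""
    ts = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return start <= ts < end

if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    start = datetime(2024, 8, 30, tzinfo=timezone.utc)
    end = datetime(2024, 8, 31, tzinfo=timezone.utc)
    for p in client.predictions.list():
        if p.created_at and in_window(p.created_at, start, end):
            print(p.id, p.metrics)
```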

jolson-ibm commented 2 months ago

Talked with Replicate support yesterday about the SDK changes mentioned above. They said they need documentation in order to submit a request to their product team. Documentation has been created and submitted for internal review.

Will try to get the document out to Replicate this afternoon.