adampingel opened this issue 4 months ago
From Anthony:
To start: tokens consumed, broken down by model/family, plotted cumulatively and, more importantly, by day/week, along with the first derivative by day/week.
Later: it would also be ideal to have a first-order job complexity/depth metric (or metrics): something like average prompt input size, model calls per session, and the time spacing between calls. These may indicate whether people are trying simple one-in, one-out apps or something more complicated (retrieval, iterative optimization, more complex agentic things...). That would probably require them to store a lot of per-session info and transfer it to us. Even if we don't technically implement this soon, it would be good to have this kind of data scoped in the contract (to be handled by Eda & Saleem).
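The roll-up described above (tokens by model/family per day, plus the day-over-day first derivative) can be sketched with plain Python; the record field names (`model`, `date`, `tokens`) are assumptions here, not the actual feed schema:

```python
# Sketch of the requested roll-up, assuming each prediction record carries
# a model name, a UTC date string, and a total token count. Field names
# are hypothetical placeholders for whatever the real feed provides.
from collections import defaultdict

def daily_token_totals(records):
    """Sum tokens per (model, day)."""
    totals = defaultdict(int)
    for r in records:
        totals[(r["model"], r["date"])] += r["tokens"]
    return dict(totals)

def day_over_day_delta(daily, model, dates):
    """First derivative: change in daily token totals between consecutive days."""
    return [daily.get((model, b), 0) - daily.get((model, a), 0)
            for a, b in zip(dates, dates[1:])]

records = [
    {"model": "granite-8b", "date": "2024-08-30", "tokens": 59},
    {"model": "granite-8b", "date": "2024-08-31", "tokens": 120},
    {"model": "granite-8b", "date": "2024-08-31", "tokens": 40},
]
daily = daily_token_totals(records)
deltas = day_over_day_delta(daily, "granite-8b", ["2024-08-30", "2024-08-31"])
```

Weekly totals would work the same way with an ISO-week key instead of a date key; the cumulative plot is just a running sum over the daily series.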
And some screenshots:
Met with the Granite dashboarding team last week. Next step: get a feed going from Replicate to Airtable. Going to start with the Replicate Predict API, which returns the following information for a prediction:
id='a8xc87caxhrm00chmdgstjjhg4'
model='ibm-granite/granite-8b-code-instruct-128k'
version='797c070dc871d8fca417d7d188cf050778d7ce21a0318d26711a54207e9ee698'
status='succeeded'
input={}
output=None
logs=None
error=None
metrics={
'batch_size': 31.874487392490966,
'input_token_count': 32,
'tokens_per_second': 78.56306752814271,
'output_token_count': 27,
'predict_time_share': 0.011748832079672044,
'predict_time': 0.374852423,
'total_time': 28213.854279,
'time_to_first_token': 28213.510606488
}
created_at='2024-08-30T13:02:44.460000Z'
started_at='2024-08-30T20:52:57.939427Z'
completed_at='2024-08-30T20:52:58.314279Z'
urls={
'stream': 'https://stream-b.svc.sea.v.replicate.net/v1/streams/hf2wax7s4kxcduyhdgk4bjmbbc24icsxz7fw5dt5f2ada2xjfmuq',
'get': 'https://api.replicate.com/v1/predictions/a8xc87caxhrm00chmdgstjjhg4',
'cancel': 'https://api.replicate.com/v1/predictions/a8xc87caxhrm00chmdgstjjhg4/cancel'
}
Will check with Replicate tech support about the input, output, logs, and error fields.
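For the Airtable feed, a payload shaped like the one above needs to be flattened into one row per prediction. A minimal sketch, assuming a dict with the same keys as the printed fields (the Airtable-side column names are my own placeholders):

```python
# Minimal sketch: flatten a prediction payload (shaped like the example
# above) into a single flat row suitable for one Airtable record.
# Column names such as "prediction_id" are assumptions, not a fixed schema.
def prediction_to_row(p):
    row = {
        "prediction_id": p["id"],
        "model": p["model"],
        "status": p["status"],
        "created_at": p["created_at"],
    }
    # Promote each metric to its own column so it can be charted directly.
    for key, value in (p.get("metrics") or {}).items():
        row[key] = value
    return row

example = {
    "id": "a8xc87caxhrm00chmdgstjjhg4",
    "model": "ibm-granite/granite-8b-code-instruct-128k",
    "status": "succeeded",
    "created_at": "2024-08-30T13:02:44.460000Z",
    "metrics": {"input_token_count": 32, "output_token_count": 27},
}
row = prediction_to_row(example)
```

The `metrics` dict is treated as optional since `input`, `output`, `logs`, and `error` already show up empty in the example.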
Yes, we have a code repo to work from now and will cut over to it ASAP. I think I've got the basic CRUD operations down using the Airtable SDK, and I can query the prediction metrics out of Replicate using the Replicate SDK (both Python), so I don't see many hurdles here other than getting the compute to run a daily/hourly load job.
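The load job itself is mostly a dedup-and-write loop. A hedged sketch of that loop: `fetch_predictions` and `write_row` below are stand-ins for the Replicate and Airtable SDK calls, not real API names, so the skeleton can be exercised without either service:

```python
# Hedged sketch of the daily/hourly load job. `fetch_predictions` and
# `write_row` are injected stand-ins for the Replicate query and the
# Airtable insert (both assumptions, not actual SDK method names).
def run_load_job(fetch_predictions, write_row, seen_ids):
    """Write each unseen prediction exactly once; return how many were written."""
    written = 0
    for p in fetch_predictions():
        if p["id"] in seen_ids:
            continue  # idempotent: a rerun skips rows already loaded
        write_row({"prediction_id": p["id"], "status": p["status"]})
        seen_ids.add(p["id"])
        written += 1
    return written

rows = []
seen = set()
preds = lambda: [{"id": "a", "status": "succeeded"},
                 {"id": "b", "status": "succeeded"}]
n1 = run_load_job(preds, rows.append, seen)
n2 = run_load_job(preds, rows.append, seen)  # second run writes nothing new
```

Keeping the job idempotent this way means a cron/hourly schedule can safely re-run over overlapping windows; in production the seen-ID set would live in Airtable itself (e.g. keyed on the prediction ID) rather than in memory.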
Two issues I am working through:
Meeting with Replicate support for clarification.
Otherwise, all connectivity Replicate -> Airtable is in place.
Talked with Replicate support yesterday about supporting the changes mentioned above in the SDK. They said they need documentation in order to submit a request to their product team. Documentation has been created and submitted for internal review.
Will try to get the document out to Replicate this afternoon.
Includes scoping metrics beyond basic inference count
The destination of this information is out of scope for this ticket
Depends on https://github.com/granite-cookbooks/pm/issues/16