Open saqadri opened 10 months ago
Spoke offline with @Ankush-lastmile. I think it would be incredibly helpful here to be very concretely use-case driven. In the case of eval, I found it extremely helpful to start with the use case and user story (https://docs.google.com/document/d/17tjQgLDmAqyq26XJx4GqVYS73K73GykvhywEiFCozps/edit#bookmark=id.5uogvi8dxrv4).
Proposal for a use case to zoom in on:
I'm building a haiku-generating AIConfig and I have some example haiku types and some idea of how to evaluate the output quality. (This is an eval use case, but batch inference is a key implementation component). I want to explore not only the different outputs I get for different Haiku types, but how the output quality changes depending on the AIConfig itself. e.g. Llama vs. GPT4.
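The eval loop described above could be sketched roughly as follows. This is a minimal illustration only: `run_config`, the model names, and the three-line scoring rule are all hypothetical stand-ins, not the aiconfig eval API.

```python
# Sketch of the use case: run the same haiku prompts against different
# model configs and compare output quality across them.

def score(haiku: str) -> bool:
    # Toy quality check: a haiku should have exactly three lines.
    return len(haiku.strip().splitlines()) == 3

def run_config(model: str, topic: str) -> str:
    # Stub standing in for AIConfig inference with a given model.
    return f"{topic} wind\nover the {model} hills\nsilence falls"

results = {}
for model in ["llama-2", "gpt-4"]:
    for topic in ["autumn", "sea"]:
        results[(model, topic)] = score(run_config(model, topic))

print(results)
```

Batch inference is the inner loop here: the same set of inputs fans out across configs, and the scores are joined back to (config, input) pairs.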
Created a draft RFC in PR #408.
From my understanding of Jacob's conversation, the primary use case for batch execution can be viewed as an exploratory use case, wherein the user is privy to both inputs and outputs. This suggests that batch execution is closely tied to an evaluation or an eval interface.
+1 the specific use case from @2timesjay would be great to work with.
Considering a suitable interface that provides users with access to batch execution for AIConfig, here's my proposal:
Users can provide 4 inputs:
- run all — run all prompts
- run last — run the last prompt

A key question here is determining who or what will consume the output?
Proposal:
Another alternative is to return the data in memory, with the outputs corresponding to each execution. This would probably be a dictionary mapping, something like this:

{
  "run_1": {
    "prompt": "prompt1",
    "output": "output"
  }
}
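The in-memory mapping above could be produced by a loop like the following. This is a minimal sketch; `run_batch` and the stubbed model call are illustrative names, not the actual aiconfig API.

```python
def run_batch(prompts):
    """Run each prompt and collect results keyed by run id (illustrative)."""
    results = {}
    for i, prompt in enumerate(prompts, start=1):
        # Stand-in for a real model call; a real implementation would
        # invoke the AIConfig runtime here.
        output = f"output for {prompt}"
        results[f"run_{i}"] = {"prompt": prompt, "output": output}
    return results

print(run_batch(["prompt1", "prompt2"]))
```

Keying by run id rather than by prompt text keeps duplicate prompts distinguishable in the result mapping.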
@Ankush-lastmile , I'm glad we're use-case driven and totally agree with this:
From my understanding of Jacob's conversation, the primary use case for batch execution can be viewed as an exploratory use case, wherein the user is privy to both inputs and outputs. This suggests that batch execution is closely tied to an evaluation or an eval interface.
With that in mind, let's make sure not to write two libraries. If the initial scope of "batch execution" means running AIConfig in a loop with a small number of examples, that is essentially what the eval library already does. There are a few clear features here that are missing from the eval interface, but they can be easily added. Let's dedup efforts.
The features:
We discussed this in person today, @Ankush-lastmile when you have a PR please link it to this issue to provide visibility into how we are implementing this. @2timesjay we should have this available EOD Monday or Tuesday
Hi @2timesjay
We have integrated batch execution functionality into the Python SDK with the recent merge of PR #469. This new interface streamlines batch execution of an AIConfig: you provide an AIConfig alongside a list of parameter dictionaries, and the output is a collection of tuples, each containing the ExecuteResult, the resolved completion params, and the parameters used.
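To illustrate the shape of the return value described above, here is a sketch with a stub executor. The real SDK returns ExecuteResult objects; they are mocked here as plain dicts, and `run_batch`/`fake_run` are illustrative names, not the SDK's actual signatures.

```python
def run_batch(config_run, params_list):
    """Execute config_run once per parameter dict; return a list of
    (result, resolved_completion_params, params) tuples (illustrative)."""
    batch_results = []
    for params in params_list:
        result, resolved = config_run(params)  # stub for an AIConfig run
        batch_results.append((result, resolved, params))
    return batch_results

def fake_run(params):
    # Stand-in for AIConfig prompt execution; returns a mock result
    # plus the "resolved" completion params the run was made with.
    resolved = {"model": "gpt-4", **params}
    return {"output": f"haiku about {params['topic']}"}, resolved

results = run_batch(fake_run, [{"topic": "autumn"}, {"topic": "sea"}])
for result, resolved, params in results:
    print(params["topic"], "->", result["output"])
```

Returning the input params alongside each result lets the caller join outputs back to their requests without relying on ordering alone.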
To give you a clearer picture of how this feature works in practice, I've prepared a demonstration video. You can watch it here to see the batch execution interface in action and get a better understanding of its capabilities:
https://github.com/lastmile-ai/aiconfig/assets/141073967/e071361f-637f-442b-86c5-edec68b5eba7
Looking forward to your feedback!
Thanks, will take a look!
https://github.com/lastmile-ai/aiconfig/pull/553/files shows an example where batch results are inconsistent with individual results. It looks like the responses are not being correctly joined to the requests; instead, an arbitrary one is repeated for every answer. Some state in the config runtime may be at fault.
Also, batch is significantly slower than serial runs.
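One generic way the mis-join described above can arise is collecting concurrent responses in completion order rather than request order. The sketch below (not the actual fix in the linked PRs) shows how preserving the request index keeps each output paired with its request even when later requests finish first.

```python
import asyncio

async def run_one(index, params):
    # Simulate variable latency: earlier requests finish later, so naive
    # completion-order collection would pair outputs with the wrong inputs.
    await asyncio.sleep(0.03 - 0.01 * index)
    return f"output for {params['topic']}"

async def run_batch(params_list):
    # asyncio.gather preserves input order, so each output stays
    # joined to the request that produced it.
    outputs = await asyncio.gather(
        *(run_one(i, p) for i, p in enumerate(params_list))
    )
    return list(zip(params_list, outputs))

pairs = asyncio.run(
    run_batch([{"topic": "autumn"}, {"topic": "sea"}, {"topic": "moon"}])
)
for params, output in pairs:
    print(params["topic"], "->", output)
```

Using `asyncio.gather` (or any ordered join on a request id) also addresses the performance concern: requests run concurrently instead of serially while results stay correctly matched.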
Thank you for highlighting these issues. The discrepancy between batch and individual results, as well as the slower performance of batch executions, are indeed serious concerns.
I'm looking into this and will get back to you with an update soon.
Hi @2timesjay,
I've just landed PR #566, addressing the inconsistency in batch results.
I tested the AIConfig you provided with the iPython notebook and observed no notable discrepancies in processing speed. Could you please rebase on main, re-run the notebook on your end, and let us know about any significant speed variations you encounter?
Thanks!
Similar to AI Workflows, we should enable local batch inference.
Let's discuss in this issue what the API should look like and what data format the results should take.
Request from @2timesjay.