lastmile-ai / aiconfig

AIConfig is a config-based framework to build generative AI applications.
https://aiconfig.lastmileai.dev

Batch inference #401

Open saqadri opened 10 months ago

saqadri commented 10 months ago

Similar to AI Workflows, we should enable local batch inference.

Let's discuss in this issue what the API should look like and what data format the results should be produced in.

Request from @2timesjay.

jonathanlastmileai commented 10 months ago

Spoke offline with @Ankush-lastmile. I think it would be incredibly helpful here to be very concretely use-case driven. In the case of eval, I found it extremely helpful to start with the use case and user story (https://docs.google.com/document/d/17tjQgLDmAqyq26XJx4GqVYS73K73GykvhywEiFCozps/edit#bookmark=id.5uogvi8dxrv4).

Proposal for a use case to zoom in on:

I'm building a haiku-generating AIConfig and I have some example haiku types and some idea of how to evaluate the output quality. (This is an eval use case, but batch inference is a key implementation component). I want to explore not only the different outputs I get for different Haiku types, but how the output quality changes depending on the AIConfig itself. e.g. Llama vs. GPT4.

Ankush-lastmile commented 10 months ago

Created a draft PR #408 as an RFC.

From my understanding of Jacob's conversation, the primary use case for batch execution can be viewed as an exploratory use case, wherein the user is privy to both inputs and outputs. This suggests that batch execution is closely tied to an evaluation or an eval interface.

+1 the specific use case from @2timesjay would be great to work with.

Considering a suitable interface that provides users with access to batch execution for AIConfig, here's my proposal:

Input Interface:

Users can provide four inputs (a minimal interface sketch follows this list):

  1. The AIConfig
  2. Parameters
  3. The number of executions for the same AIConfig
  4. Execution options, essentially a run-config that specifies how to execute the AIConfig. This can be options such as the following:
    • run all - run all prompts
    • run last - run the last prompt
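
To make the shape of this input interface concrete, here is a minimal sketch in Python. The run_batch function and its option names are hypothetical, not an existing aiconfig API; only AIConfigRuntime and its run method are real:

from typing import Any, Literal

from aiconfig import AIConfigRuntime


async def run_batch(
    config: AIConfigRuntime,
    params_list: list[dict[str, Any]],
    num_executions: int = 1,
    run_mode: Literal["run_all", "run_last"] = "run_last",
) -> list[dict[str, Any]]:
    # Hypothetical entry point covering the four inputs listed above.
    results = []
    for params in params_list:
        for _ in range(num_executions):
            if run_mode == "run_all":
                # Run every prompt in the config, in order.
                outputs = [await config.run(p.name, params) for p in config.prompts]
            else:
                # Run only the last prompt.
                outputs = [await config.run(config.prompts[-1].name, params)]
            results.append({"params": params, "outputs": outputs})
    return results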

Output Interface

A key question here is: who or what will consume the output?

Proposal:

  1. We can persist the outputs for each execution using the config.save(save_with_outputs=True) method. This separates execution from evaluation, since each "execution" is saved as an AIConfig artifact (see the sketch after the in-memory example below).

  2. Alternatively, we can return the data in memory, including the outputs corresponding to each execution. This would likely conform to a dictionary mapping, something like this:

{
    "run_1": {
        "prompt": "prompt1",
        "output": "output"
    }
}
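
To make option 1 concrete as well, a rough sketch of persisting each execution as its own AIConfig artifact. The save_with_outputs flag follows the proposal above (the shipped save() signature may differ), and the file naming is illustrative only:

import asyncio

from aiconfig import AIConfigRuntime


async def save_batch_runs(config_path: str, params_list: list[dict]) -> None:
    for i, params in enumerate(params_list):
        # Load a fresh runtime per execution so saved artifacts don't share output state.
        config = AIConfigRuntime.load(config_path)
        await config.run(config.prompts[-1].name, params)
        # save_with_outputs is the flag proposed above; the real save() signature may differ.
        config.save(f"batch_run_{i}.aiconfig.json", save_with_outputs=True)

# Example: asyncio.run(save_batch_runs("haiku.aiconfig.json", [{"topic": "autumn"}]))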
jonathanlastmileai commented 10 months ago

@Ankush-lastmile , I'm glad we're use-case driven and totally agree with this:

From my understanding of Jacob's conversation, the primary use case for batch execution can be viewed as an exploratory use case, wherein the user is privy to both inputs and outputs. This suggests that batch execution is closely tied to an evaluation or an eval interface.

With that in mind, let's make sure not to write two libraries. If the initial scope of "batch execution" means running AIConfig in a loop with a small number of examples, that is essentially what the eval library already does. There are a few clear features here that are missing from the eval interface, but they can be easily added. Let's dedup efforts.

The features:

saqadri commented 10 months ago

We discussed this in person today. @Ankush-lastmile, when you have a PR, please link it to this issue to provide visibility into how we are implementing this. @2timesjay, we should have this available EOD Monday or Tuesday.

Ankush-lastmile commented 9 months ago

Hi @2timesjay

We have integrated batch execution functionality into the Python SDK with the recent merge of PR #469. This new interface streamlines batch execution of an AIConfig: you provide an AIConfig along with a list of parameter dictionaries, and the output is a collection of tuples, each containing the ExecuteResult, the resolved completion params, and the parameters used for that run.
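
A rough usage sketch of what this looks like in code, assuming a run_batch-style method on AIConfigRuntime; the method name, prompt name, and config path below are illustrative, so please treat the merged PR #469 as the authoritative API:

import asyncio

from aiconfig import AIConfigRuntime


async def main() -> None:
    config = AIConfigRuntime.load("haiku.aiconfig.json")  # illustrative config path

    # One parameter dict per batch item.
    params_list = [
        {"haiku_type": "nature"},
        {"haiku_type": "city life"},
        {"haiku_type": "seasons"},
    ]

    # Assumed entry point; per the description above it returns a collection of
    # tuples of (ExecuteResult, resolved completion params, params used for that run).
    results = await config.run_batch("generate_haiku", params_list)

    for execute_result, completion_params, params in results:
        print(params, "->", execute_result)

asyncio.run(main())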

To give you a clearer picture of how this feature works in practice, I've prepared a demonstration video. You can watch it here to see the batch execution interface in action and get a better understanding of its capabilities:

https://github.com/lastmile-ai/aiconfig/assets/141073967/e071361f-637f-442b-86c5-edec68b5eba7

Looking forward to your feedback!

2timesjay commented 9 months ago

Thanks, will take a look!

2timesjay commented 9 months ago

https://github.com/lastmile-ai/aiconfig/pull/553/files shows an example where batch results are inconsistent with individual results. It looks like the responses are not being correctly joined to the requests; an arbitrary one is instead being repeated for every answer. Some state in the config runtime may be at fault.

Also, batch is significantly slower than serial runs.

Ankush-lastmile commented 9 months ago

Thank you for highlighting these issues. The discrepancy between batch and individual results and the slower performance of batch execution are both serious concerns.

I'm looking into this and will get back to you with an update soon.

Ankush-lastmile commented 9 months ago

Hi @2timesjay,

I've just landed PR #566, addressing the inconsistency in batch results.

I tested the AIConfig you provided with the IPython notebook and observed no notable discrepancies in processing speed. Could you please rebase on main, re-run the notebook on your end, and let us know if you still see any significant speed variations?

Thanks!