microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License
9.35k stars · 847 forks

Bulk processing of inputs in an LLM - bulk categorization #3394

Open adampolak-vertex opened 4 months ago

adampolak-vertex commented 4 months ago

Currently, when running prompt flow, the "input" is defined as a single "item".

For the classification example, this means it can only classify one input at a time.

There needs to be a feature for putting in many products at once, so that a single prompt can output many categorizations.

It should be possible to import many inputs at once and have every output linked back to its original input, so that accuracy can be traced.

This way the "cost" of the prompt tokens spent explaining what must be done can be "amortized" across many inputs.

Just as an eval can "bulk" process inputs, the same must be possible with a general flow.
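
To make the request concrete, here is a minimal sketch of what a batched prompt could look like (the product data, id scheme, and output format are purely illustrative; this is not something promptflow does today):

```python
# A minimal sketch of the requested behavior: several products go into one LLM
# call, each tagged with an id so the answers can be joined back to their inputs.
products = [
    {"id": "p1", "name": "wireless mouse"},
    {"id": "p2", "name": "espresso machine"},
]

# The classification instructions are written once, so their token cost is
# amortized across every product in the batch.
prompt = (
    "Classify each product into a category. "
    "Return one line per product as '<id>: <category>'.\n\n"
    + "\n".join(f"{p['id']}: {p['name']}" for p in products)
)
```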


adampolak-vertex commented 4 months ago

This is important. Is there a solution for this? We cannot use PromptFlow as an endpoint if it cannot do bulk processing.

brynn-code commented 4 months ago

Hi @adampolak-vertex, thanks for reporting this. Currently, promptflow only supports bulk runs over inputs, where each input is handled as a single LLM call. To aggregate multiple inputs into one prompt, you could try defining the input as a list type to get a batch of outputs in one call.
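
For what that could look like in practice, here is a rough sketch, assuming a Python tool node whose flow input is declared as a list type (the tool and field names are illustrative; only the @tool decorator is promptflow's API, and the import path may vary by version):

```python
from promptflow import tool


@tool
def categorize_products(products: list) -> str:
    # Build one prompt that covers every product in the list; a downstream LLM
    # node would consume this single prompt, so the instruction tokens are spent
    # once per batch instead of once per product.
    return "Classify each product into a category:\n" + "\n".join(
        str(p) for p in products
    )
```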

adampolak-vertex commented 4 months ago

> Hi @adampolak-vertex, thanks for reporting this. Currently, promptflow only supports bulk runs over inputs, where each input is handled as a single LLM call. To aggregate multiple inputs into one prompt, you could try defining the input as a list type to get a batch of outputs in one call.

Brynn, thank you for your feedback. We have looked into including a list type that would take in objects. These are the issues we have found with this approach:

brynn-code commented 4 months ago

Yeah, the problem you mentioned is true. In addition, I'd like to mention that if we supported batch inputs aggregated into one prompt and sent that call, then to keep the relationship between inputs and outputs we would have no choice but to read and analyze the LLM outputs, which may cause compliance issues and is considered insecure by some of our customers.

So, from our side, we still encourage users to leverage the bulk-run capability if they have no token concerns; we won't read or analyze any user inputs or LLM outputs.
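
To illustrate what that would mean, below is a sketch of the kind of output parsing the framework would be forced to do, assuming a hypothetical "<id>: <category>" response format (which is exactly the fragile, compliance-sensitive part):

```python
# If promptflow aggregated many inputs into one prompt, it would have to read the
# raw model output to map each answer back to its input.
def split_batched_output(raw_output: str) -> dict:
    results = {}
    for line in raw_output.strip().splitlines():
        item_id, _, category = line.partition(":")
        if item_id and category:
            results[item_id.strip()] = category.strip()
    return results
```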

adampolak-vertex commented 4 months ago

> Yeah, the problem you mentioned is true. In addition, I'd like to mention that if we supported batch inputs aggregated into one prompt and sent that call, then to keep the relationship between inputs and outputs we would have no choice but to read and analyze the LLM outputs, which may cause compliance issues and is considered insecure by some of our customers.
>
> So, from our side, we still encourage users to leverage the bulk-run capability if they have no token concerns; we won't read or analyze any user inputs or LLM outputs.

Yes, I was thinking about that as well and thought about an "id" field for the input which would "unlock" batch processing.

If you had:

1) an "aggregate" module, where the user could define the "id" field of the input and the number of rows to aggregate, at the start of the pipeline or wherever it fits

The PromptFlow endpoints ingest JSON, so this does not "hurt" the user experience; the regular use case sends JSON objects anyway. Adding a "key" to your JSON object that acts as an "id" through the flow is something regular use cases likely need anyway for logging or downstream pipelines.
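
Roughly, the endpoint payload could look like this (all field names are just for illustration, not an existing promptflow contract):

```python
# Sketch of the proposed request shape: each item carries an "id" that the
# hypothetical "aggregate" node preserves, so every output can be joined back
# to the row that produced it.
payload = {
    "items": [
        {"id": "order-1", "product": "wireless mouse"},
        {"id": "order-2", "product": "espresso machine"},
    ],
    # how many rows the "aggregate" step would group into a single LLM call
    "batch_size": 2,
}
```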

brynn-code commented 3 months ago

Thank you for the idea. Currently our 'aggregation' concept has many limitations, such as the aggregation node not being executed when the flow is deployed as an endpoint, so the 'aggregate' and 'unwind' behavior you mentioned would be quite a big new thing for promptflow and requires detailed design of the experience.

We will keep the issue open to see whether other customers have similar requirements, and then plan the next step. Thanks again for the feedback, we appreciate it.