Open adampolak-vertex opened 4 months ago
This is important. Is there a solution for this? We cannot use PromptFlow as an endpoint if it cannot do bulk processing.
Hi, @adampolak-vertex thanks for reporting this. Currently, promptflow only supports bulk run inputs with each input as a single LLM call. To aggregate multiple inputs into one prompt, you could try defining the input as a list type to get a batch of outputs.
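For what it's worth, the list-type suggestion could look roughly like this in plain Python (the function name and prompt wording are my own illustrations, not promptflow's actual API): fold many items into one numbered prompt so a single LLM call covers the whole batch.

```python
# Hypothetical sketch: a flow input defined as a list, folded into one prompt.
# Names and prompt text are illustrative assumptions.

def build_batch_prompt(products: list[str]) -> str:
    """Join many inputs into one numbered prompt for a single LLM call."""
    header = "Classify each product below into a category, one answer per line.\n"
    lines = [f"{i + 1}. {p}" for i, p in enumerate(products)]
    return header + "\n".join(lines)

prompt = build_batch_prompt(["red running shoes", "stainless steel kettle"])
```

The model's answer then has to be split back into per-item results, which is exactly where the id-tracking problem discussed below comes in.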
Bryan thank you for your feedback. We have looked into including a list type that would take in objects. These are the issues that we have found with this approach:
We will now need two services to "wrap" the PromptFlow endpoint
Let me know if any of the issues I mentioned can be handled by alternative means.
Yeah, the problem you mentioned is true. In addition, I'd like to mention that if we support batch inputs aggregated into one prompt, then to keep the relationship between inputs and outputs we'll have no choice but to read and analyze the LLM outputs, which may cause compliance issues and is considered insecure by some customers.
So, from our side, we still encourage users to leverage the bulk run capability if they have no token concerns; we won't read/analyze any user inputs or LLM outputs.
Yes, I was thinking about that as well, and thought about an "id" field for the input which would "unlock" batch processing.
If you had:
1) An "aggregate" module: the user could define the "id" field of an input and the number of rows to aggregate, at the start of the pipeline or wherever needed.
The PromptFlow endpoints ingest JSON, so this doesn't "hurt" the user experience, as the regular use case is JSON objects anyway. Adding a "key" to your JSON object that acts as an "id" through the flow is something regular use cases likely have anyway, for logging or downstream pipelines.
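A rough sketch of what such an "aggregate" step and a matching "unwind" might look like; the function names, the `batch_size` parameter, and the JSON reply format are all assumptions for illustration, not anything promptflow provides today.

```python
import json

def aggregate(items: list[dict], batch_size: int) -> list[str]:
    """Group id-keyed inputs into prompts of at most batch_size rows each."""
    prompts = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        body = "\n".join(f"{item['id']}: {item['text']}" for item in batch)
        prompts.append(
            "Classify each line below; reply as JSON mapping id to category.\n"
            + body
        )
    return prompts

def unwind(llm_output: str) -> dict:
    """Map the model's JSON reply back onto the original ids."""
    return json.loads(llm_output)
```

The "id" carried through the prompt is what lets each output be traced back to its original input without the platform itself having to interpret the LLM's answer.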
Thank you for the idea. Currently our 'aggregation' concept has many limitations; for example, an aggregation node will not be executed if the flow is deployed as an endpoint. So the 'aggregation' and 'unwind' you mentioned are quite big new features for promptflow and require detailed design of the experience.
We will keep the issue open to see if any other customers have similar requirements, and then plan the next step. Thanks again for the feedback, we appreciate it.
Currently, when running a prompt flow, the "input" is defined as a single "item".
For example, the classification sample can only classify one input at a time.
There needs to be a feature to put in many products at once, so that a single prompt can output many categorizations.
It should be possible to import many inputs at once and have every output linked back to its original input, so that accuracy can be traced.
This way, the "cost" of the prompt tokens spent explaining what must be done can be "amortized" across many inputs.
The same way an eval can "bulk" process inputs, the same must be possible with a general flow.
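The amortization point can be made concrete with some back-of-the-envelope arithmetic (the token counts below are made-up illustrative numbers, not measurements):

```python
def total_prompt_tokens(n_items: int, instruction_tokens: int,
                        item_tokens: int, items_per_call: int) -> int:
    """Total prompt tokens when n_items are spread over batched calls."""
    calls = -(-n_items // items_per_call)  # ceiling division
    return calls * instruction_tokens + n_items * item_tokens

# Assumed numbers: 500-token instructions, 20 tokens per product, 100 products.
one_at_a_time = total_prompt_tokens(100, 500, 20, 1)   # 100 calls
batched = total_prompt_tokens(100, 500, 20, 25)        # 4 calls
```

With these assumed numbers, the fixed instruction cost is paid 100 times in the one-at-a-time case but only 4 times when batching 25 items per call, which is the whole motivation for the feature request.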