instructor-ai / instructor

structured outputs for llms
https://python.useinstructor.com/
MIT License

Batch processing with re-submit when validation errors #1092

Open thonspeciebio opened 1 month ago

thonspeciebio commented 1 month ago

I have a complex input file of semi-structured data that I want to restructure using instructor. It does a great job with direct chat completion calls, returning my parsed Pydantic model just fine.

But when I create a BatchJob, it seems to run the job once, let Pydantic validate each result, and return the failed instances. It does not re-ask by sending the failures back as another job.

Are there any plans to implement a re-ask mechanism for batch jobs? Batch jobs are so much faster and cheaper for me, but the Pydantic validation errors are holding me back...

Maybe someone has hints on reusing some of the code that already handles this for regular chat completions?

jxnl commented 1 month ago

How would that work? Say 30 out of 500 fail: do you want us to make 30 more requests when you load the results and notice they failed? Would that be 30 individual requests right away, or a second batch job?

thonspeciebio commented 1 month ago

Yup, that is exactly how I have implemented it myself for now. I collect the invalid lines from the BatchJob and re-submit them as a new job until they validate (or until the maximum number of tries is exhausted).
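
The loop described here can be sketched roughly as follows. This is only an illustration, not instructor's API: `run_with_resubmit`, `submit_batch`, and `validate` are hypothetical names. `submit_batch` stands in for whatever submits a batch job and waits for its raw outputs, keyed by request ID. `validate` stands in for the parsing step (for example a wrapper around a Pydantic model's `model_validate_json`), and is assumed to raise `ValueError` on invalid output, which Pydantic's `ValidationError` and `json.JSONDecodeError` both subclass.

```python
import json
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def run_with_resubmit(
    requests: Dict[str, str],
    submit_batch: Callable[[Dict[str, str]], Dict[str, str]],
    validate: Callable[[str], T],
    max_tries: int = 3,
) -> Tuple[Dict[str, T], Dict[str, str]]:
    """Submit a batch, validate every result, and re-submit only the failures.

    requests maps a request ID to its prompt. submit_batch and validate are
    hypothetical callables supplied by the caller (see the text above).
    Returns (parsed results by ID, last error message by ID for requests
    that never validated).
    """
    parsed: Dict[str, T] = {}
    errors: Dict[str, str] = {}
    pending = dict(requests)

    for _ in range(max_tries):
        if not pending:
            break
        # One batch job per round, containing only the unresolved requests.
        raw_results = submit_batch(pending)
        still_failing: Dict[str, str] = {}
        for request_id, prompt in pending.items():
            try:
                parsed[request_id] = validate(raw_results[request_id])
                errors.pop(request_id, None)
            except (KeyError, ValueError) as exc:
                # KeyError: the batch returned no output for this ID.
                # ValueError: the output did not parse or validate.
                errors[request_id] = str(exc)
                still_failing[request_id] = prompt
        pending = still_failing

    return parsed, errors
```

Each round submits a second (third, ...) batch job rather than individual requests, so the cost advantage of batching is kept. A variant closer to the re-ask behaviour of regular chat completions could append the stored error message to the prompt before re-submitting, so the model sees what went wrong.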