guidance-ai / guidance

A guidance language for controlling large language models.
MIT License
18.81k stars, 1.04k forks

Bench #843

Status: Closed (nopdive closed this 4 months ago)

nopdive commented 4 months ago

First iteration of adding benchmarks to guidance.

Includes a notebook and backing code in the guidance.bench module for reproducibility. guidance is benchmarked on LangChain's Chat Extract dataset. They've done solid work finding a problem with realistic structured JSON output: it includes conditionals, nested fields, and constraints, and outputs are checked with JSON schema validation as well.

Dependencies are hidden behind the extra tag bench, so this shouldn't impact standard installations.
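
For readers less familiar with optional extras, here is a minimal sketch of how a module can guard an optional import and point users at the extra. The helper name `require_optional` and the error wording are hypothetical and not taken from this PR; the only detail that comes from the PR is that the extra is named bench.

```python
import importlib


def require_optional(module_name: str, extra: str = "bench"):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper for illustration; not part of guidance.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that names the extra to install.
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install guidance[{extra}]"
        ) from exc


# A module that is present imports normally; a missing one raises
# ImportError carrying the install hint.
json_mod = require_optional("json")
```

With a guard like this, a standard install never imports the benchmark dependencies unless the bench code path is actually used.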

Test coverage should be high; however, I've skipped some tests here because the CI can't run them without a LangChain API key.
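
One common way to skip such tests is to gate them on an environment variable. The sketch below uses the standard library's unittest so it is self-contained; the variable name `LANGCHAIN_API_KEY`, the helper `has_api_key`, and the test class are assumptions for illustration, not the code in this PR. In a pytest suite, `pytest.mark.skipif` serves the same purpose.

```python
import os
import unittest

# Assumption: the API key is read from this environment variable.
API_KEY_ENV = "LANGCHAIN_API_KEY"


def has_api_key(environ=os.environ) -> bool:
    """Return True when a non-empty API key is configured."""
    return bool(environ.get(API_KEY_ENV, "").strip())


class BenchDatasetTest(unittest.TestCase):
    # Skipped (not failed) when the key is absent, e.g. on CI.
    @unittest.skipUnless(has_api_key(), f"{API_KEY_ENV} not set")
    def test_dataset_loads(self):
        # Placeholder body: a real test would fetch the dataset here.
        self.assertTrue(has_api_key())
```

Skipped tests still show up in the test report, which makes the coverage gap visible instead of silent.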

The code is structured to work across multiple GPU containers, but that isn't fully integrated yet. I'll have to work on the guidance Dockerfile later for that.

LMK if more details / changes needed.

codecov-commenter commented 4 months ago

Codecov Report

Attention: Patch coverage is 16.87764%, with 197 lines in your changes missing coverage. Please review.

Project coverage is 59.87%. Comparing base (3377383) to head (00e3a8c). Report is 1 commit behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| guidance/bench/_powerlift.py | 10.59% | 194 Missing :warning: |
| guidance/bench/_api.py | 66.66% | 3 Missing :warning: |


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
+ Coverage   59.50%   59.87%   +0.37%
==========================================
  Files          59       63       +4
  Lines        4334     4571     +237
==========================================
+ Hits         2579     2737     +158
- Misses       1755     1834      +79
```
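
As a sanity check, the percentages in this report can be reproduced from the raw counts. The sketch below is illustrative arithmetic only. It assumes the patch only adds lines, so that the +237 change in total lines equals the number of patch lines; the function name `coverage_pct` is made up for this example.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Project coverage before and after, from the Hits and Lines rows.
base = coverage_pct(2579, 4334)
head = coverage_pct(2737, 4571)

# Patch coverage: lines added by the patch minus those left uncovered.
patch_lines = 4571 - 4334        # assumption: the patch only adds lines
patch_missing = 194 + 3          # per-file "Missing" counts
patch = coverage_pct(patch_lines - patch_missing, patch_lines)

print(f"base={base:.4f} head={head:.4f} patch={patch:.5f}")
```

The results agree with the reported 59.50%, 59.87%, and 16.87764% to within rounding of the last displayed digit.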


Harsha-Nori commented 4 months ago

LGTM