nopdive closed 4 months ago
Attention: Patch coverage is 16.87764% with 197 lines in your changes missing coverage. Please review.
Project coverage is 59.87%. Comparing base (3377383) to head (00e3a8c). Report is 1 commit behind head on main.
Files | Patch % | Lines |
---|---|---|
guidance/bench/_powerlift.py | 10.59% | 194 Missing :warning: |
guidance/bench/_api.py | 66.66% | 3 Missing :warning: |
LGTM
First iteration of adding benchmarks to guidance.
Includes a notebook and backing code in the `guidance.bench` module for reproducibility.
`guidance` is tested on LangChain's Chat Extract dataset. They've done solid work finding a problem with realistic structured JSON output: it includes conditionals, nested fields, and constraints, and the output is also checked with JSON schema validation.
Dependencies are hidden behind an extra tag, `bench`. This shouldn't impact standard installations.
Test coverage should be high; however, I've skipped some tests here, as the CI won't be able to run them without a LangChain API key.
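For reviewers wondering what the skipped tests look like in practice, here is a minimal sketch of gating a test on an API key. It is not the code in this PR: the environment variable name `LANGCHAIN_API_KEY`, the helper `has_api_key`, and the test class name are assumptions for illustration, and it uses the standard library's `unittest` rather than whatever runner the repo's test suite uses.

```python
import os
import unittest

# Assumption: the key is read from this environment variable.
# Swap in whatever name the benchmark code actually expects.
API_KEY_ENV = "LANGCHAIN_API_KEY"


def has_api_key(env=os.environ) -> bool:
    """Return True when a non-empty API key is present in the given mapping."""
    return bool(env.get(API_KEY_ENV, "").strip())


# The whole class is skipped (not failed) when no key is available,
# so CI without credentials still passes.
@unittest.skipUnless(has_api_key(), f"{API_KEY_ENV} not set; skipping dataset-backed tests")
class ChatExtractBenchTest(unittest.TestCase):  # hypothetical test class
    def test_key_is_visible(self):
        # Placeholder body; a real test would load the dataset and run the benchmark.
        self.assertTrue(has_api_key())
```

Locally, exporting the key before running the suite would enable these tests. Skipped tests still show up as uncovered lines in the coverage report, which would be consistent with the low patch coverage on the benchmark module.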
Code is structured to work across multiple GPU containers, but that isn't fully integrated yet. Will have to work on the guidance Dockerfile later for that.
LMK if more details / changes needed.