Open bugsz opened 6 days ago
Attention: Patch coverage is 14.58333% with 41 lines in your changes missing coverage. Please review.
Project coverage is 61.03%. Comparing base (701f2a8) to head (8fb172a).
@@ Coverage Diff @@
## main #126 +/- ##
==========================================
- Coverage 61.71% 61.03% -0.69%
==========================================
Files 55 55
Lines 2714 2756 +42
==========================================
+ Hits 1675 1682 +7
- Misses 1039 1074 +35
Files | Coverage Δ | |
---|---|---|
sotopia/cli/benchmark/benchmark.py | 21.17% <14.58%> (-2.27%) | :arrow_down: |
Closes #
📑 Description
As the title suggests:
- `sotopia benchmark-all --model-list gpt-4o --model-list gpt-3.5-turbo`, or just go ahead with the default model names.
- `sotopia benchmark-display`. (It seems pandas is not a requirement, so I am not sure how to display the results in a structured way in the CLI.)
✅ Checks
type/descript (e.g. `feature/add-llm-agents`)
ℹ Additional Information
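On the display question raised in the description: one pandas-free option is to align columns with plain string formatting from the standard library. The following is only a minimal sketch under stated assumptions. The helper name `format_table`, the column names `metric_a` and `metric_b`, and the numbers are hypothetical placeholders; they are not taken from `benchmark.py` and are not real benchmark results.

```python
from typing import Sequence


def format_table(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render headers and rows as a left-aligned, fixed-width text table."""
    # Convert every cell to a string so column widths can be measured.
    cells = [[str(h) for h in headers]] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]

    def render(row: Sequence[str]) -> str:
        # Pad each cell to its column width; strip padding after the last column.
        return " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([render(cells[0]), separator] + [render(r) for r in cells[1:]])


if __name__ == "__main__":
    # Placeholder values for illustration only, not real benchmark scores.
    demo_rows = [
        ["gpt-4o", 1.0, 2.0],
        ["gpt-3.5-turbo", 3.0, 4.0],
    ]
    print(format_table(["model", "metric_a", "metric_b"], demo_rows))
```

This keeps the CLI free of extra dependencies. If the project already ships a terminal formatting library, its table support may be a better fit than hand-rolled padding.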