Closed by bugsz 12 hours ago
By the way, I am currently using `assert False` (so the current pytest definitely does not pass) just to see the output. However, I do not know how to check whether there is a reasoning part. Does anyone have an idea?
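One way to check for a reasoning part without resorting to `assert False` is to assert on the parsed output directly. This is only a sketch: the dict shape and the `"reasoning"` key are assumptions for illustration, not Sotopia's actual evaluator schema.

```python
# Sketch: assert on the evaluator output instead of using `assert False`
# just to dump it. The dict shape and "reasoning" key are assumptions.

def has_reasoning(result: dict) -> bool:
    """Return True if the evaluation result carries a non-empty reasoning part."""
    reasoning = result.get("reasoning")
    return isinstance(reasoning, str) and bool(reasoning.strip())

# Hypothetical evaluator outputs:
with_reasoning = {"goal": 8, "reasoning": "Agent mostly achieved its goal."}
without_reasoning = {"goal": 8}

assert has_reasoning(with_reasoning)
assert not has_reasoning(without_reasoning)
```

Alternatively, running pytest with `-s` (or using the `capsys` fixture) lets you see printed output without forcing a failure.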
@XuhuiZhou Could you help check this? I think this is basically a prompting issue; maybe changing the description of the goal dimension would make it work better?
@bugsz @ProKil Okay, I fixed this bug. Basically, the original instruction is a bit ambiguous, but the parts somehow magically work when they stick together.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 62.01%. Comparing base (8d9b9be) to head (269b4f2). Report is 2 commits behind head on main.
```diff
@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   60.03%   62.01%   +1.98%
==========================================
  Files          47       55       +8
  Lines        2402     2733     +331
==========================================
+ Hits         1442     1695     +253
- Misses        960     1038      +78
```
| Files | Coverage Δ | |
|---|---|---|
| sotopia/envs/evaluators.py | 91.07% <100.00%> (+0.62%) | :arrow_up: |
| tests/envs/test_evaluators.py | 100.00% <100.00%> (ø) | |
@bugsz Could you check if this fixes your problem?
Description
I provide a test case for the issue mentioned in #89. Specifically, this is done by adding a dummy evaluator with only one `goal` evaluation dimension, and adding a new option for the `response_format` in the evaluator. Besides, I use the same format as in the real Sotopia simulation in testing, which aligns the test case with the actual evaluation.
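The dummy-evaluator idea above could be sketched roughly as follows. All names here (`GoalOnlyEvaluation`, `dummy_evaluate`, the `response_format` values) are illustrative assumptions for this sketch, not Sotopia's actual API.

```python
# Sketch: a dummy evaluator with only a `goal` dimension and a
# `response_format` option. All names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class GoalOnlyEvaluation:
    # (reasoning, score) pair, mirroring a single goal dimension's output
    goal: tuple[str, int]


def dummy_evaluate(response_format: str = "basic") -> GoalOnlyEvaluation:
    # A real evaluator would query an LLM; the dummy returns a fixed record
    # so the test exercises only format handling, not model behavior.
    if response_format not in ("basic", "tool_calling"):
        raise ValueError(f"unknown response_format: {response_format!r}")
    return GoalOnlyEvaluation(goal=("Agent reached its stated goal.", 9))


result = dummy_evaluate(response_format="basic")
assert result.goal[1] == 9
```

Keeping the dummy's output in the same shape as the real simulation is what keeps the test aligned with actual evaluation.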