mlcommons/modelbench
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
https://mlcommons.org/ai-safety/
Apache License 2.0
62 stars, 11 forks
issues
Bump the prod-deps group with 4 updates
#714
dependabot[bot]
opened
9 hours ago
1
More runtime improvements
#713
wpietri
closed
1 day ago
1
Tweak the grading function
#712
rogthefrog
closed
2 days ago
1
More runtime improvements
#711
wpietri
closed
3 days ago
1
Hopefully improve reliability and debugging output a bit.
#710
wpietri
closed
3 days ago
1
New new grading function
#709
rogthefrog
opened
3 days ago
1
Peter's emergency Mistral run
#708
wpietri
opened
3 days ago
0
Register Phi 3.5 moe SUT + Add "instruct" to Phi UIDs
#707
bkorycki
closed
3 days ago
2
Azure plugin + Phi 3.5 mini SUT
#706
bkorycki
closed
4 days ago
1
Gemini safety settings on
#705
bkorycki
closed
4 days ago
1
Update bands with Kurt's newest code
#704
rogthefrog
opened
4 days ago
0
Hopefully final-er standards.
#703
wpietri
closed
4 days ago
1
Fix annotator cache bug in benchmark runner
#702
bkorycki
closed
5 days ago
1
Set up gemini with (safety on = BLOCK_LOW_AND_ABOVE)
#701
wpietri
opened
5 days ago
0
Hopefully final standards
#700
wpietri
closed
5 days ago
1
Register Llama 3.1 405b Instruct SUT
#699
bkorycki
closed
6 days ago
1
Retry anthropic 429 more doggedly. Add a little more to the journal.
#698
wpietri
closed
6 days ago
1
Bump the prod-deps group with 2 updates
#697
dependabot[bot]
closed
6 days ago
1
Use actual heldback prompts in official tests
#696
bkorycki
closed
1 week ago
1
Add new claude.
#695
wpietri
closed
1 week ago
1
Bump aiohttp from 3.10.10 to 3.10.11 in the pip group
#694
dependabot[bot]
closed
6 days ago
1
More consistent retrying
#693
wpietri
closed
1 week ago
1
Include NVIDIA SUTs in benchmark
#692
wpietri
opened
1 week ago
2
Apply ensemble updates nov 13
#691
bkorycki
closed
1 week ago
1
Practice/heldback prompts switch
#690
bkorycki
closed
1 week ago
1
Add more checks to consistency checker
#689
bkorycki
opened
1 week ago
0
Stand up mistral SUT adapter for ministral 8b instruct
#688
rogthefrog
opened
1 week ago
2
productionize gpt SUT
#687
wpietri
opened
1 week ago
0
A Huggingface endpoint broke; this uses the replacement endpoint.
#686
wpietri
closed
1 week ago
1
Bump the prod-deps group with 5 updates
#685
dependabot[bot]
closed
1 week ago
1
Trying to get Dependabot to group the PRs in one lump on a weekly basis.
#684
wpietri
closed
1 week ago
1
add nvidia-nim-api plugin to plugins/
#683
zijiachen95
closed
4 days ago
1
add nvidia-nim-api plugin to plugins/
#682
zijiachen95
closed
1 week ago
1
add nvidia-nim-api plugin to plugins/
#681
zijiachen95
closed
1 week ago
1
Journal consistency checker
#680
bkorycki
closed
1 week ago
2
More elaborate private tests, saner public tests.
#679
wpietri
closed
1 week ago
1
Add Microsoft Phi 3.5 MoE
#678
wpietri
opened
1 week ago
0
Add Microsoft Phi 3.5 mini
#677
wpietri
opened
1 week ago
0
Make official Llama SUT
#676
wpietri
opened
1 week ago
0
Add proper anthropic SUT
#675
wpietri
opened
1 week ago
0
Bump tqdm from 4.66.5 to 4.67.0
#674
dependabot[bot]
closed
1 week ago
2
Bump tomli from 2.0.2 to 2.1.0
#673
dependabot[bot]
closed
1 week ago
2
Add some basic journal documentation
#672
wpietri
closed
2 weeks ago
1
Final final practice calibration (with ws3-llama-guard-3-ruby v0.3)
#671
wpietri
closed
2 weeks ago
1
Practice prompt calibration
#670
wpietri
closed
2 weeks ago
1
Update to latest ws3 voting strategy
#669
bkorycki
closed
3 weeks ago
1
update grading function per October 2024 spec
#668
rogthefrog
closed
1 week ago
4
Update ensemble join method
#667
bkorycki
opened
3 weeks ago
0
Add persona and persona*hazard breakdown to benchmark grading functions
#666
rogthefrog
opened
3 weeks ago
0
operational improvements, round 3
#665
wpietri
opened
3 weeks ago
0