mlcommons/modelbench
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
https://mlcommons.org/ai-safety/
Apache License 2.0
49 stars, 8 forks
issues
Newest
fully parallel benchmark runner
#441
wpietri
opened
8 hours ago
1
Daily Scheduled Test Failure
#440
github-actions[bot]
opened
12 hours ago
0
Smoke test includes v1 benchmark
#439
wpietri
closed
11 hours ago
3
Render V1 hazards
#438
wpietri
closed
19 hours ago
1
Use Modelgauge v0.6.1
#437
wpietri
closed
1 day ago
1
Use modelgauge v0.6.1
#436
wpietri
closed
1 day ago
2
public-and-private-images
#435
dhosterman
closed
1 day ago
2
Add --benchmark option to job runner
#434
bkorycki
opened
2 days ago
0
Daily Scheduled Test Failure
#433
github-actions[bot]
opened
2 days ago
0
Update SUT list after Together model deprecations.
#432
wpietri
closed
2 days ago
2
remove SUTs no longer provided by Together and move smoke test earlier
#431
dhosterman
closed
2 days ago
4
Daily Scheduled Test Failure
#430
github-actions[bot]
opened
3 days ago
0
Daily Scheduled Test Failure
#429
github-actions[bot]
opened
4 days ago
0
Daily Scheduled Test Failure
#428
github-actions[bot]
opened
5 days ago
0
Daily Scheduled Test Failure
#427
github-actions[bot]
opened
6 days ago
0
add a PIP_EXTRA build arg to install an arbitrary package, optionally
#426
dhosterman
closed
2 days ago
1
Daily Scheduled Test Failure
#425
github-actions[bot]
opened
1 week ago
0
show errors and/or missed items in the reports
#424
wpietri
opened
1 week ago
0
All logs are in json format with --json-logs
#423
bkorycki
opened
1 week ago
0
machine-readable output includes logs
#422
bkorycki
opened
1 week ago
0
output dict in json format for consumption
#421
rogthefrog
closed
1 week ago
1
Always print progress bar
#420
bkorycki
closed
1 week ago
4
Bump mypy from 1.11.1 to 1.11.2
#419
dependabot[bot]
closed
1 day ago
2
Bump scipy from 1.14.0 to 1.14.1
#418
dependabot[bot]
closed
1 day ago
2
Show provenance and uids in benchmark report and jobs listing
#417
wpietri
opened
1 week ago
0
Ensure modelbench always returns an exit code of != 0 if it fails
#416
dhosterman
opened
2 weeks ago
0
Minor output fix
#415
wpietri
closed
2 weeks ago
1
Daily Scheduled Test Failure
#414
github-actions[bot]
opened
2 weeks ago
0
Bump jq from 1.7.0 to 1.8.0
#413
dependabot[bot]
closed
2 weeks ago
1
Run the e2e BM on a set of prompts from vendors
#412
bollacker
opened
3 weeks ago
0
Include a grading function from WS4 in an e2e BM run
#411
bollacker
opened
3 weeks ago
0
Pre v1 cleanup - first PR
#410
wpietri
closed
2 weeks ago
5
Preserve 0.5 benchmark, hazards, and tests
#409
bkorycki
opened
3 weeks ago
0
fixing-docker-image
#408
dhosterman
closed
3 weeks ago
1
Machine-readable progress updates
#407
bkorycki
closed
3 weeks ago
3
fixing-docker-image
#406
dhosterman
closed
3 weeks ago
1
create-docker-image
#405
dhosterman
closed
3 weeks ago
2
Machine-readable progress reports
#404
bkorycki
opened
4 weeks ago
0
Update to latest dependencies
#403
wpietri
closed
4 weeks ago
3
Do a modelbench release
#402
dhosterman
opened
1 month ago
0
Bump mypy from 1.11.0 to 1.11.1
#401
dependabot[bot]
closed
4 weeks ago
2
Bump black from 24.4.2 to 24.8.0
#400
dependabot[bot]
closed
4 weeks ago
2
Skeleton benchmark 1.0
#399
bkorycki
closed
4 weeks ago
4
Add skeleton 1.0 benchmark
#398
bkorycki
opened
1 month ago
0
Bump pytest from 8.3.1 to 8.3.2
#397
dependabot[bot]
closed
1 month ago
1
Bump pip from 24.0 to 24.2
#396
dependabot[bot]
closed
1 month ago
1
Bump pytest from 8.2.0 to 8.3.1
#395
dependabot[bot]
closed
1 month ago
2
Bump mypy from 1.10.0 to 1.11.0
#394
dependabot[bot]
closed
1 month ago
1
Benchmark outcomes record includes cache timestamps
#393
wpietri
opened
1 month ago
0
Benchmark outcomes record
#392
wpietri
closed
1 month ago
5