Open nvzhihanj opened 6 months ago
Agree regarding old configs - when running the submission checker on old submissions it makes sense to use the corresponding old release of the submission checker. The checker itself probably only needs to maintain compatibility with the latest and the upcoming submission rounds. This alone will reduce the size of the submission checker considerably, and this PR does a first iteration of that.
This PR adds a GitHub Action for submission checker changes and runs it against the v4.0 results repository - once this is merged we can safely make changes to the submission checker.
One concrete example of an uncaught error: if users submit runs shorter than 600 seconds, the submission checker reports ERROR in the log and returns non-zero, but at the end of the log it still says "the results look OK".
This issue should already be fixed in the master branch by this PR
Proposal for Submission Checker Refactor
Proposal:
Proposed structure:

```
schema.py
constants.py   # maybe contains a Scenario enum?
vX.Y/
    config.py
    models.py
```

schema.py:

```python
from __future__ import annotations  # allows forward references between the dataclasses below

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MLPInferenceSubmission:
    version: str
    models: List[Model]  # alternatively could be a Dict[str, Model]
    seeds: Seeds
    audit_test_seeds: Dict[AuditTest, Seeds]
    ignored_errors: List[str]

@dataclass
class Model:
    name: str
    aliases: List[str]
    required_in: Dict[SystemCategory, List[Scenario]]
    optional_in: Dict[SystemCategory, List[Scenario]]
    accuracy_metric: AccuracyMetric  # defined in constants.py, has a regex field
    accuracy_target: float
    performance_sample_count: int
    scenario_configs: Dict[Scenario, ScenarioConfig]  # makes it easy to override, e.g., the result field for Llama2, since you can use a different string
    audit_tests: List[AuditTest]

@dataclass
class ScenarioConfig:
    latency_constraint: int = 0
    min_query_count: int = 1
    result_field: ResultField = None

@dataclass
class Seeds:
    qsl: int
    sample_index: int
    schedule: int

@dataclass
class ResultField:
    id_: str
    raw_string: str
```
Example: v4.0:
models.py:
```python
from ..schema import *
from ..scenarios import *

ResNet50 = Model(
    name="resnet",
    aliases=["resnet50"],
    required_in={SystemCategory.Datacenter: [Scenario.Offline, Scenario.Server],
                 SystemCategory.Edge: [Scenario.Offline, Scenario.MultiStream, Scenario.SingleStream]},
    optional_in=None,  # can be omitted if the dataclass definition uses purely kwargs
    accuracy_metric=AccuracyMetric.Accuracy,
    accuracy_target=76.46 * 0.99,
    performance_sample_count=1024,
    # latency constraints and minimum query counts live in the per-scenario configs:
    scenario_configs={
        Scenario.Offline: ScenarioConfig(min_query_count=1,
                                         result_field=ResultField("result_samples_per_second", "Samples per second")),
        Scenario.Server: ScenarioConfig(latency_constraint=15000000,
                                        min_query_count=270336,
                                        result_field=ResultField("result_scheduled_samples_per_sec", "Scheduled samples per second")),
        Scenario.SingleStream: ...},
    audit_tests=[AuditTest.Test01, AuditTest.Test04, AuditTest.Test05]
)

# ... etc. for all models
```
config.py:
```python
from ..schema import *
from ..scenarios import *
from .models import *

v4_0_SubmissionConfig = MLPInferenceSubmission(
    version="v4.0",
    models=[ResNet50, Retinanet, ...],
    seeds=Seeds(13281865557512327830, 198141574272810017, 7575108116881280410),
    audit_test_seeds={AuditTest.Test05: Seeds(2376919268182438552, 11176391829184272374, 3911940905271271337)},
    ignored_errors=[])
```
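To illustrate how checker code could consume such a schema, here is a minimal self-contained sketch. The enums, the helper functions `required_scenarios`/`min_queries`, and all values are hypothetical stand-ins for this example, not the real v4.0 configuration:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

class Scenario(Enum):
    Offline = "Offline"
    Server = "Server"

class SystemCategory(Enum):
    Datacenter = "datacenter"
    Edge = "edge"

@dataclass
class ScenarioConfig:
    latency_constraint: int = 0
    min_query_count: int = 1

@dataclass
class Model:
    name: str
    aliases: List[str]
    required_in: Dict[SystemCategory, List[Scenario]]
    scenario_configs: Dict[Scenario, ScenarioConfig]

# Hypothetical model entry (placeholder values, not real v4.0 numbers)
ResNet50 = Model(
    name="resnet",
    aliases=["resnet50"],
    required_in={SystemCategory.Datacenter: [Scenario.Offline, Scenario.Server]},
    scenario_configs={
        Scenario.Offline: ScenarioConfig(min_query_count=1),
        Scenario.Server: ScenarioConfig(latency_constraint=15_000_000,
                                        min_query_count=270_336),
    },
)

def required_scenarios(model: Model, category: SystemCategory) -> List[Scenario]:
    """Look up which scenarios a model must be run in for a system category."""
    return model.required_in.get(category, [])

def min_queries(model: Model, scenario: Scenario) -> int:
    """Look up the minimum query count for a model/scenario pair."""
    return model.scenario_configs[scenario].min_query_count
```

The point of the typed schema is that lookups like these become attribute accesses that an IDE or type checker can verify, instead of string-keyed dictionary reads.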
For example:
performance_dir_check.py:

```python
class PerformanceDirCheck(Checker):
    def fail_if_missing_files(self, ctx):
        ...
    def fail_if_contains_loadgen_error(self, ctx):
        ...
    def fail_if_insufficient_performance_sample_count(self, ctx):
        ...
```

Where `Checker` is a class with a method that runs all member methods whose names begin with "fail_if", and `ctx` contains the necessary information about the workload being checked.
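The introspection-based base class described above could be sketched as follows. The interface details are assumptions for this sketch (each check returns an error string or `None`, and `run_all` collects the failures); the proposal does not pin them down:

```python
import inspect

class Checker:
    """Base class: run_all() invokes every method whose name starts with
    "fail_if" and collects the failures they report.

    Sketch only: assumes each check returns an error string on failure
    and None on success.
    """

    def run_all(self, ctx):
        failures = []
        for name, method in inspect.getmembers(self, predicate=inspect.ismethod):
            if name.startswith("fail_if"):
                message = method(ctx)
                if message is not None:
                    failures.append((name, message))
        return failures

class PerformanceDirCheck(Checker):
    def fail_if_missing_files(self, ctx):
        # ctx keys here are hypothetical; the real ctx layout is unspecified
        missing = [f for f in ctx["required_files"] if f not in ctx["files"]]
        if missing:
            return f"missing files: {missing}"
        return None

    def fail_if_contains_loadgen_error(self, ctx):
        if ctx.get("loadgen_error"):
            return "loadgen reported an error"
        return None
```

One nice property of this design: a check can never be silently skipped by a forgotten call site, which is exactly the failure mode behind the "ERROR in the log but results look OK" bug mentioned earlier.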
@pgmpablo157321 to have a look
PR removing the backward compatibility of the submission checker: https://github.com/mlcommons/inference/pull/1677
We should get back to this issue after https://github.com/mlcommons/inference/pull/1677 is merged. That PR would simplify the submission checker a lot.
@nv-ananjappa @nvzhihanj @arjunsuresh @mrmhodak Proposal: Main modules:
Helper modules:
Additional items:
From chatgpt :)
To modularize this script, I’ll refactor the file into logical components, grouping functions, constants, and class definitions into separate modules. Here's how the code can be broken down:
```
submission_checker/
│
├── __init__.py
├── main.py
├── config.py
├── utils.py
├── constants.py
├── checkers/
│   ├── __init__.py
│   ├── accuracy.py
│   ├── performance.py
│   ├── power.py
│   ├── system_desc.py
│   └── compliance.py
```
config.py

This module will contain the `Config` class that handles configuration for different versions of MLPerf and submission types.
```python
# config.py
from .constants import MODEL_CONFIG

class Config:
    """Select config value by mlperf version and submission type."""

    def __init__(self, version, extra_model_benchmark_map, ignore_uncommited=False, skip_power_check=False):
        self.base = MODEL_CONFIG.get(version)
        self.extra_model_benchmark_map = extra_model_benchmark_map
        self.version = version
        self.models = self.base["models"]
        self.seeds = self.base["seeds"]
        self.test05_seeds = self.base["test05_seeds"]
        self.accuracy_target = self.base["accuracy-target"]
        self.accuracy_delta_perc = self.base["accuracy-delta-perc"]
        self.accuracy_upper_limit = self.base.get("accuracy-upper-limit", {})
        self.performance_sample_count = self.base["performance-sample-count"]
        self.latency_constraint = self.base.get("latency-constraint", {})
        self.min_queries = self.base.get("min-queries", {})
        self.required = None
        self.optional = None
        self.ignore_uncommited = ignore_uncommited
        self.skip_power_check = skip_power_check

    def set_type(self, submission_type):
        """Set required and optional scenarios based on the submission type."""
        if submission_type == "datacenter":
            self.required = self.base["required-scenarios-datacenter"]
            self.optional = self.base["optional-scenarios-datacenter"]
        elif submission_type == "edge":
            self.required = self.base["required-scenarios-edge"]
            self.optional = self.base["optional-scenarios-edge"]
        elif submission_type in ["datacenter,edge", "edge,datacenter"]:
            self.required = self.base["required-scenarios-datacenter-edge"]
            self.optional = self.base["optional-scenarios-datacenter-edge"]
        else:
            raise ValueError("Invalid system type")

    # Other methods of Config go here...
```
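A quick usage sketch of `set_type`, with a stub `MODEL_CONFIG` containing placeholder entries (the real dictionary is elided above), and the `Config` class excerpted to just what the demo needs:

```python
# Stub of MODEL_CONFIG with only the keys this sketch touches;
# the real dictionary carries the full per-version configuration.
MODEL_CONFIG = {
    "v4.0": {
        "models": ["resnet"],
        "required-scenarios-datacenter": {"resnet": ["Server", "Offline"]},
        "optional-scenarios-datacenter": {},
        "required-scenarios-edge": {"resnet": ["SingleStream", "MultiStream", "Offline"]},
        "optional-scenarios-edge": {},
    }
}

class Config:
    """Minimal excerpt of the Config class above, enough to demo set_type()."""

    def __init__(self, version):
        self.base = MODEL_CONFIG.get(version)
        self.required = None
        self.optional = None

    def set_type(self, submission_type):
        if submission_type == "datacenter":
            self.required = self.base["required-scenarios-datacenter"]
            self.optional = self.base["optional-scenarios-datacenter"]
        elif submission_type == "edge":
            self.required = self.base["required-scenarios-edge"]
            self.optional = self.base["optional-scenarios-edge"]
        else:
            raise ValueError("Invalid system type")

config = Config("v4.0")
config.set_type("datacenter")
# config.required now holds the datacenter scenario requirements for v4.0
```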
constants.py

This module will contain all constants such as `MODEL_CONFIG`, `VALID_DIVISIONS`, and `REQUIRED_FILES`.
```python
# constants.py
MODEL_CONFIG = {
    "v4.0": {
        # Same structure you had in your original script
    },
    "v4.1": {
        # Another version configuration
    }
}

VALID_DIVISIONS = ["open", "closed", "network"]
VALID_AVAILABILITIES = ["available", "preview", "rdi"]

REQUIRED_PERF_FILES = ["mlperf_log_summary.txt", "mlperf_log_detail.txt"]
OPTIONAL_PERF_FILES = ["mlperf_log_accuracy.json"]

# Add all other constants from the original script here...
```
utils.py
This module will hold utility functions that are used throughout the codebase, such as directory operations and file listing.
```python
# utils.py
import os

def list_dir(*path):
    path = os.path.join(*path)
    return [f for f in os.listdir(path) if os.path.isdir(os.path.join(path, f))]

def list_files(*path):
    path = os.path.join(*path)
    return [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]

def list_files_recursively(*path):
    path = os.path.join(*path)
    return [
        os.path.join(dirpath, file)
        for dirpath, dirs, files in os.walk(path)
        for file in files
    ]

# Add any other file/directory-related utility functions here...
```
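A quick sanity check of the behavioral difference between the flat and recursive listing helpers, run against a throwaway directory (self-contained, so the two functions are repeated here):

```python
import os
import tempfile

def list_files(*path):
    path = os.path.join(*path)
    return [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]

def list_files_recursively(*path):
    path = os.path.join(*path)
    return [os.path.join(dirpath, file)
            for dirpath, dirs, files in os.walk(path)
            for file in files]

with tempfile.TemporaryDirectory() as root:
    # Layout: root/a.txt and root/sub/b.txt
    os.makedirs(os.path.join(root, "sub"))
    for rel in ["a.txt", os.path.join("sub", "b.txt")]:
        with open(os.path.join(root, rel), "w") as f:
            f.write("x")
    top = list_files(root)                     # only files directly under root
    everything = list_files_recursively(root)  # files at any depth, full paths
```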
checkers/accuracy.py
This module will contain the functions related to checking accuracy.
```python
# checkers/accuracy.py
import os
import re

from ..utils import list_files
from ..constants import ACC_PATTERN, MAX_ACCURACY_LOG_SIZE

def check_accuracy_dir(config, model, path, verbose):
    """Check accuracy files and logs."""
    is_valid = False
    all_accuracy_valid = True
    acc = None
    result_acc = {}
    hash_val = None
    target = config.get_accuracy_target(model)
    acc_upper_limit = config.get_accuracy_upper_limit(model)
    # Parsing and checking the accuracy
    # The rest of the check_accuracy_dir function remains unchanged
    return is_valid, result_acc
```
checkers/performance.py
This module will handle performance checking functions.
```python
# checkers/performance.py
import os

from ..constants import MS_TO_NS, S_TO_MS
from log_parser import MLPerfLog

def check_performance_dir(config, model, path, scenario_fixed, division, system_json, has_power=False):
    """Check performance directory and logs."""
    is_valid = False
    rt = {}
    # Load MLPerf log and validate
    mlperf_log = MLPerfLog(os.path.join(path, "mlperf_log_detail.txt"))
    # Continue with performance check logic...
    return is_valid, rt
```
checkers/power.py
This module will deal with checking power directories and logs.
```python
# checkers/power.py
import os

from log_parser import MLPerfLog

def check_power_dir(power_path, ranging_path, testing_path, scenario_fixed, power_res_ranging, power_res_testing, config):
    """Check power metrics from power logs."""
    is_valid = True
    power_metric = 0
    # Check and validate power files...
    return is_valid, power_metric
```
checkers/system_desc.py
This module will include functions for checking system descriptions.
```python
# checkers/system_desc.py
import os
import json

from ..constants import SYSTEM_DESC_REQUIRED_FIELDS

def check_system_desc_id(fname, systems_json, submitter, division, version, skip_meaningful_fields_emptiness_check):
    """Check system description file."""
    is_valid = True
    # Perform system description validation...
    return is_valid
```
checkers/compliance.py
This module will handle compliance-related checks.
```python
# checkers/compliance.py
import os

from ..utils import list_files

def check_compliance_dir(compliance_dir, model, scenario, config, division, system_json, name):
    """Check compliance test directories and logs."""
    compliance_perf_pass = True
    # Perform compliance checks...
    return compliance_perf_pass
```
main.py
Finally, the main module will orchestrate the checking process by utilizing all the modularized functions and components.
```python
# main.py
import argparse

from config import Config
from utils import list_dir, list_files
from checkers.accuracy import check_accuracy_dir
from checkers.performance import check_performance_dir
from checkers.power import check_power_dir
from checkers.system_desc import check_system_desc_id
from checkers.compliance import check_compliance_dir

def get_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True, help="submission directory")
    parser.add_argument("--version", default="v4.1", choices=["v4.0", "v4.1"], help="mlperf version")
    parser.add_argument("--submitter", help="filter to submitter")
    parser.add_argument("--csv", default="summary.csv", help="csv file with results")
    return parser.parse_args()

def main():
    args = get_args()
    config = Config(args.version, "model_mapping.json")
    # Rest of the logic for running the checks using modularized functions...

if __name__ == "__main__":
    main()
```
`main.py` coordinates everything by calling the appropriate functions from the modules. This modularization makes the codebase easier to manage, debug, and extend for future versions of the MLPerf submission checker.
With the increasing number of benchmarks and checks, we have found several issues with the submission checker (https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py):
We would like to propose several changes to the submission checker, including but not limited to:
One concrete example of an uncaught error: if users submit runs shorter than 600 seconds, the submission checker reports ERROR in the log and returns non-zero, but at the end of the log it still says "the results look OK".
@pgmpablo157321 FYI. We are discussing how to better improve the submission_checker, and will continue to update in this thread.