mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

Incorrect values for accuracy delta percentage #1792

Closed: pgmpablo157321 closed this issue 1 month ago

pgmpablo157321 commented 1 month ago

When checking the allowed delta percentage between the TEST01 accuracy and the baseline accuracy, the submission checker sets it to 1% or 0.1% depending on the name of the benchmark:

https://github.com/mlcommons/inference/blob/28144996310f5303a09355d55f238786737dc346/tools/submission/submission_checker.py#L2474-L2478
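
For context, the linked lines implement roughly the following logic (a paraphrased sketch, not the actual checker code; the set of benchmark names below is a hypothetical placeholder for whatever the real checker hard-codes):

```python
# Paraphrased sketch of the linked submission_checker.py logic.
# RELAXED_DELTA_MODELS is a hypothetical placeholder, not the real list.
RELAXED_DELTA_MODELS = {"some-model-a", "some-model-b"}

def acc_delta_perc(model: str) -> float:
    # One tolerance per benchmark, chosen purely by name:
    # 1% for the listed models, 0.1% for everything else.
    return 1.0 if model in RELAXED_DELTA_MODELS else 0.1
```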

I think this behavior is incorrect for stable-diffusion-xl, where, based on the allowed accuracy ranges, the CLIP_SCORE metric has a tolerance of 0.2% and the FID_SCORE metric a tolerance of 2%.

The allowed delta percentages for those cases should be 0.2% and 2%, respectively. @arjunsuresh, can you confirm this is the correct behavior?
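
A minimal sketch of the fix being proposed here, assuming a per-(benchmark, metric) lookup instead of a single per-benchmark value (all identifiers are illustrative, not the checker's actual names):

```python
# Sketch of a per-metric delta lookup for the TEST01 accuracy check.
# The stable-diffusion-xl entries follow the tolerances discussed above;
# the default and the function name are illustrative assumptions.
ACC_DELTA_PERC = {
    ("stable-diffusion-xl", "CLIP_SCORE"): 0.2,
    ("stable-diffusion-xl", "FID_SCORE"): 2.0,
}

def acc_delta_perc(model: str, metric: str, default: float = 0.1) -> float:
    """Return the allowed TEST01-vs-baseline accuracy delta in percent."""
    return ACC_DELTA_PERC.get((model, metric), default)

# Example: FID is allowed to drift further than CLIP.
assert acc_delta_perc("stable-diffusion-xl", "FID_SCORE") == 2.0
assert acc_delta_perc("stable-diffusion-xl", "CLIP_SCORE") == 0.2
```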

arjunsuresh commented 1 month ago

@pgmpablo157321 yes, that's correct. We need to use two different deltas for CLIP and FID scores.

mrmhodak commented 1 month ago

WG comments: Only change the FID delta in TEST01, which increases the allowed range. All existing results that pass the current checker remain valid.
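
Under the same assumed names as the sketch above, the WG-agreed change would narrow to a single override, so CLIP keeps whatever delta the current checker already applies:

```python
# Per the WG decision, only FID_SCORE gets a widened TEST01 delta;
# every other benchmark/metric keeps the existing 1% / 0.1% behavior,
# so previously passing results stay valid. Names are illustrative.
ACC_DELTA_PERC_OVERRIDES = {
    ("stable-diffusion-xl", "FID_SCORE"): 2.0,
}
```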