tensorflow / model-analysis

Model analysis tools for TensorFlow
Apache License 2.0
1.26k stars 281 forks source link

Validation bug when using feature values as slicing specs #150

Open zywind opened 2 years ago

zywind commented 2 years ago

TFMA version: 0.33

Hi team,

I found a bug in TFMA validation. It happens when I use multiple feature values to specify slicing like so:

eval_config = text_format.Parse("""
  model_specs {
    signature_name: "serving_default"
    preprocessing_function_names: ["transform_features"]
    label_key: "income_bracket"
  }
  metrics_specs {
    metrics {
      class_name: "ExampleCount"
    }
    metrics {
      class_name: "BinaryAccuracy"
      config: '{"name": "accuracy"}'
      per_slice_thresholds {
        slicing_specs {
          feature_values: {key: "race", value: "White"}
          feature_values: {key: "sex", value: "Male"}
        }
        threshold {
          value_threshold {
            lower_bound { value: 0.7 }
          }
        }
      }
    }
  }
  slicing_specs {
    feature_keys: ["race", "sex"]
  }
""", tfma.EvalConfig())

In this example, I specify that I just want to validate accuracy on a particular group: white males. The evaluation will run, but when I load validation results, I get:

tfma.load_validation_result(evaluator.outputs['evaluation'].get()[0].uri)

missing_slices {
  feature_values {
    key: "race"
    value: "White"
  }
  feature_values {
    key: "sex"
    value: "Male"
  }
}

I've tracked down the cause of this bug, and I believe it's the following two lines:

https://github.com/tensorflow/model-analysis/blob/65bb4036a2a296bca228c196a576a355f3c29b1c/tensorflow_model_analysis/evaluators/metrics_validator.py#L213 https://github.com/tensorflow/model-analysis/blob/65bb4036a2a296bca228c196a576a355f3c29b1c/tensorflow_model_analysis/evaluators/metrics_validator.py#L262

Here, you're relying on serialization of slicing_specs for equality comparison. The problem is that feature_values is a map field, and the entry order is random in the serialized string. Sometimes you get race serialized before sex, and sometimes the reverse happens, which is why I get the missing slices message.

I think custom serialization using sorted map fields would fix this bug.

genehwung commented 2 years ago

Thanks a lot for reporting the bug. We are actively working on a fix.

embr commented 2 years ago

This issue should be fixed at head. Let us know if you are still seeing the problem after https://github.com/tensorflow/model-analysis/commit/d31e583a7a8765b6d89264d24a18b3d133b33b89