tensorflow / model-analysis

Fairness indicator metrics do not show up #138

Open · zywind opened this issue 2 years ago

zywind commented 2 years ago

### System information

First method with TFX

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='toxicity', signature_name='serve_tfexample')
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(class_name='ExampleCount'),
            tfma.MetricConfig(class_name='BinaryAccuracy')
        ])
    ],
    slicing_specs=[
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['race']),
    ])

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    model=trainer.outputs['model'],
    fairness_indicator_thresholds=[0.3, 0.5, 0.7],
    eval_config=eval_config,
)

context.run(evaluator)
eval_result_uri = evaluator.outputs['evaluation'].get()[0].uri
eval_result = tfma.load_eval_result(eval_result_uri)

from tensorflow_model_analysis.addons.fairness.view import widget_view
widget_view.render_fairness_indicator(eval_result=eval_result)

Second method without TFX

tfma_eval_result_path = './bert/tfma_eval_result'
import tensorflow_model_analysis.addons.fairness.post_export_metrics.fairness_indicators
from google.protobuf import text_format

metrics_callbacks = [
    tfma.post_export_metrics.fairness_indicators(thresholds=[0.3, 0.5, 0.7]),
]

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key=LABEL)
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(class_name='ExampleCount'),
            tfma.MetricConfig(class_name='BinaryAccuracy')
        ])
    ],
    slicing_specs=[
        # An empty slice spec means the overall slice, i.e. the whole dataset.
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['race']),
    ])

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path='./checkpoints',
    add_metrics_callbacks=metrics_callbacks,
    eval_config=eval_config)

eval_result = tfma.run_model_analysis(
    eval_config=eval_config,
    eval_shared_model=eval_shared_model,
    data_location=validate_tf_file,
    output_path=tfma_eval_result_path)

from tensorflow_model_analysis.addons.fairness.view import widget_view
widget_view.render_fairness_indicator(eval_result=eval_result)



### Describe the problem
I am trying to display fairness metrics for a TF2-based model, but for some reason the fairness metrics (false discovery rate, false positive rate, etc.) do not show up in eval_result or in the Fairness Indicators widget (see screenshot below). This happens both when I use TFX's Evaluator component and when I run TFMA directly. Is it a bug, or am I doing something wrong?
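For reference, this is roughly how I check which metrics were computed in eval_result (a sketch; the exact nesting of slicing_metrics may differ across TFMA versions):

# Rough check of which metric names were actually computed; each
# slicing_metrics entry is a (slice_key, nested metrics dict) pair.
for slice_key, metrics in eval_result.slicing_metrics:
    for output_name, per_output in metrics.items():
        for sub_key, metric_values in per_output.items():
            # Only the ExampleCount / BinaryAccuracy metrics appear;
            # nothing from the fairness indicators.
            print(slice_key, sorted(metric_values.keys()))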

![Screen Shot 2021-08-04 at 3 35 17 PM](https://user-images.githubusercontent.com/3665540/128243915-b874ac4c-202d-468d-8c78-f261edeee1e6.png)
kumarpiyush commented 2 years ago

Hi zywind, it looks like TF2-based models are incompatible with the metrics callbacks approach (add_metrics_callbacks). Could you try using Fairness Indicators as a metric (https://www.tensorflow.org/responsible_ai/fairness_indicators/guide)? In your case:

metrics_specs=[
  tfma.MetricsSpec(
    metrics=[
      tfma.MetricConfig(class_name='ExampleCount'),
      tfma.MetricConfig(class_name='BinaryAccuracy'),
      tfma.MetricConfig(class_name='FairnessIndicators', config='{"thresholds":[0.3,0.5,0.7]}'),
    ]
  )
]

This syntax should work with both the TFX and non-TFX examples you've pasted.
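For the non-TFX run, the same MetricsSpec goes into the EvalConfig and the add_metrics_callbacks argument can be dropped. A sketch, reusing the paths and variables from your second snippet:

import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key=LABEL)],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(class_name='ExampleCount'),
            tfma.MetricConfig(class_name='BinaryAccuracy'),
            tfma.MetricConfig(class_name='FairnessIndicators',
                              config='{"thresholds": [0.3, 0.5, 0.7]}'),
        ])
    ],
    slicing_specs=[
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['race']),
    ])

# No add_metrics_callbacks here; the fairness metrics come from eval_config.
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path='./checkpoints', eval_config=eval_config)

eval_result = tfma.run_model_analysis(
    eval_config=eval_config,
    eval_shared_model=eval_shared_model,
    data_location=validate_tf_file,
    output_path=tfma_eval_result_path)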

Also, could you point me to the guide you have been using for Fairness Indicators so that I could capture it there?

zywind commented 2 years ago

Thank you @kumarpiyush for the information. I tried your suggestion, but I get a different error:

IndexError: arrays used as indices must be of integer (or boolean) type [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/ComputePerSlice/ComputeUnsampledMetrics/CombinePerSliceKey/WindowIntoDiscarding']

This might be because I'm using an outdated TFMA (v0.26). My company will update our TFMA soon, so I will test this again later.

As for the guide, I was just following the standard one here: https://www.tensorflow.org/tfx/guide/fairness_indicators#compute_fairness_metrics, which suggests using the add_metrics_callbacks parameter. For TFX there is no guide; I just found the fairness_indicator_thresholds parameter in the Evaluator's API. Adding a guide for TFX's Evaluator would be very useful.

DirkjanVerdoorn commented 2 years ago

@zywind are you by any chance applying transformations to your label? I experienced a similar issue when transforming string labels to one-hot encoded arrays. In your case, you are using the output from the ExampleGen component rather than the Transform component. If your labels are strings, then your error is probably generated by the following line of code, from the one_hot function in tensorflow_model_analysis/metrics/metric_util.py:

tensor = np.delete(np.eye(target.shape[-1] + 1)[tensor], -1, axis=-1)
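A minimal standalone illustration (plain NumPy, not TFMA code) of why non-integer labels trip that line:

import numpy as np

# Hypothetical labels that arrive as strings instead of integers.
labels = np.array(['0', '1', '0'])

# Mimics the np.eye(...)[tensor] indexing in metric_util.one_hot.
# Fancy indexing with a non-integer array raises:
#   IndexError: arrays used as indices must be of integer (or boolean) type
one_hot = np.delete(np.eye(2 + 1)[labels], -1, axis=-1)

# With integer labels the same expression works fine:
# np.delete(np.eye(2 + 1)[np.array([0, 1, 0])], -1, axis=-1)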

To solve this, you could take the output of the Transform component rather than the ExampleGen component. Again, I don't know your full situation, but this might be one possible reason your pipeline is failing.
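In pipeline terms that would look something like this (a hypothetical wiring; it assumes a Transform component named transform whose 'transformed_examples' output carries the integer-encoded labels):

# Feed the Evaluator the Transform output instead of the raw ExampleGen output.
evaluator = Evaluator(
    examples=transform.outputs['transformed_examples'],
    schema=schema_gen.outputs['schema'],
    model=trainer.outputs['model'],
    eval_config=eval_config,
)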

zywind commented 2 years ago

@DirkjanVerdoorn Thanks for the suggestion. The labels are integer types, so that's not the problem.

zywind commented 2 years ago

Just to follow up on this: our environment is now updated to TFMA 0.31, and I can confirm that the fairness_indicator_thresholds parameter in Evaluator still doesn't work, but @kumarpiyush's method does. @kumarpiyush, you may want to update the guide here: https://www.tensorflow.org/tfx/guide/fairness_indicators#compute_fairness_metrics
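For anyone landing here later, this is roughly the Evaluator setup that ended up working for me on TFMA 0.31 (a sketch reusing the names from my original post; the fairness_indicator_thresholds argument is dropped in favour of the FairnessIndicators metric):

import tensorflow_model_analysis as tfma
from tfx.components import Evaluator

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='toxicity', signature_name='serve_tfexample')
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(class_name='ExampleCount'),
            tfma.MetricConfig(class_name='BinaryAccuracy'),
            tfma.MetricConfig(class_name='FairnessIndicators',
                              config='{"thresholds": [0.3, 0.5, 0.7]}'),
        ])
    ],
    slicing_specs=[
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['race']),
    ])

# No fairness_indicator_thresholds argument; the thresholds live in the metric config above.
evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    model=trainer.outputs['model'],
    eval_config=eval_config,
)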