tensorflow / model-remediation

Model Remediation is a library that provides solutions for machine learning practitioners working to create and train models in a way that reduces or eliminates user harm resulting from underlying performance biases.
https://www.tensorflow.org/responsible_ai/model_remediation?hl=en
Apache License 2.0

TF Dataset to TF Examples List in FDW Utils: Can't handle sparse tensors / empty values #33

Open CLSchmitz opened 2 years ago

CLSchmitz commented 2 years ago

Datasets where a column relevant to FDW (specifically, a column to slice on) contains empty values throw errors in one of two places:

This issue is worth addressing because the slicing columns need not have been used for training (for ethical reasons, they often will not have been), so the training process itself never requires missing values in those columns to be handled.

Ali-Maq commented 1 year ago

Thank you for bringing this issue to our attention. You seem to be encountering problems when dealing with sparse tensors or empty values in the FDW Utils. I would like to propose a potential solution for you to handle missing values gracefully in the slicing columns.

Instead of using the .numpy() method directly on the SparseTensor objects or converting them to dense tensors using tf.sparse.to_dense, you can create a custom function to convert the SparseTensor objects into regular lists while handling the empty values.

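Here's a function you can use to achieve this:

```python
import numpy as np
import tensorflow as tf


def sparse_tensor_to_list(tensor):
    """Convert a dense or sparse tensor to a nested list, with None for missing entries."""
    if not isinstance(tensor, tf.sparse.SparseTensor):
        return tensor.numpy().tolist()

    # tf.sparse.to_dense fills missing entries with a default (0, or an empty
    # string for string tensors), never None. Start from an all-None object
    # array instead and write in only the values the sparse tensor stores.
    result = np.full(tuple(tensor.dense_shape.numpy()), None, dtype=object)
    for index, value in zip(tensor.indices.numpy(), tensor.values.numpy()):
        result[tuple(index)] = value.item() if hasattr(value, "item") else value
    return result.tolist()
```

This approach should help you handle the missing values in the slicing columns without encountering the mentioned errors.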
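To make the intended result concrete, here is a TensorFlow-free sketch of the same idea. It works directly on the three components a sparse tensor stores (`indices`, `values`, `dense_shape`), passed in as plain Python lists. The helper name `coo_to_nested_list` and the sample data are hypothetical, chosen only for illustration, and the sketch assumes a rank of at least 1.

```python
def coo_to_nested_list(indices, values, dense_shape):
    """Expand (indices, values, dense_shape) into nested lists,
    using None for every position that has no stored value.

    Hypothetical helper for illustration; assumes rank >= 1.
    """

    def empty(shape):
        # Build a nested list of the given shape filled with None.
        if len(shape) == 1:
            return [None] * shape[0]
        return [empty(shape[1:]) for _ in range(shape[0])]

    result = empty(list(dense_shape))
    for index, value in zip(indices, values):
        # Walk down to the innermost list, then store the value.
        cell = result
        for i in index[:-1]:
            cell = cell[i]
        cell[index[-1]] = value
    return result


# Three examples with one slicing feature each; the middle example has no value.
rows = coo_to_nested_list(
    indices=[[0, 0], [2, 0]],
    values=[b"a", b"b"],
    dense_shape=[3, 1],
)
print(rows)  # [[b'a'], [None], [b'b']]
```

Keeping the missing entry as `None` rather than a filler such as `0` or an empty string means downstream slicing code can tell "no value" apart from a real value that happens to equal the filler.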