Open CLSchmitz opened 2 years ago
Thank you for bringing this issue to our attention. You seem to be running into problems with sparse tensors and empty values in the FDW Utils, so I would like to propose a possible way to handle missing values in the slicing columns gracefully.
Instead of calling the `.numpy()` method directly on the `SparseTensor` objects or converting them to dense tensors with `tf.sparse.to_dense`, you can write a custom function that converts the `SparseTensor` objects into regular Python lists while handling empty values explicitly.
Here's a sketch of such a function (this assumes TensorFlow is running eagerly, as the original `.numpy()` calls already require):

```python
import tensorflow as tf

def sparse_tensor_to_list(sparse_tensor):
    """Convert a (possibly sparse) tensor to a plain Python list,
    mapping missing values to None instead of raising."""
    if isinstance(sparse_tensor, tf.sparse.SparseTensor):
        # An empty SparseTensor (e.g. a VarLenFeature with no values)
        # has an empty .values tensor; map it to None rather than
        # densifying it into an empty array.
        if int(tf.size(sparse_tensor.values)) == 0:
            return None
        dense_tensor = tf.sparse.to_dense(sparse_tensor)
    else:
        dense_tensor = sparse_tensor

    numpy_array = dense_tensor.numpy()

    result_list = []
    for value in numpy_array:
        # An empty row means the example had no value in this column.
        if value is None or (hasattr(value, "size") and value.size == 0):
            result_list.append(None)
        else:
            result_list.append(value.tolist())
    return result_list
```

This approach should let you handle missing values in the slicing columns without running into the errors described in the issue.
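For illustration, here is a minimal usage sketch; the feature value `b"group_a"` and the shapes are invented, and the function above is only a proposed workaround, not an existing FDW Utils helper:

```python
import tensorflow as tf

# Hypothetical slicing-column values: one example with a value, one without.
present = tf.sparse.SparseTensor(
    indices=[[0]], values=tf.constant([b"group_a"]), dense_shape=[1])
empty = tf.sparse.SparseTensor(
    indices=tf.zeros([0, 1], dtype=tf.int64),
    values=tf.constant([], dtype=tf.string),
    dense_shape=[0])

print(sparse_tensor_to_list(present))  # [b'group_a']
print(sparse_tensor_to_list(empty))    # None
```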
Datasets where a column relevant to FDW (specifically, a column to slice on) contains empty values throw errors in one of two places:
1. If the dataset feature map contains `VarLenFeature`s, empty `SparseTensor`s are created for them, and the dataset-to-example-list function then fails because `SparseTensor` has no `.numpy()` attribute.
2. If they are instead converted to empty dense tensors with `tf.sparse.to_dense`, the error is thrown the first time an empty value is read: `list index (0) out of range [while running 'Filter slices by False']`.

Both failure modes are sketched below.
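To make them concrete, here is a minimal sketch of how the two errors arise; the feature name `group` and the trailing `[0]` indexing are assumptions standing in for what the FDW Utils code does internally, not the actual implementation:

```python
import tensorflow as tf

# A serialized example whose slicing column ("group", assumed name) is absent.
example = tf.train.Example().SerializeToString()
parsed = tf.io.parse_single_example(
    example, {"group": tf.io.VarLenFeature(tf.string)})

sparse = parsed["group"]            # empty SparseTensor
# Failure 1: SparseTensor objects do not expose .numpy().
# sparse.numpy()                    # AttributeError

dense = tf.sparse.to_dense(sparse)  # shape [0] tensor
values = dense.numpy().tolist()     # []
# Failure 2: any downstream code that assumes at least one value per example.
# values[0]                         # IndexError: list index out of range
```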
This issue is worth addressing because the slicing columns need not have been used for training (and for ethical reasons often may not have been), so handling missing values in them isn't something the training process already requires.