Open CLSchmitz opened 2 years ago
Thank you for bringing this issue to our attention. You seem to be running into problems with sparse tensors and empty values in the FDW Utils, so I would like to propose a possible way to handle missing values in the slicing columns gracefully.
Instead of calling the `.numpy()` method directly on the `SparseTensor` objects or converting them to dense tensors with `tf.sparse.to_dense`, you can write a custom function that converts the `SparseTensor` objects into regular Python lists while handling empty values explicitly.
Here's a sketch of such a function (this assumes TensorFlow is running eagerly, as the original `.numpy()` calls already require):

```python
import tensorflow as tf

def sparse_tensor_to_list(sparse_tensor):
    """Convert a (possibly sparse) tensor to a plain Python list,
    mapping missing values to None instead of raising."""
    if isinstance(sparse_tensor, tf.sparse.SparseTensor):
        # An empty SparseTensor (e.g. a VarLenFeature with no values)
        # has an empty .values tensor; map it to None rather than
        # densifying it into an empty array.
        if int(tf.size(sparse_tensor.values)) == 0:
            return None
        dense_tensor = tf.sparse.to_dense(sparse_tensor)
    else:
        dense_tensor = sparse_tensor

    numpy_array = dense_tensor.numpy()

    result_list = []
    for value in numpy_array:
        # An empty row means the example had no value in this column.
        if value is None or (hasattr(value, "size") and value.size == 0):
            result_list.append(None)
        else:
            result_list.append(value.tolist())
    return result_list
```

This approach should let you handle missing values in the slicing columns without running into the errors described in the issue.
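For illustration, here is a minimal usage sketch; the feature value `b"group_a"` and the shapes are invented, and the function above is only a proposed workaround, not an existing FDW Utils helper:

```python
import tensorflow as tf

# Hypothetical slicing-column values: one example with a value, one without.
present = tf.sparse.SparseTensor(
    indices=[[0]], values=tf.constant([b"group_a"]), dense_shape=[1])
empty = tf.sparse.SparseTensor(
    indices=tf.zeros([0, 1], dtype=tf.int64),
    values=tf.constant([], dtype=tf.string),
    dense_shape=[0])

print(sparse_tensor_to_list(present))  # [b'group_a']
print(sparse_tensor_to_list(empty))    # None
```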
Datasets where a column relevant to FDW (specifically, a column to slice on) contains empty values throw errors in one of two places:
1. If the dataset feature map contains `VarLenFeature`s, empty `SparseTensor`s are created for them, and the dataset-to-example-list function then fails because `SparseTensor` has no `.numpy()` attribute.
2. If they are instead converted to empty dense tensors with `tf.sparse.to_dense`, the error is thrown the first time an empty value is read: `list index (0) out of range [while running 'Filter slices by False']`.

Both failure modes are sketched below.
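To make them concrete, here is a minimal sketch of how the two errors arise; the feature name `group` and the trailing `[0]` indexing are assumptions standing in for what the FDW Utils code does internally, not the actual implementation:

```python
import tensorflow as tf

# A serialized example whose slicing column ("group", assumed name) is absent.
example = tf.train.Example().SerializeToString()
parsed = tf.io.parse_single_example(
    example, {"group": tf.io.VarLenFeature(tf.string)})

sparse = parsed["group"]            # empty SparseTensor
# Failure 1: SparseTensor objects do not expose .numpy().
# sparse.numpy()                    # AttributeError

dense = tf.sparse.to_dense(sparse)  # shape [0] tensor
values = dense.numpy().tolist()     # []
# Failure 2: any downstream code that assumes at least one value per example.
# values[0]                         # IndexError: list index out of range
```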
This issue is worth addressing because the slicing columns need not have been used for training (and for ethical reasons often may not have been), so handling missing values in them isn't something the training process already requires.