I'm having issues specifying which features to include/exclude when visualizing stats in TFDV. It seems that the allowlist_features and denylist_features arguments require tensorflow_data_validation.types.FeaturePath objects, which took me a while to figure out how to construct. This doesn't seem very user-friendly -- was it intended to allow a list of strings to be passed?
Code to reproduce
I can reproduce the problem in the public colab example. In the "Compute and Visualize Statistics" section of the above notebook, update the visualize_statistics call to be:
tfdv.visualize_statistics(train_stats, denylist_features=['pickup_community_area'])
The first feature shouldn't appear in the visualization (if I'm calling this correctly).
Workaround code
To make this work, I have to manually construct a tensorflow_data_validation.types.FeaturePath object. Perhaps it would be better to do the filter comparison on each feature's path string?
# Show string name of feature
first_feat = train_stats.datasets[0].features[0]
print(first_feat.path)
# Construct the FeaturePath object needed to make the `allowlist_features` filter work
from tensorflow_data_validation import types
print(types.FeaturePath.from_proto(first_feat.path))
# docs-infra: no-execute
tfdv.visualize_statistics(train_stats, allowlist_features=[types.FeaturePath.from_proto(first_feat.path)])
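In the meantime, a small wrapper can hide the conversion. This is only a sketch: as_feature_paths is a hypothetical helper name (not part of TFDV), and it assumes that types.FeaturePath can be constructed directly from a list of path steps. The stand-in factory (tuple) is there only so the snippet runs without TFDV installed.

```python
from typing import Any, Callable, Iterable, List


def as_feature_paths(features: Iterable[Any],
                     make_path: Callable[[List[str]], Any]) -> List[Any]:
    """Turn plain feature names into path objects.

    Strings are wrapped as single-step paths via `make_path`; anything that
    is not a string (e.g. an already-built path object) is passed through.
    """
    return [make_path([f]) if isinstance(f, str) else f for f in features]


# Stand-in factory so this runs without TFDV; each name becomes a 1-tuple.
print(as_feature_paths(["pickup_community_area", "fare"], make_path=tuple))

# With TFDV installed, the intended use would be (assumption: the
# FeaturePath constructor accepts a list of steps):
#
#   from tensorflow_data_validation import types
#   tfdv.visualize_statistics(
#       train_stats,
#       denylist_features=as_feature_paths(['pickup_community_area'],
#                                          types.FeaturePath))
```

Something along these lines inside visualize_statistics itself would let callers pass strings directly, which is what I expected the arguments to accept.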