kengz closed this issue 8 years ago
This is actually possible to do now with FeatureColumns (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/feature_column.py#L34). Specifically, see `sparse_column_with_keys`.
Let us know how it works (you can use the tracker at https://github.com/tensorflow/tensorflow/).
Can we have multi-column versions of two functions commonly used for data cleaning, `fillna()` and `LabelEncoder()`, that work directly on the entire data frame `X` rather than column by column?

`MultiFillna(X, str_val='NA', num_val=0)` would perform column-wise `fillna()` on `X` using the stated or default values: 'NA' for string columns and 0 for numerical columns. This is especially useful when `X` has a mix of string and numeric columns and we wish to do `fillna()` in one go.

`MultiLabelEncoder` would apply `fit_transform` to each column whose header is specified, and its `reverse_transform` would apply the inverse. The encoder could be saved with the model at `classifier.save(path)` and restored for direct use with `classifier.restore(path)`.

For example, for the Titanic data, one could do prediction by loading the model together with the `MultiLabelEncoder`, passing the input `x=['male', 22, 1, 7.25]`, and calling `predict(x)`, which internally uses the encoder to transform `x`.
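To make the request concrete, here is a rough sketch of what the two helpers might look like on top of pandas. Everything in it is an assumption for illustration: the names `multi_fillna` and `MultiLabelEncoder`, the `columns` argument, and the choice to assign integer codes in sorted label order are hypothetical and not an existing API. Saving and restoring the encoder alongside the classifier is left out.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def multi_fillna(X, str_val="NA", num_val=0):
    """Hypothetical helper: fill numeric columns with num_val, all others with str_val."""
    fill_values = {
        col: (num_val if is_numeric_dtype(X[col]) else str_val)
        for col in X.columns
    }
    # DataFrame.fillna accepts a dict mapping column name -> fill value.
    return X.fillna(value=fill_values)


class MultiLabelEncoder:
    """Hypothetical encoder: maps labels to integer codes for several columns at once."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.classes_ = {}  # column name -> sorted list of labels seen in fit()

    def fit(self, X):
        # Assumes missing values were already filled (e.g. with multi_fillna).
        for col in self.columns:
            self.classes_[col] = sorted(X[col].unique())
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            mapping = {label: code for code, label in enumerate(self.classes_[col])}
            X[col] = X[col].map(mapping)  # labels not seen in fit() become NaN
        return X

    def fit_transform(self, X):
        return self.fit(X).transform(X)

    def reverse_transform(self, X):
        X = X.copy()
        for col in self.columns:
            X[col] = X[col].map(dict(enumerate(self.classes_[col])))
        return X
```

Usage would be along the lines of `X = multi_fillna(X)` followed by `encoder = MultiLabelEncoder(columns=['sex'])` and `encoder.fit_transform(X)`, with `encoder.reverse_transform(...)` recovering the original labels. Keeping the per-column label lists on the encoder object is what would let it be stored next to the model and reused at prediction time.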