@nroberts1, transformed_name is a small helper function that appends _xf to each feature name. It is used in the transform module to differentiate transformed features from raw features. This is not a bug but a practice that is followed in TFX. More information can be found in:
1) Building Machine Learning Pipelines by Hannes Hapke and Catherine Nelson
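For reference, the helper is essentially:

def transformed_name(key):
  # Marks a feature as transformed by appending the '_xf' suffix.
  return key + '_xf'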
On the TFX tutorial pages, under Tutorials - Transform - Process data (advanced), the preprocessing_fn does not use this rename approach, suggesting it should not be necessary:
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs`.
  outputs = inputs.copy()
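  # (Sketching how that tutorial continues; NUMERIC_FEATURE_KEYS is the
  # tutorial's own constant and tft is tensorflow_transform.)
  # Scale numeric columns to the range [0, 1], reusing the raw feature keys.
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(inputs[key])
  return outputs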
Since TFMA works fine outside of TFX using the code below, the encompassing framework (TFX) should not alter the requirements of the transform process. I think this should at the very least be considered an improvement request, if not a bug.
import os

import tensorflow as tf
import tensorflow_model_analysis as tfma

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=model_path,
    tags=[tf.saved_model.SERVING]
)

# Output and data locations for each split produced by ImportExampleGen.
train_output_path = os.path.join(self.pipe_config.get_pipeline_path(),
                                 'Evaluator', 'train')
os.makedirs(train_output_path)
train_data_path = os.path.join(self.pipe_config.get_pipeline_path(),
                               'ImportExampleGen', 'examples', item_number,
                               'train', '*')

eval_output_path = os.path.join(self.pipe_config.get_pipeline_path(),
                                'Evaluator', 'eval')
os.makedirs(eval_output_path)
eval_data_path = os.path.join(self.pipe_config.get_pipeline_path(),
                              'ImportExampleGen', 'examples', item_number,
                              'eval', '*')

test_output_path = os.path.join(self.pipe_config.get_pipeline_path(),
                                'Evaluator', 'test')
os.makedirs(test_output_path)
test_data_path = os.path.join(self.pipe_config.get_pipeline_path(),
                              'ImportExampleGen', 'examples', item_number,
                              'test', '*')

# Run TFMA directly (outside TFX) on each split.
train_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    eval_config=eval_config,
    data_location=train_data_path,
    output_path=train_output_path,
    file_format='tfrecords',
    slice_spec=[tfma.slicer.SingleSliceSpec()]
)
eval_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    eval_config=eval_config,
    data_location=eval_data_path,
    output_path=eval_output_path,
    file_format='tfrecords',
    slice_spec=[tfma.slicer.SingleSliceSpec()]
)
test_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    eval_config=eval_config,
    data_location=test_data_path,
    output_path=test_output_path,
    file_format='tfrecords',
    slice_spec=[tfma.slicer.SingleSliceSpec()]
)
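For context, the eval_config referenced above is just a standard TFMA config; a minimal sketch (the 'tips' label key is illustrative, not from the original code):

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='tips')],
    metrics_specs=tfma.metrics.specs_from_metrics(
        [tfma.metrics.ExampleCount(name='example_count')]),
    slicing_specs=[tfma.SlicingSpec()]
)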
Using the TFX material led me to this problem, and I expect others will follow. The error message does not say 'rename the inputs during the transform process', and I lost many development hours narrowing down the issue before reporting it. Many people will disregard TFX as an option if faced with this and similar issues; it paints TFX in a poor light if this is the accepted level of functionality.
You should be allowed to rename the inputs. I haven't looked closely, but I wonder if updating to use TFMA with the newly added TFT support will work. Just change your config so that you have the following:
eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            signature_name='serving_default',
            label_key='tips',
            preprocessing_function_names=['tft_layer'])
    ],
    ...
)
@nroberts1 , please confirm whether you are satisfied with the workaround from the previous comment by @mdreves.
No, this isn't a workaround from what I can tell. The issue is that names should not have to be renamed, and even though standalone TFMA works, the need is for TFX's evaluation step to work so you can benefit from TFX's complete ML pipeline functionality.
The workaround is to rename the inputs; then it all works. The improvement would preferably be to make it work when inputs are not renamed, or failing that, to make the need to rename clear in the documentation and to provide a clear error or warning message when they are not, e.g. "inputs are not renamed; this can cause issues during the TFX evaluation step".
Sorry for the late follow-up. I think that makes sense; we need to add more documentation and early warnings about the shape of the inputs / transformed inputs.
@mdreves Do you have any ideas on this?
TFMA is moving away from supporting using the Keras model directly for inference. Instead, all the inputs need to be defined via signatures. The transformations themselves also need to be defined using signatures. There are examples of how to export a model with proper signatures in the penguin example:
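The relevant pattern from that example looks roughly like this (a sketch, not the exact example code; _get_serve_tf_examples_fn stands in for the example's own serving function, and tf_transform_output is a tft.TFTransformOutput):

def _get_transform_features_signature(model, tf_transform_output):
  """Returns a signature that applies the TFT preprocessing graph."""
  # Keep a reference on the model so the layer is tracked and exported.
  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')
  ])
  def transform_features_fn(serialized_tf_example):
    # Parse raw tf.Examples and run them through the transform graph.
    raw_feature_spec = tf_transform_output.raw_feature_spec()
    raw_features = tf.io.parse_example(serialized_tf_example, raw_feature_spec)
    return model.tft_layer(raw_features)

  return transform_features_fn

# Export the transform alongside the default serving signature.
signatures = {
    'serving_default': _get_serve_tf_examples_fn(model, tf_transform_output),
    'transform_features': _get_transform_features_signature(
        model, tf_transform_output),
}
model.save(serving_model_dir, save_format='tf', signatures=signatures)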
With this setup, the TFMA code would look something like the following:
eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            signature_name='serving_default',
            label_key='<your label>',
            preprocessing_function_names=['transform_features'])
    ],
    ...
)
When set up with this configuration, there shouldn't be any issues with what names you use for the inputs.
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.
It appears the Evaluator component has an issue when I don't rename the transformed inputs.
I've managed to reproduce the error with the smallest of changes. If you run the colab for the Keras TFX tutorial from here - https://www.tensorflow.org/tfx/tutorials/tfx/components_keras - everything works fine all the way through to the Evaluator.
In the colab I've changed the transformed_name function in the constants file so that the transformed features keep their raw names, along these lines:
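def transformed_name(key):
  # Reconstructed from the description above: no '_xf' suffix, so the
  # transformed features keep their original names.
  return key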
Here's a link to the edited colab - https://colab.research.google.com/drive/1KYcdmHxfY9URgAIKI1_rx4DVty5d-f5o?usp=sharing
As the transform starts with outputs = {} and then only adds entries keyed via transformed_name (see the sketch below), this can't be caused by left-over additional fields from the transform function inputs.
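Sketching the tutorial's preprocessing_fn pattern (DENSE_FLOAT_FEATURE_KEYS and _fill_in_missing are the tutorial's own constant and helper):

outputs = {}
for key in DENSE_FLOAT_FEATURE_KEYS:
  # Only keys written here end up in the output; nothing is copied across
  # from the raw inputs.
  outputs[transformed_name(key)] = tft.scale_to_z_score(
      _fill_in_missing(inputs[key]))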
Run the edited colab and the Evaluator fails with:
When running this locally with the BeamDagRunner the error is as below; I'm guessing the underlying issue will be the same?