Closed ibrahimiskin closed 6 years ago
I agree. It would be good to have the transform examples export the model along with the transforms, so that they can be used with TF Serving.
While I have used the standalone examples of write_saved_transform_from_session, I am having a tough time figuring out how to do the same within the Beam context.
This piece of code from the preprocess function at the link below seems to address the issue I raised. However, I am still getting the same parsing error. https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/reddit_tft/preprocess.py
# WriteTransformFn writes transform_fn and metadata to fixed subdirectories
# of output_dir, which are given by path_constants.TRANSFORM_FN_DIR and
# path_constants.TRANSFORMED_METADATA_DIR.
_ = (transform_fn
     | 'WriteTransformFn' >> tft_beam_io.WriteTransformFn(output_dir))
Here is the code that passes the transformation function to the serving input function. https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/reddit_tft/trainer/task.py
transformed_metadata = metadata_io.read_metadata(
    args.transformed_metadata_path)
raw_metadata = metadata_io.read_metadata(args.raw_metadata_path)
serving_input_fn = (
    input_fn_maker.build_parsing_transforming_serving_input_fn(
        raw_metadata,
        args.transform_savedmodel,
        raw_label_keys=[TARGET_FEATURE_COLUMN]))
export_strategy = tf.contrib.learn.utils.make_export_strategy(
    serving_input_fn, exports_to_keep=5,
    default_output_alternative_key=None)
For some reason, the input function created by build_parsing_transforming_serving_input_fn was the source of the problem. I used build_default_transforming_serving_input_fn instead, and everything works beautifully. Sample code below.
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  review = inputs[REVIEW_COLUMN]
  review_tokens = tft.map(lambda x: tf.string_split(x, DELIMITERS), review)
  review_indices = tft.string_to_int(review_tokens, top_k=VOCAB_SIZE)
  # Add one for the oov bucket created by string_to_int.
  review_weight = tft.tfidf_weights(review_indices, VOCAB_SIZE + 1)
  output = {
      REVIEW_COLUMN: review_indices,
      REVIEW_WEIGHT: review_weight,
      LABEL_COLUMN: inputs[LABEL_COLUMN]
  }
  return output
def preprocess_data(pipeline, train_neg_filepattern, train_pos_filepattern,
                    test_neg_filepattern, test_pos_filepattern,
                    transformed_train_filebase, transformed_test_filebase,
                    transformed_metadata_dir, raw_metadata_dir, output_dir):
  """Transform the data and write out as a TFRecord of Example protos.

  Read in the data from the positive and negative examples on disk, and
  transform it using a preprocessing pipeline that removes punctuation,
  tokenizes and maps tokens to int64 values indices.

  Args:
    train_neg_filepattern: Filepattern for training data negative examples
    train_pos_filepattern: Filepattern for training data positive examples
    test_neg_filepattern: Filepattern for test data negative examples
    test_pos_filepattern: Filepattern for test data positive examples
    transformed_train_filebase: Base filename for transformed training data shards
    transformed_test_filebase: Base filename for transformed test data shards
    transformed_metadata_dir: Directory where metadata for transformed data should be written

  raw_metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({
      REVIEW_COLUMN: dataset_schema.ColumnSchema(tf.string, [], dataset_schema.FixedColumnRepresentation()),
      LABEL_COLUMN: dataset_schema.ColumnSchema(tf.int64, [], dataset_schema.FixedColumnRepresentation()),
  }))
  """
  input_schema = dataset_schema.from_feature_spec({
      REVIEW_COLUMN: tf.FixedLenFeature(shape=[], dtype=tf.string),
      LABEL_COLUMN: tf.FixedLenFeature(shape=[], dtype=tf.int64)
  })
  raw_metadata = dataset_metadata.DatasetMetadata(schema=input_schema)
  train_data = pipeline | 'ReadTrain' >> ReadAndShuffleData((train_neg_filepattern, train_pos_filepattern))
  test_data = pipeline | 'ReadTest' >> ReadAndShuffleData((test_neg_filepattern, test_pos_filepattern))
  (transformed_train_data, transformed_metadata), transform_fn = ((train_data, raw_metadata)
      | 'AnalyzeAndTransform' >> beam_impl.AnalyzeAndTransformDataset(preprocessing_fn))
  _ = (transform_fn | 'WriteTransformFn' >> tft_beam_io.WriteTransformFn(output_dir))
  transformed_test_data, _ = (((test_data, raw_metadata), transform_fn)
      | 'Transform' >> beam_impl.TransformDataset())
  _ = (transformed_train_data
      | 'WriteTrainData' >> tfrecordio.WriteToTFRecord(transformed_train_filebase,
          coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema)))
  _ = (transformed_test_data
      | 'WriteTestData' >> tfrecordio.WriteToTFRecord(transformed_test_filebase,
          coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema)))
  _ = (transformed_metadata
      | 'WriteTransformedMetadata' >> beam_metadata_io.WriteMetadata(transformed_metadata_dir, pipeline=pipeline))
  _ = (raw_metadata
      | 'WriteRawMetadata' >> beam_metadata_io.WriteMetadata(raw_metadata_dir, pipeline=pipeline))
def get_experiment_fn(transformed_train_filepattern,
                      transformed_test_filepattern,
                      transformed_metadata_dir,
                      raw_metadata_dir):
  def train_and_evaluate(output_dir):
    review_column = feature_column.sparse_column_with_integerized_feature(REVIEW_COLUMN, bucket_size=VOCAB_SIZE + 1, combiner='sum')
    weighted_reviews = feature_column.weighted_sparse_column(review_column, REVIEW_WEIGHT)
    estimator = learn.LinearClassifier(feature_columns=[weighted_reviews],
                                       n_classes=2,
                                       model_dir=output_dir,
                                       config=tf.contrib.learn.RunConfig(save_checkpoints_secs=30))
    transformed_metadata = metadata_io.read_metadata(transformed_metadata_dir)
    raw_metadata = metadata_io.read_metadata(raw_metadata_dir)
    train_input_fn = input_fn_maker.build_training_input_fn(
        transformed_metadata,
        transformed_train_filepattern,
        training_batch_size=TRAIN_BATCH_SIZE,
        label_keys=[LABEL_COLUMN])
    eval_input_fn = input_fn_maker.build_training_input_fn(
        transformed_metadata,
        transformed_test_filepattern,
        training_batch_size=1,
        label_keys=[LABEL_COLUMN])
    """
    serving_input_fn = input_fn_maker.build_parsing_transforming_serving_input_fn(
        raw_metadata=raw_metadata,
        transform_savedmodel_dir=output_dir + '/transform_fn',
        raw_label_keys=[],
        raw_feature_keys=[REVIEW_COLUMN])
    """
    serving_input_fn = input_fn_maker.build_default_transforming_serving_input_fn(
        raw_metadata=raw_metadata,
        transform_savedmodel_dir=output_dir + '/transform_fn',
        raw_label_keys=[],
        raw_feature_keys=[REVIEW_COLUMN])
    export_strategy = saved_model_export_utils.make_export_strategy(
        serving_input_fn,
        exports_to_keep=5,
        default_output_alternative_key=None)
    return tf.contrib.learn.Experiment(
        estimator=estimator,
        train_steps=TRAIN_NUM_EPOCHS * NUM_TRAIN_INSTANCES / TRAIN_BATCH_SIZE,
        eval_steps=NUM_TEST_INSTANCES,
        train_input_fn=train_input_fn,
        eval_input_fn=eval_input_fn,
        export_strategies=export_strategy,
        min_eval_frequency=500)
  return train_and_evaluate
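A side note on the train_steps expression in the sample above: under Python 3 the `/` operator returns a float, while a step count is normally an integer. The sketch below shows the same arithmetic with floor division. The three constant values and the helper name `compute_train_steps` are assumptions made for illustration; the post does not state the actual values.

```python
# Hypothetical values for illustration only; the post does not state them.
TRAIN_NUM_EPOCHS = 10
NUM_TRAIN_INSTANCES = 25000
TRAIN_BATCH_SIZE = 128


def compute_train_steps(epochs, instances, batch_size):
    """Return the number of full batches in the requested number of epochs."""
    # Floor division keeps the result an int on both Python 2 and Python 3.
    return epochs * instances // batch_size


train_steps = compute_train_steps(
    TRAIN_NUM_EPOCHS, NUM_TRAIN_INSTANCES, TRAIN_BATCH_SIZE)
print(train_steps)
```

With these assumed values the final partial batch is dropped, which only shortens training by a fraction of one step.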
@ibrahimiskin thanks for the tips about tft_beam_io.WriteTransformFn, build_default_transforming_serving_input_fn and dataset_schema.from_feature_spec. I have it working too.
@ibrahimiskin could you provide your code? I'm stuck at this example.
@lukashes here we go: https://github.com/ibrahimiskin/gce-ml-engine-tensorflow-sentiment-analysis
@ibrahimiskin great. I put together my own combined design, but it behaves strangely. I'll take a look at your version.
I'm working on fixing the example now
tf.transform 0.4 has been released, and it includes updated examples which showcase export and serving paths (e.g. [1, 2]).
@ibrahimiskin and @KesterTong, should we perhaps close this issue now?
[1] https://github.com/tensorflow/transform/blob/v0.4.0/examples/census_example.py#L317 [2] https://github.com/tensorflow/transform/blob/v0.4.0/examples/sentiment_example.py#L343
Closing this, feel free to reopen if there are any issues.
Hi guys,
I am new to TensorFlow, so bear with me if I am doing something completely wrong :-) I am following the text classification example at https://github.com/tensorflow/transform/blob/master/examples/sentiment_example.py. Model development worked as expected. I am now working on running the trained model in the Google ML Engine environment.
I added the following lines to "train_and_evaluate" function to export the model
I am receiving the following error upon a classification request for the sample sentence "nice piece of work ." The payload looks like this: {"inputs": "nice piece of work ."}
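For reference, a payload of this shape can be produced with Python's standard json module. This is only a sketch: the key name `inputs` is copied from the payload quoted above, the helper name `make_instance` is hypothetical, and how the resulting line is submitted to the prediction service is not shown here.

```python
import json


def make_instance(sentence):
    """Serialize one raw sentence as a single-line JSON instance."""
    # "inputs" mirrors the key used in the payload quoted above.
    return json.dumps({"inputs": sentence})


line = make_instance("nice piece of work .")
print(line)
```

Building the payload programmatically avoids quoting mistakes when the sentence itself contains quotes or backslashes.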
Am I getting this error because the model object is expecting integerized tensors? If so, I attempted to use the build_parsing_transforming_serving_input_fn function at https://github.com/tensorflow/transform/blob/master/tensorflow_transform/saved/input_fn_maker.py to perform the transformation at run time. It appears that I need a transform_savedmodel_dir that embodies the transformation model with the parsing logic. I figure this is achieved by using write_saved_transform_from_session at https://github.com/tensorflow/transform/blob/master/tensorflow_transform/saved/saved_transform_io.py
Can you guys share example code that exports a transform model?
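To make the question about integerized tensors concrete, here is a plain-Python stand-in for what a vocabulary-based preprocessing step does to raw text. It is not tf.Transform's implementation: the function names `build_vocab` and `integerize`, the whitespace tokenization, and the single out-of-vocabulary index are all assumptions chosen for illustration. The point is that a model trained on such indices cannot consume a raw string unless the same mapping is applied at serving time.

```python
from collections import Counter


def build_vocab(sentences, top_k):
    """Map the top_k most frequent tokens to indices 0..top_k-1."""
    counts = Counter(tok for s in sentences for tok in s.split())
    # Sort by descending count, then alphabetically, so the result is
    # deterministic when counts tie.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: i for i, (tok, _) in enumerate(ranked[:top_k])}


def integerize(sentence, vocab):
    """Replace each token with its index; unknown tokens share one index."""
    oov_index = len(vocab)  # assumption: a single out-of-vocabulary bucket
    return [vocab.get(tok, oov_index) for tok in sentence.split()]


training = ["nice piece of work .", "not a nice film ."]
vocab = build_vocab(training, top_k=3)
ids = integerize("nice piece of cake .", vocab)
print(vocab, ids)
```

The vocabulary is computed once over the training data, which is why it has to be saved alongside the model and reused when serving.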