tensorflow / transform

Input pipeline framework
Apache License 2.0

running sentiment_example.py model as a server #15

Closed ibrahimiskin closed 6 years ago

ibrahimiskin commented 7 years ago

Hi guys,

I am new to TensorFlow, so bear with me if I am doing something completely wrong :-) I am following the text classification example at https://github.com/tensorflow/transform/blob/master/examples/sentiment_example.py and model development worked as expected. I am now working on running the trained model in the Google ML Engine environment.

I added the following lines to the "train_and_evaluate" function to export the model:

from tensorflow.contrib.learn.python.learn.utils import input_fn_utils
from tensorflow.contrib.layers import create_feature_spec_for_parsing

feature_spec = create_feature_spec_for_parsing(train_input_fn)
serving_input_fn = input_fn_utils.build_parsing_serving_input_fn(feature_spec)
estimator.export_savedmodel(job_dir, serving_input_fn)

I receive the following error on a classification request for the sample sentence "nice piece of work ." The payload looks like this: {"inputs": "nice piece of work ."}

{
  "error": "Prediction failed: Exception during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"Could not parse example input, value: 'nice piece of work .'\n\t [[Node: ParseExample/ParseExample = ParseExample[Ndense=0, Nsparse=2, Tdense=[], _output_shapes=[[-1,2], [-1,2], [-1], [-1], [2], [2]], dense_shapes=[], sparse_types=[DT_INT64, DT_FLOAT], _device=\"/job:localhost/replica:0/task:0/cpu:0\"](_recv_input_example_tensor_0, ParseExample/ParseExample/names, ParseExample/ParseExample/sparse_keys_0, ParseExample/ParseExample/sparse_keys_1)]]\")"
}
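
The error above is raised by the ParseExample op, which suggests that the exported graph expects serialized tf.Example records rather than a bare string. The following is a toy, standard-library-only sketch of that kind of mismatch. It uses JSON bytes as a stand-in for a serialized tf.Example; the names `parsing_entry_point` and `raw_entry_point` and the "review" key are hypothetical and are not part of TensorFlow or tf.Transform.

```python
import json


def parsing_entry_point(serialized):
    """Accept only a serialized record (JSON stands in for tf.Example)."""
    try:
        record = json.loads(serialized)
    except (TypeError, ValueError):
        raise ValueError(
            "Could not parse example input, value: %r" % (serialized,))
    if not isinstance(record, dict):
        raise ValueError(
            "Could not parse example input, value: %r" % (serialized,))
    return record


def raw_entry_point(features):
    """Accept raw features directly, with no parsing step."""
    return dict(features)


raw_sentence = "nice piece of work ."

# Sending the bare sentence to the parsing entry point fails...
try:
    parsing_entry_point(raw_sentence)
    parse_failed = False
    parse_error = ""
except ValueError as err:
    parse_failed = True
    parse_error = str(err)

# ...while a serialized record works, as do raw features sent to an
# entry point that does not parse at all.
serialized = json.dumps({"review": raw_sentence}).encode("utf-8")
parsed = parsing_entry_point(serialized)
passed_through = raw_entry_point({"review": raw_sentence})
```

In this sketch the fix is either to send what the parsing entry point expects or to export an entry point that takes raw features; which one applies to a real export depends on the serving input function used.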

Am I getting this error because the model expects integerized tensors? If so: I attempted to use the build_parsing_transforming_serving_input_fn function at https://github.com/tensorflow/transform/blob/master/tensorflow_transform/saved/input_fn_maker.py to perform the transformation at run time, but it appears that I need a transform_savedmodel_dir that embodies the transformation model with the parsing logic. I figure this is achieved by using write_saved_transform_from_session at https://github.com/tensorflow/transform/blob/master/tensorflow_transform/saved/saved_transform_io.py

Can you guys share example code that exports a transform model?
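
To make the "integerized tensors" question concrete: if training maps tokens to integer ids, the same mapping has to be saved and reapplied to raw text at serving time. A minimal standard-library sketch of that idea follows. It is not tf.Transform's implementation; `fit_vocabulary`, `save_vocabulary`, `load_vocabulary`, and `integerize` are hypothetical helpers, and giving all out-of-vocabulary tokens one extra shared id is an assumption of this sketch.

```python
import collections
import json
import os
import tempfile


def fit_vocabulary(sentences, top_k):
    """Analysis phase: keep the top_k most frequent tokens, in rank order."""
    counts = collections.Counter(
        token for sentence in sentences for token in sentence.split())
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:top_k]]


def save_vocabulary(vocabulary, path):
    """Persist the analysis result so serving can reuse it."""
    with open(path, "w") as f:
        json.dump(vocabulary, f)


def load_vocabulary(path):
    """Reload the vocabulary written by save_vocabulary."""
    with open(path) as f:
        return json.load(f)


def integerize(sentence, vocabulary):
    """Map tokens to ids; unknown tokens share the id len(vocabulary)."""
    index = {token: i for i, token in enumerate(vocabulary)}
    oov_id = len(vocabulary)
    return [index.get(token, oov_id) for token in sentence.split()]


# Training time: analyze the data and persist the result next to the model.
train_sentences = ["nice piece of work .", "not a nice film .", "dull work ."]
export_dir = tempfile.mkdtemp()
vocab_path = os.path.join(export_dir, "vocabulary.json")
save_vocabulary(fit_vocabulary(train_sentences, top_k=3), vocab_path)

# Serving time: reload the saved mapping and apply it to a raw request.
serving_vocabulary = load_vocabulary(vocab_path)
request_ids = integerize("nice piece of cake .", serving_vocabulary)
```

The point of the sketch is the split between an analysis step that runs once over the training data and a transform step that must be available, unchanged, wherever predictions are served.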

mariobriggs commented 7 years ago

I agree. It would be good to have the transform examples coded to export the model along with the transforms, so that it can be used with TF Serving.

While I have used the standalone examples of write_saved_transform_from_session, I am having a tough time figuring out how to do the same within the Beam context.

ibrahimiskin commented 7 years ago

This piece of code from the preprocess function at the link below seems to address the issue I raised, though I am still getting the same parsing error. https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/reddit_tft/preprocess.py

# WriteTransformFn writes transform_fn and metadata to fixed subdirectories
# of output_dir, which are given by path_constants.TRANSFORM_FN_DIR and
# path_constants.TRANSFORMED_METADATA_DIR.
_ = (transform_fn
     | 'WriteTransformFn' >> tft_beam_io.WriteTransformFn(output_dir))
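
The comment above says the transform function and metadata land in fixed subdirectories of output_dir. A small sketch of resolving those paths follows. The 'transform_fn' name matches the path used elsewhere in this thread, while 'transformed_metadata' is an assumed value for path_constants.TRANSFORMED_METADATA_DIR and should be checked against the installed tf.Transform version. `transform_output_paths` is a hypothetical helper, and the output directory is a made-up example.

```python
import posixpath

# 'transform_fn' matches the path used elsewhere in this thread.
TRANSFORM_FN_DIR = "transform_fn"
# Assumed value; verify against path_constants in your tf.Transform version.
TRANSFORMED_METADATA_DIR = "transformed_metadata"


def transform_output_paths(output_dir):
    """Return where WriteTransformFn is expected to place its outputs."""
    return {
        "transform_fn": posixpath.join(output_dir, TRANSFORM_FN_DIR),
        "transformed_metadata": posixpath.join(
            output_dir, TRANSFORMED_METADATA_DIR),
    }


# Hypothetical output directory, for illustration only.
paths = transform_output_paths("/tmp/sentiment/output")
```

posixpath is used here so that the separator stays a forward slash, which also suits gs:// paths; joining with a helper avoids doubled or missing slashes from manual string concatenation.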

Here is the code that passes the transformation function to the serving input function. https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/reddit_tft/trainer/task.py

transformed_metadata = metadata_io.read_metadata(
    args.transformed_metadata_path)
raw_metadata = metadata_io.read_metadata(args.raw_metadata_path)
serving_input_fn = (
    input_fn_maker.build_parsing_transforming_serving_input_fn(
        raw_metadata,
        args.transform_savedmodel,
        raw_label_keys=[TARGET_FEATURE_COLUMN]))
export_strategy = tf.contrib.learn.utils.make_export_strategy(
    serving_input_fn, exports_to_keep=5,
    default_output_alternative_key=None)

ibrahimiskin commented 7 years ago

For some reason, the input function created by build_parsing_transforming_serving_input_fn was the source of the problem. I used build_default_transforming_serving_input_fn instead, and everything works beautifully. Sample code below.

def preprocessing_fn(inputs):
    """Preprocess input columns into transformed columns."""
    review = inputs[REVIEW_COLUMN]
    review_tokens = tft.map(lambda x: tf.string_split(x, DELIMITERS), review)
    review_indices = tft.string_to_int(review_tokens, top_k=VOCAB_SIZE)
    # Add one for the oov bucket created by string_to_int.
    review_weight = tft.tfidf_weights(review_indices, VOCAB_SIZE + 1)

    output = {
        REVIEW_COLUMN: review_indices,
        REVIEW_WEIGHT: review_weight,
        LABEL_COLUMN: inputs[LABEL_COLUMN]
    }
    return output

def preprocess_data(pipeline, train_neg_filepattern, train_pos_filepattern,
                    test_neg_filepattern, test_pos_filepattern,
                    transformed_train_filebase, transformed_test_filebase,
                    transformed_metadata_dir, raw_metadata_dir, output_dir):
    """Transform the data and write out as a TFRecord of Example protos.

    Read in the data from the positive and negative examples on disk, and
    transform it using a preprocessing pipeline that removes punctuation,
    tokenizes and maps tokens to int64 indices.

    Args:
      pipeline: Beam pipeline that the read, transform, and write steps are attached to
      train_neg_filepattern: Filepattern for training data negative examples
      train_pos_filepattern: Filepattern for training data positive examples
      test_neg_filepattern: Filepattern for test data negative examples
      test_pos_filepattern: Filepattern for test data positive examples
      transformed_train_filebase: Base filename for transformed training data shards
      transformed_test_filebase: Base filename for transformed test data shards
      transformed_metadata_dir: Directory where metadata for transformed data should be written
      raw_metadata_dir: Directory where metadata for raw data should be written
      output_dir: Directory where the transform function is written
    """

    # Earlier way of building the raw metadata, kept for reference:
    # raw_metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({
    #     REVIEW_COLUMN: dataset_schema.ColumnSchema(tf.string, [], dataset_schema.FixedColumnRepresentation()),
    #     LABEL_COLUMN: dataset_schema.ColumnSchema(tf.int64, [], dataset_schema.FixedColumnRepresentation()),
    # }))

    input_schema = dataset_schema.from_feature_spec({
        REVIEW_COLUMN: tf.FixedLenFeature(shape=[], dtype=tf.string),
        LABEL_COLUMN: tf.FixedLenFeature(shape=[], dtype=tf.int64)
    })
    raw_metadata = dataset_metadata.DatasetMetadata(schema=input_schema)

    train_data = pipeline | 'ReadTrain' >> ReadAndShuffleData((train_neg_filepattern, train_pos_filepattern))
    test_data = pipeline | 'ReadTest' >> ReadAndShuffleData((test_neg_filepattern, test_pos_filepattern))

    (transformed_train_data, transformed_metadata), transform_fn = ((train_data, raw_metadata)
      | 'AnalyzeAndTransform' >> beam_impl.AnalyzeAndTransformDataset(preprocessing_fn))

    _ = (transform_fn | 'WriteTransformFn' >> tft_beam_io.WriteTransformFn(output_dir))

    transformed_test_data, _ = (((test_data, raw_metadata), transform_fn)
      | 'Transform' >> beam_impl.TransformDataset())

    _ = (transformed_train_data
      | 'WriteTrainData' >> tfrecordio.WriteToTFRecord(transformed_train_filebase,
          coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema)))

    _ = (transformed_test_data
      | 'WriteTestData' >> tfrecordio.WriteToTFRecord(transformed_test_filebase,
          coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema)))

    _ = (transformed_metadata
      | 'WriteTransformedMetadata' >> beam_metadata_io.WriteMetadata(transformed_metadata_dir, pipeline=pipeline))

    _ = (raw_metadata
      | 'WriteRawMetadata' >> beam_metadata_io.WriteMetadata(raw_metadata_dir, pipeline=pipeline))

def get_experiment_fn(transformed_train_filepattern,
                      transformed_test_filepattern,
                      transformed_metadata_dir,
                      raw_metadata_dir):
    def train_and_evaluate(output_dir):
        review_column = feature_column.sparse_column_with_integerized_feature(REVIEW_COLUMN, bucket_size=VOCAB_SIZE + 1, combiner='sum')
        weighted_reviews = feature_column.weighted_sparse_column(review_column, REVIEW_WEIGHT)

        estimator = learn.LinearClassifier(feature_columns=[weighted_reviews],
                                         n_classes=2,
                                         model_dir=output_dir,
                                         config=tf.contrib.learn.RunConfig(save_checkpoints_secs=30))

        transformed_metadata = metadata_io.read_metadata(transformed_metadata_dir)
        raw_metadata = metadata_io.read_metadata(raw_metadata_dir)

        train_input_fn = input_fn_maker.build_training_input_fn(
            transformed_metadata,
            transformed_train_filepattern,
            training_batch_size=TRAIN_BATCH_SIZE,
            label_keys=[LABEL_COLUMN])

        eval_input_fn = input_fn_maker.build_training_input_fn(
            transformed_metadata,
            transformed_test_filepattern,
            training_batch_size=1,
            label_keys=[LABEL_COLUMN])

        """
        serving_input_fn = input_fn_maker.build_parsing_transforming_serving_input_fn(
            raw_metadata=raw_metadata,
            transform_savedmodel_dir=output_dir + '/transform_fn',
            raw_label_keys=[],
            raw_feature_keys=[REVIEW_COLUMN])
        """

        serving_input_fn = input_fn_maker.build_default_transforming_serving_input_fn(
            raw_metadata=raw_metadata,
            transform_savedmodel_dir=output_dir + '/transform_fn',
            raw_label_keys=[],
            raw_feature_keys=[REVIEW_COLUMN])

        export_strategy = saved_model_export_utils.make_export_strategy(
            serving_input_fn,
            exports_to_keep=5,
            default_output_alternative_key=None)

        return tf.contrib.learn.Experiment(
            estimator=estimator,
            # Floor division keeps train_steps an integer under Python 3 as well.
            train_steps=TRAIN_NUM_EPOCHS * NUM_TRAIN_INSTANCES // TRAIN_BATCH_SIZE,
            eval_steps=NUM_TEST_INSTANCES,
            train_input_fn=train_input_fn,
            eval_input_fn=eval_input_fn,
            export_strategies=export_strategy,
            min_eval_frequency=500)
    return train_and_evaluate
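
In the sample above, train_steps is the number of epochs times the number of training instances, divided by the batch size. A quick worked example of that arithmetic follows; the constant values are hypothetical, since the thread does not show the real ones. It shows that true division produces a float under Python 3, while floor division produces an integer step count.

```python
# Hypothetical values; the thread does not show the real constants.
TRAIN_NUM_EPOCHS = 200
NUM_TRAIN_INSTANCES = 25000
TRAIN_BATCH_SIZE = 128

# True division returns a float in Python 3.
steps_true_div = TRAIN_NUM_EPOCHS * NUM_TRAIN_INSTANCES / TRAIN_BATCH_SIZE

# Floor division returns an int, dropping the final partial batch.
steps_floor_div = TRAIN_NUM_EPOCHS * NUM_TRAIN_INSTANCES // TRAIN_BATCH_SIZE
```

Under Python 2 both expressions give the same integer for integer operands, so the difference only shows up when the code is run under Python 3.
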

mariobriggs commented 7 years ago

@ibrahimiskin thanks for the tips about tft_beam_io.WriteTransformFn, build_default_transforming_serving_input_fn and dataset_schema.from_feature_spec. I have it working too.

lukashes commented 7 years ago

@ibrahimiskin could you share your code? I'm stuck on this example.

ibrahimiskin commented 7 years ago

@lukashes here we go: https://github.com/ibrahimiskin/gce-ml-engine-tensorflow-sentiment-analysis

lukashes commented 7 years ago

@ibrahimiskin great. I put together my own combined version, but it behaves strangely. I'll take a look at yours.

KesterTong commented 6 years ago

I'm working on fixing the example now

katsiapis commented 6 years ago

tf.transform 0.4 has been released, and it includes updated examples that showcase the export and serving paths (e.g. [1, 2]).

@ibrahimiskin and @KesterTong, should we perhaps close this issue now?

[1] https://github.com/tensorflow/transform/blob/v0.4.0/examples/census_example.py#L317
[2] https://github.com/tensorflow/transform/blob/v0.4.0/examples/sentiment_example.py#L343

katsiapis commented 6 years ago

Closing this, feel free to reopen if there are any issues.