tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

TFServing Issues with Seq2Seq (Preprocessing, Assets, C++) #770

Closed ghost closed 6 years ago

ghost commented 6 years ago

Good morning everyone,

I created a seq2seq model with attention for a chatbot and need to deploy it to a production server using tensorflow serving. I followed the basic tutorials available and read the documentation / open issues here, but it seems to me that there is no straightforward way to deploy the model.

At the moment I can make predictions in the following way. First, I define the graph:

tf.reset_default_graph()
# Build the graph
train_graph = tf.Graph()
# Set the graph to default to ensure that it is ready for training
with train_graph.as_default():

    # Load the model inputs    
    input_data, targets, lr, target_sequence_length, max_target_sequence_length, source_sequence_length, keep_prob  = get_model_inputs()

    # Create the training and inference logits
    training_decoder_output, inference_decoder_output = seq2seq_model(input_data, 
                                                                      targets, 
                                                                      lr, 
                                                                      target_sequence_length, 
                                                                      max_target_sequence_length, 
                                                                      source_sequence_length,
                                                                      len(word2int),
                                                                      len(word2int),
                                                                      encoding_embedding_size, 
                                                                      decoding_embedding_size, 
                                                                      rnn_size, 
                                                                      num_layers
                                                                     )    

    # Create tensors for the training logits and inference logits
    training_logits = tf.identity(training_decoder_output.rnn_output, 'logits')
    inference_logits = tf.identity(inference_decoder_output.sample_id, name='predictions')

    # Create the weights for sequence_loss
    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):

        global_step = tf.Variable(0, name='global_step',trainable=False)

        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients, global_step= global_step)

    #tf.summary.scalar('loss', cost)
    training_summary = tf.summary.scalar("training_loss", cost)
    validation_summary = tf.summary.scalar("validation_loss", cost)
    tf.summary.scalar('learning_rate', lr)
    summary_op = tf.summary.merge_all()

Then I load the model checkpoints and pass the test data through a preprocessing pipeline:

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()

    saver.restore(sess,"./model_logs/model.ckpt-8750")
    max_line_length = 10

    def question_to_seq(question, word2int):
        return [word2int.get(word, word2int['<UNK>']) for word in question.split()]

    input_question = "This is a test"
    cleaned_text = data_cleaning(input_question)
    input_question = question_to_seq(cleaned_text, word2int)

    text = input_question + [word2int["<PAD>"]] * (max_line_length - len(input_question))
    a = [text]*batch_size

    answer_logits = sess.run(inference_logits, {input_data: [text]*batch_size, 
                                      target_sequence_length: [len(text)]*batch_size, 
                                      source_sequence_length: [len(text)]*batch_size,
                                      keep_prob: 1})[0]

    pad_q = word2int["<PAD>"]
    pad_a = word2int["<PAD>"]

    print('Question')
    print('  Word Ids:      {}'.format([i for i in input_question if i != pad_q]))
    print('  Input Words: {}'.format([int2word[i] for i in input_question if i != pad_q]))

    print('\nAnswer')
    print('  Word Ids:      {}'.format([i for i in answer_logits if i != pad_a]))
    print('  Response Words: {}'.format([int2word[i] for i in answer_logits if i != pad_a]))

I think that the main problems in making this model work with TensorFlow Serving can be summarised as follows:

  1. Data preprocessing
  2. Assets
  3. C++ custom servable

Based on my current research I came to the following conclusions:

1. Data Processing

The "ideal" way to handle this should be by using tf.Transform (https://github.com/tensorflow/serving/issues/663), but as indicated in the post it doesn't support many operations such as lowercase or regex and is therefore not suitable for such text based models.

As tf.Transform doesn't work I can think about 2 alternatives to solve the issue:

tf.reset_default_graph()
# Build the graph
train_graph = tf.Graph()
# Set the graph to default to ensure that it is ready for training
with train_graph.as_default():

    # Load the model inputs    
    input_data, targets, lr, target_sequence_length, max_target_sequence_length, source_sequence_length, keep_prob  = get_model_inputs()

    # --------------------------------------------------------------------------
    # Input preprocessing functions, batching, vocab creation, etc. would all go
    # here ??
    # --------------------------------------------------------------------------

    # Create the training and inference logits
    training_decoder_output, inference_decoder_output = seq2seq_model(input_data, 
                                                                      targets, 
                                                                      lr, 
                                                                      target_sequence_length, 
                                                                      max_target_sequence_length, 
                                                                      source_sequence_length,
                                                                      len(word2int),
                                                                      len(word2int),
                                                                      encoding_embedding_size, 
                                                                      decoding_embedding_size, 
                                                                      rnn_size, 
                                                                      num_layers
                                                                     )    

........................................

Does this approach make sense or am I completely missing something?
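(For reference, a minimal sketch of what this in-graph preprocessing could look like with TF 1.x string ops and tf.contrib.lookup; the file name word2int.txt and the UNK_ID constant are assumptions, not part of the original model.)

# A minimal sketch (not the author's code) of in-graph tokenization and vocabulary lookup.
import tensorflow as tf

UNK_ID = 2  # hypothetical id of <UNK> in the vocabulary file

# Batch of raw sentences fed by the client.
raw_input = tf.placeholder(tf.string, shape=[None], name="raw_input")

# Whitespace tokenization; lowercasing / regex cleaning would still live in the client.
tokens = tf.sparse_tensor_to_dense(tf.string_split(raw_input), default_value="<PAD>")

# index_table_from_file registers "word2int.txt" as a SavedModel asset automatically
# and is initialized by tf.tables_initializer() (the legacy_init_op used at export).
word2int_table = tf.contrib.lookup.index_table_from_file(
    vocabulary_file="word2int.txt", default_value=UNK_ID)

input_data = word2int_table.lookup(tokens)  # int64 ids, shape [batch, time]
source_sequence_length = tf.cast(
    tf.count_nonzero(tf.not_equal(tokens, "<PAD>"), axis=1), tf.int32)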

2. Assets

The second issue that needs to be solved is how to use assets, in this case the two dictionaries int2word and word2int that the model requires. I managed to store them by modifying the following code, which previously only worked for strings (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/saved_model/saved_model_half_plus_two.py):

import json
import os

import tensorflow as tf
from tensorflow.python.lib.io import file_io


def _write_assets(assets_directory, assets_filename, vocab):
    """Writes a vocabulary dict as a JSON asset file to be used with SavedModel.

    Args:
        assets_directory: The directory to which the assets should be written.
        assets_filename: Name of the file to which the asset contents should be
            written.
        vocab: The vocabulary dict to serialize.

    Returns:
        The path to which the assets file was written.
    """
    if not file_io.file_exists(assets_directory):
        file_io.recursive_create_dir(assets_directory)

    path = os.path.join(
        tf.compat.as_bytes(assets_directory), tf.compat.as_bytes(assets_filename))
    file_io.write_string_to_file(path, json.dumps(vocab))
    return path

Then I transformed the model into protobuf format to make it ready for serving.

with tf.Session(graph=train_graph) as sess:
    # Restore Model 
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    saver.restore(sess, "./model_logs/model.ckpt-8750")

    # Create export directory
    export_directory = ""
    model_version = 1
    export_path = os.path.join(
        tf.compat.as_bytes(export_directory),
        tf.compat.as_bytes(str(model_version))
    )

    # Word2Int
    original_assets_directory = "/tmp/original/export/assets"
    original_assets_filename = "word2int"
    original_assets_filepath = _write_assets(original_assets_directory,original_assets_filename, word2int)

    # Set up the assets collection.
    assets_filepath = tf.constant(original_assets_filepath)
    tf.add_to_collection(tf.GraphKeys.ASSET_FILEPATHS, assets_filepath)
    filename_tensor = tf.Variable(
        original_assets_filename,
        name="filename_tensor",
        trainable=False,
        collections=[])
    assign_filename_op = filename_tensor.assign(original_assets_filename)

    # Int2Word
    original_assets_directory = "/tmp/original/export/assets"
    original_assets_filename = "int2word"
    original_assets_filepath = _write_assets(original_assets_directory,original_assets_filename, int2word)

    # Set up the assets collection.
    assets_filepath = tf.constant(original_assets_filepath)
    tf.add_to_collection(tf.GraphKeys.ASSET_FILEPATHS, assets_filepath)
    filename_tensor = tf.Variable(
        original_assets_filename,
        name="filename_tensor",
        trainable=False,
        collections=[])
    assign_filename_op = filename_tensor.assign(original_assets_filename)

    # Builder
    builder = tf.saved_model.builder.SavedModelBuilder(export_path)

    # Create Tensor Info
    tensor_info_x = tf.saved_model.utils.build_tensor_info(input_data)
    tensor_info_y = tf.saved_model.utils.build_tensor_info(inference_logits)
    tensor_info_keep_prob = tf.saved_model.utils.build_tensor_info(keep_prob)
    tensor_info_target_seq_len = tf.saved_model.utils.build_tensor_info(target_sequence_length)
    tensor_info_source_seq_len = tf.saved_model.utils.build_tensor_info(source_sequence_length)

    # Build Prediction Signature
    prediction_signature = (
        tf.saved_model.signature_def_utils.build_signature_def(
          inputs={'input': tensor_info_x,'keep_prob': tensor_info_keep_prob,'target_sq_len': tensor_info_target_seq_len,
                  'source_sq_len': tensor_info_source_seq_len},
          outputs={'predictions': tensor_info_y},
          method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

    # Save the model
    legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            'prediction':prediction_signature
        }, 
        assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS),
        legacy_init_op=legacy_init_op)

    builder.save()

This leads to the following folder structure:

1/
├── saved_model.pb
├── assets/
│   ├── int2word
│   └── word2int
└── variables/
    ├── variables.data-00000-of-00001
    └── variables.index

What I don't understand is how these assets can / should be used by the TensorFlow Serving model. They are needed by several functions within the graph but are not fed to it; they are basically like global variables. Do I just need to load them into the client and magic will happen, or can I somehow load them into the graph and avoid exporting them as assets? This leads to problem no. 3.

3. C++ custom servable

It seems to me that there is no way around creating a C++ custom servable if someone wants to use assets (https://www.tensorflow.org/serving/custom_servable). If I understand it correctly, I can still use my Python client and just need to change the serving part to use C++.

This is the part that I don't understand at all, and having no C++ experience doesn't make it better. I don't really know where to start and if I don't get it working I'm in big trouble.

In order to use TensorFlow Serving I created a Docker image using the Dockerfile.devel in the repository. Then I downloaded the TensorFlow Serving repo and ran:

bazel build -c opt tensorflow_serving/...

I can then run the following command to serve a model:

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=rnn --model_base_path=rnn-export &> rnn_log &

According to the documentation it looks like a new Loader and Source Adapter should be created. However, I couldn't find any clear examples about how to do this. It is also not clear to me which files I need to change and how to run the code.

Do I have to change the core/loader.h, core/simple_loader.h and main.cc files? Are there any examples? I have no idea where to start.

Are there any other ways to deploy the model to production, or how can I solve this? I find the whole process very complex for such a simple model architecture. Maybe my approach is wrong and I'm missing something, as I couldn't find much information online. This should be a common problem that people face on a daily basis.

It would be great if you have some ideas on how to solve this.

iyukuni commented 6 years ago

I'm curious about how these assets are supposed to be loaded when serving. Are additional ops needed to load them?

assets
├── int2word
└── word2int

chrisolston commented 6 years ago

Hi there,

TF-Serving supports loading assets out of the box, without needing to drop down to writing a custom C++ servable. In particular, the SavedModel loader finds and loads the assets. See https://www.tensorflow.org/programmers_guide/saved_model; and if needed you can look at the implementation of loading assets here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/saved_model/loader.cc.
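(As a rough illustration of that mechanism: the standard Python SavedModel loader, which relies on the same loading code path, restores asset-backed tables without any custom C++. The export path "./export/1" below is an assumption.)

# Sketch only: load a SavedModel export and let the loader re-initialize asset-backed
# tables from the copies under export/1/assets.
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], "./export/1")
    # At this point any table registered via the assets collection and initialized
    # by the main_op / legacy_init_op has been rebuilt from the exported asset files.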

Regarding your transformation question, you might want to post to a tf.transform forum.

Chris

xiaoxiaoyuwen commented 6 years ago

Hi @chrisolston, I have the same problem as @iyukuni. As you mentioned above, TF Serving will load the assets for us, but my question is: how and when can I build the word2int and int2word from the loaded files, so that I can do some pre-processing on the input strings and post-processing on the output ids?

jacks808 commented 6 years ago

@xiaoxiaoyuwen Did you fix this? I'm facing the same issue.

praveeny1986 commented 6 years ago

I also have the same doubt. How does TensorFlow Serving access the assets? Is there any working example? In the doc (https://www.tensorflow.org/programmers_guide/saved_model) assets_collection is mentioned, but how it is used in the graph is not clear.

@jacks808, @xiaoxiaoyuwen, were you able to get to the solution?

aldenhallak commented 6 years ago

I have the same question! There is plenty of documentation on how to include assets in the SavedModel, but how do I access and use these assets in the graph? If my asset file is a vocabulary (a list of words in order of frequency), how do I get a tensor that I can then use? There don't seem to be any examples of this online, or at least any that I can understand.

ruanchong commented 6 years ago

I'm wondering if there is a working example of how to use the files in the assets folder.

charlesverge commented 6 years ago

For using the assets I'm interested in seeing additional examples as well. @a-a-e I've been looking to do this as well; my understanding is that you would use the main_op to initialize the tables and then look up the values in your model. Then, to my understanding, it becomes a balance between client-side and TensorFlow model processing.

The main_op is used here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/saved_model/saved_model_half_plus_two.py

Mentions of assets are sprinkled around, e.g. https://github.com/tensorflow/tensorflow/tree/r1.11/tensorflow/python/saved_model:

Support for Assets. For cases where ops depend on external files for initialization, such as vocabularies, SavedModel supports this via assets. Assets are copied to the SavedModel location and can be read when loading a specific meta graph def.

More examples of using assets would be great to show good use cases.

@gautamvasudevan
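(A minimal sketch of the main_op wiring described above; the export path is an assumption, and tf.saved_model.main_op.main_op() groups variable, local-variable and table initialization.)

import tensorflow as tf

builder = tf.saved_model.builder.SavedModelBuilder("./export/1")  # path assumed

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS),
        # main_op runs at load time and (re-)initializes all lookup tables.
        main_op=tf.saved_model.main_op.main_op())
    builder.save()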

guillaumekln commented 6 years ago

For vocabulary assets, I recommend using the tf.contrib.lookup module in your code. It manages assets for you, from registering them to loading them at serving time. See in particular:

Speaking of real examples, we use this in addition to Estimators in OpenNMT-tf to easily support model export and serving.

Preprocessing is an open question, though there is work to make a SentencePiece TensorFlow op, which would solve it for simple use cases in machine translation.
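(For the reverse mapping, ids back to words, a rough sketch of what that recommendation could look like; the file name int2word.txt is an assumption.)

import tensorflow as tf

# Stand-in for the decoder's sample_id output from the code earlier in the thread.
inference_logits = tf.placeholder(tf.int32, shape=[None, None], name="predictions")

# index_to_string_table_from_file registers "int2word.txt" as a SavedModel asset and
# is initialized by tf.tables_initializer() at load time.
int2word_table = tf.contrib.lookup.index_to_string_table_from_file(
    vocabulary_file="int2word.txt", default_value="<UNK>")

# Map predicted ids back to word strings so the client receives words, not ids.
predicted_tokens = int2word_table.lookup(tf.to_int64(inference_logits))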

nidhikamath91 commented 6 years ago

@a-a-e Did you solve the issue?

gautamvasudevan commented 6 years ago

I think @guillaumekln has the right idea here. Closing due to lack of activity.

sathyarr commented 5 years ago

Should we manually copy the required files into the assets folder to export successfully? Ref: export_half_plus_two.py

I've been thinking that, based on what the tensors require during training, the required files are automatically copied to the assets folder. Is that a wrong assumption?

Please have a look here

Also, should we manually specify the required initializations through init_op?

guillaumekln commented 5 years ago

If you are referring to the functions tf.contrib.lookup.index_table_from_file and tf.contrib.lookup.index_to_string_table_from_file, they should do that for you, from saving the vocabulary in the asset folder to loading it automatically at serving time.

sathyarr commented 5 years ago

If you are referring to the functions tf.contrib.lookup.index_table_from_file and tf.contrib.lookup.index_to_string_table_from_file, they should do that for you, from saving the vocabulary in the asset folder to loading it automatically at serving time.

It gives TypeError: index_table_from_file() got an unexpected keyword argument 'key_column_index' as referred here, even though key_column_index is acceptable as per the official doc. Could not figure that out.

However, I managed to do it with tf.contrib.lookup.TextFileStringTableInitializer and tf.contrib.lookup.HashTable. The file is copied automatically when everything is referenced as a Tensor and assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS) is passed as a parameter to the Builder. Thank you for your help. Also, this was helpful too.
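(A guess at the kind of wiring described here; the file name int2word.txt is an assumption.)

import tensorflow as tf

# Referencing the vocabulary file as a tensor and adding it to ASSET_FILEPATHS is
# what makes the SavedModelBuilder copy it into the export's assets/ folder.
vocab_file = tf.constant("int2word.txt")
tf.add_to_collection(tf.GraphKeys.ASSET_FILEPATHS, vocab_file)

# Keys are line numbers (int64), values are the words on those lines.
int2word_table = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.TextFileStringTableInitializer(vocab_file),
    default_value="<UNK>")

# Pass assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS) and a
# legacy_init_op / main_op containing tf.tables_initializer() to the builder so
# the table is reloaded from assets/ at serving time.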

I'm now a bit stuck on how to go about the I/O and pre-processing for seq2seq. The default SignatureDef shows the inputs are:

inputs['source_ids'] tensor_info:
    dtype: DT_INT64
    shape: (-1, -1)
    name: model/att_seq2seq/hash_table_1_Lookup:0
inputs['source_len'] tensor_info:
    dtype: DT_INT32
    shape: (-1)
    name: model/att_seq2seq/Minimum:0
inputs['source_tokens'] tensor_info:
    dtype: DT_STRING
    shape: (-1, -1)
    name: model/att_seq2seq/strided_slice:0

But, as of now, I'm giving only inputs['source_tokens']. The inputs['source_len'] and inputs['source_ids'] are generated by preprocessing in Python.

How do I go about this?

  1. I need to give only inputs['source_tokens'] as input
  2. They need to be transformed into inputs['source_tokens'], inputs['source_len'] and inputs['source_ids']
  3. Does the SignatureDef need to be modified?

The solution should work for both training and inference.

Also, how did the default SignatureDef come out like that (as mentioned above)? Is it based on the input to the encoder?

Thanks in advance
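(Not from the thread, but one rough sketch of answers (1)-(3): compute source_ids and source_len from source_tokens inside the graph, as in the lookup-table examples above, and export a new SignatureDef whose only input is source_tokens. Tensor and file names here are assumptions.)

import tensorflow as tf

UNK_ID = 2  # hypothetical id of <UNK>

# Only input the client has to send.
source_tokens = tf.placeholder(tf.string, shape=[None, None], name="source_tokens")

# Derive the other two inputs in-graph.
word2int_table = tf.contrib.lookup.index_table_from_file(
    vocabulary_file="word2int.txt", default_value=UNK_ID)
source_ids = word2int_table.lookup(source_tokens)
source_len = tf.cast(
    tf.count_nonzero(tf.not_equal(source_tokens, "<PAD>"), axis=1), tf.int32)

# Stand-in for the decoder output; in the real model this would be the predictions.
predicted_tokens = tf.identity(source_tokens, name="predicted_tokens")

# (3) Yes, the exported SignatureDef changes: it now only exposes source_tokens.
signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={"source_tokens": tf.saved_model.utils.build_tensor_info(source_tokens)},
    outputs={"predicted_tokens": tf.saved_model.utils.build_tensor_info(predicted_tokens)},
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)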

sathyarr commented 5 years ago

Using the existing model in TF Serving without any lookup table and with the default SignatureDef, I sent a POST request with the following body (assuming no lookup tables are needed if the process starts directly at encode(), and hence the corresponding SignatureDef):

    "inputs": {
        "source_tokens": "['AND','COME','SO']",
        "source_ids": [3, 7, 5],
        "source_len": [3]
    }

source_ids were found manually and added; source_len is calculated. However, it results in:

{
    "error": "len(seq_lens) != input.dims(0), (1 vs. 3)\n\t [[{{node model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/bw/ReverseSequence}}]]"
}

In (1 vs. 3) above, the term in place of 3 changes according to the length of source_ids.

Corresponding TF Serving console output:

2019-03-05 13:27:41.699119: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at transpose_op.cc:157 : Invalid argument: transpose expects a vector of size 2. But input(1) is a vector of size 3

Any help is appreciated.
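(One guess, not verified in the thread: the request shape doesn't match the (-1, -1) signatures above, since source_tokens is sent as a single string and source_ids lacks a batch dimension. Below is a sketch of a request with explicit batch dimensions, using Python's requests library; the URL and model name are assumed from the --model_name=rnn command earlier in the thread.)

import requests

# Each 2-D input gets an explicit batch dimension; source_tokens is a nested list
# of strings rather than one string.
body = {
    "inputs": {
        "source_tokens": [["AND", "COME", "SO"]],
        "source_ids": [[3, 7, 5]],
        "source_len": [3],
    }
}

# Port 8501 is TF Serving's default REST port; model name "rnn" matches the
# --model_name flag used earlier in the thread.
response = requests.post("http://localhost:8501/v1/models/rnn:predict", json=body)
print(response.json())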