tensorflow / transform

TF 2.x Incompatibilities: Graph Error using TF Hub, Keras #164

luischinchillagarcia closed this issue 4 years ago

luischinchillagarcia commented 4 years ago

In the task of transforming text to vectors (as in this official GCP example), the idea is to pass text into a TF Hub layer. However, this example fails in two ways. Here is a Colab notebook reproducing the issues described below.

First: TF Hub incompatibilities. Only the old TF Hub models built for TF1 succeed (e.g. the old Universal Sentence Encoder, v2). Any new TF2-compatible model raises tensor shape errors, graph errors, or errors that ask for placeholder tensors.
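
For concreteness, the two call styles look roughly like this (a minimal sketch using the same model URLs as the full code below; sentences stands for the string feature tensor):

import tensorflow_hub as hub

def embed_old_style(sentences):
    # TF1-style hub.Module: the only variant that succeeds for me.
    module = hub.Module('https://tfhub.dev/google/universal-sentence-encoder/2')
    return module(sentences)

def embed_new_style(sentences):
    # TF2 SavedModel via hub.load: fails inside preprocessing_fn with graph/shape errors.
    module = hub.load('https://tfhub.dev/google/universal-sentence-encoder/4')
    return module(sentences)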

Second: TF 2.x incompatibilities. Even when using the old TF Hub-compatible model, an error arises when trying to use any kind of split function (e.g. tf.strings.split or tf.compat.v1.string_split): instead of producing a list of lists or a 2D tensor, the split creates a new entry in the output dictionary (or a new row) for each element it generates. This is a huge problem, because if we want to split the words, turn them into vectors, and concatenate those vectors in some way, we cannot: the split function just creates new tensors that get added to the output dictionary and cannot be passed to any further tf function.
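
To make the second point concrete, this is the kind of intermediate step we would like to be able to write (a hypothetical helper; the space delimiter and the densification via to_tensor are illustrative), so that the split result stays a single 2D string tensor that further tf functions and the embedding module can consume:

import tensorflow as tf

def split_to_dense_tokens(sentences, delimiter=' '):
    # Split a batch of strings and keep the result as one dense 2D tensor.
    tokens = tf.strings.split(sentences, delimiter)    # RaggedTensor of shape [batch, None]
    return tokens.to_tensor(default_value='')          # dense [batch, max_tokens]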

Both of these issues can be traced to TF 2.x incompatibilities. So this brings up the question: is the future of TFT to evolve to become more compatible with TF 2.x? And if so, are there any plans for a user to be able to use it in such a way that we can put arbitrary tf functions in the preprocessing step without needing to worry about TF1 sessions, graphs, and functions?

# Issue 1: TF Hub Incompatibilities

import tensorflow as tf
import tensorflow_hub as hub

UNI_SENT_ENC_OLD = 'https://tfhub.dev/google/universal-sentence-encoder/2'
UNI_SENT_ENC_NEW = 'https://tfhub.dev/google/universal-sentence-encoder/4'
UNI_SENT_ENC_MULTI = 'https://tfhub.dev/google/universal-sentence-encoder-multilingual/3'

def _get_embeddings_1(input_string, module):
    # Apply a TF Hub text-embedding module to a batch of strings.
    embed = module(input_string)
    return embed

def _get_embeddings_2(input_string):
    hub_layer = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1", output_shape=[20],
                            input_shape=[], dtype=tf.string)

    model = tf.keras.Sequential()
    model.add(hub_layer)

    output = model.predict([input_string], steps=1)

    # input_as_tensor = tf.constant([input_string], tf.string)
    # output = model(input_as_tensor) # also Fails

    return output

def preprocessing_fn(input_features):
    """Preprocess input columns into transformed columns."""

    # Get column 
    sent = input_features['sample_sentence']

    # ---------
    # Apply transformation to column(s)
    # ---------

    # ---------
    # Succeeds, but uses the old method, which is incompatible with the new TF Hub embeddings
    # ---------
    module_old = hub.Module(UNI_SENT_ENC_OLD)
    sent_embeds_old_1 = _get_embeddings_1(sent, module=module_old)

    # ---------
    # Fails: Graph error
    # ---------
    module_new = hub.load(UNI_SENT_ENC_NEW)
    sent_embeds_old_2 = _get_embeddings_1(sent, module=module_new) #Fails

    # ---------
    # Fails: Tensor shape error
    # ---------
    module_multi_univ_2 = hub.load(UNI_SENT_ENC_MULTI)
    multi_sent_enc_new_2 = _get_embeddings_1(sent, module=module_multi_univ_2)

    # ---------
    # Fails: Requires Placeholder Tensor
    # ---------
    multi_sent_enc_new_3 = _get_embeddings_2(sent)

    return {
        'sent_embeds_old_1': sent_embeds_old_1,

        # 'sent_embeds_old_2': sent_embeds_old_2, # Fails

        # 'multi_sent_enc_new_2': multi_sent_enc_new_2 #Fails

        # 'multi_sent_enc_new_3': multi_sent_enc_new_3 # Fails: Requires Placeholder Tensor

    }
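
The commented-out lines in _get_embeddings_2 hint at the graph-friendly variant: calling the hub.KerasLayer on the tensor directly instead of going through model.predict, which actually runs the model and returns NumPy arrays that cannot feed the preprocessing graph. Spelled out, that variant would look like the sketch below; as noted in the code comments above, it also fails for me inside preprocessing_fn.

import tensorflow as tf
import tensorflow_hub as hub

def _get_embeddings_2_direct(input_string):
    # Sketch: call the layer on the string tensor directly so the output stays a
    # symbolic tensor rather than a NumPy array returned by model.predict.
    hub_layer = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
                               output_shape=[20], input_shape=[], dtype=tf.string)
    return hub_layer(input_string)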
# Issue 2: TF2.x Incompatibilities

import tensorflow as tf
import tensorflow_hub as hub

UNI_SENT_ENC_OLD = 'https://tfhub.dev/google/universal-sentence-encoder/2'

def _get_embeddings_1(content, module, delimiter='\n'):
    """Split each string on `delimiter`, embed every piece, and mean-pool per string."""
    def _map_fn(t):
        t = tf.cast(t, tf.string)
        t = tf.strings.split([t], delimiter).values   # pieces of one input string
        e = module(t)                                  # [num_pieces, embed_dim]
        e = tf.reduce_mean(e, axis=0)                  # mean-pool over the pieces
        return tf.squeeze(e)

    embed = tf.map_fn(_map_fn, content, dtype=tf.float32)
    return embed

def preprocessing_fn(input_features):
    """Preprocess input columns into transformed columns."""

    # Get column 
    sent = input_features['sample_sentence']

    # Apply transformation (Strings -> Embeddings)
    module_old = hub.Module(UNI_SENT_ENC_OLD)
    sent_embeds = _get_embeddings_1(sent, module=module_old, delimiter='\n')

    return {
        'sent_embeds': sent_embeds,

    }
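
One alternative shape for this helper would be to avoid tf.map_fn entirely and keep the split result as a RaggedTensor, regrouping the piece embeddings with the ragged row_splits (a sketch only, assuming TF2 semantics for tf.strings.split, with module standing for the hub module as above; whether this can run inside preprocessing_fn is exactly the question raised here):

import tensorflow as tf

def _get_embeddings_ragged(content, module, delimiter='\n'):
    # Sketch: split every sentence, embed all pieces in one call, then mean-pool
    # per sentence by regrouping the flat embeddings with the ragged row_splits.
    pieces = tf.strings.split(content, delimiter)            # RaggedTensor [batch, None]
    piece_embeds = module(pieces.flat_values)                # [total_pieces, embed_dim]
    ragged_embeds = tf.RaggedTensor.from_row_splits(
        values=piece_embeds, row_splits=pieces.row_splits)   # [batch, None, embed_dim]
    return tf.reduce_mean(ragged_embeds, axis=1)             # dense [batch, embed_dim]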
zoyahav commented 4 years ago

Could you please include error logs as well?

rmothukuru commented 4 years ago

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!