huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Error In Fine-Tuning Transformer XL ValueError: The two structures don't have the same sequence length. Input structure has length 3, while shallow structure has length 2. #11560

Closed rajgar114 closed 3 years ago

rajgar114 commented 3 years ago

Environment info

Who can help

@patrickvonplaten

Information

I am working on a machine translation task: translating English sentences into Hinglish sentences. I am trying to fine-tune the pre-trained Transformer-XL model on my custom dataset. Here is my code:

import pandas as pd
import tensorflow as tf
from transformers import TransfoXLTokenizer
from transformers import TFTransfoXLModel
import numpy as np
from sklearn.model_selection import train_test_split

#Loading data
dataFrame = pd.read_csv("data.csv")
dataFrame.head(3)

#-----Output 1-----

#Splitting Dataset
X = dataFrame['English']
Y = dataFrame['Hinglish']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 42)

#Tokenization
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')

tokenizer.pad_token = tokenizer.eos_token

XTrainEncodings = tokenizer(X_train.to_list(),  max_length = 150, padding = True)
XTestEncodings = tokenizer(X_test.to_list(), max_length = 150, padding = True)
YTrainEncodings = tokenizer(Y_train.to_list(), max_length = 150, padding = True)
YTestEncodings = tokenizer(Y_test.to_list(), max_length = 150, padding = True)
print("XTrainEncodings : ", XTrainEncodings)
print("YTrainEncodings : ", YTrainEncodings)

#-----Output 2-----

#Converting to Tensors
X_train = tf.data.Dataset.from_tensor_slices((dict(XTrainEncodings), (dict(YTrainEncodings))))
X_test = tf.data.Dataset.from_tensor_slices((dict(XTestEncodings), (dict(YTestEncodings))))
print(X_train)

#-----Output 3-----

#Fine Tuning
model = TFTransfoXLModel.from_pretrained('transfo-xl-wt103')

optimizer = tf.keras.optimizers.Adam(learning_rate = 5e-5)
model.compile(optimizer = optimizer, loss = tf.losses.SparseCategoricalCrossentropy(), metrics = ['accuracy'])

history = model.fit(X_train.batch(1), epochs = 2, batch_size = 1, validation_data = X_test.batch(1))

Outputs

-----Output 1-----

English          Hinglish
How are you ?    Tum kaise ho ?
I am fine.       Main theek hoon
......

-----Output 2-----
XTrainEncodings :  {'input_ids': [[4241, 0, 0, 0, 0, 0], [4827, 37, 304, 788, 0, 0],....
YTrainEncodings :  {'input_ids': [[13762, 0, 0, 0, 0], [71271, 24, 33289, 788, 0],....

-----Output 3-----
<TensorSliceDataset shapes: ({input_ids: (6,)}, {input_ids: (5,)}), types: ({input_ids: tf.int32}, {input_ids: tf.int32})>
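
Output 3 shows the source `input_ids` with length 6 and the target `input_ids` with length 5. With `padding = True`, each tokenizer call pads to the longest sequence in that call, so the English and Hinglish batches end up with different widths. The following is a minimal pure-Python sketch of that mismatch and of padding both sides to one shared width. It assumes a hypothetical helper `pad_to_length` and uses 0 as the padding id because that is the value that appears as padding in Output 2. Whether this mismatch contributes to the ValueError is not established anywhere in this thread.

```python
# Rows copied from Output 2; 0 is the padding id that appears there.
x_rows = [[4241, 0, 0, 0, 0, 0], [4827, 37, 304, 788, 0, 0]]
y_rows = [[13762, 0, 0, 0, 0], [71271, 24, 33289, 788, 0]]


def pad_to_length(rows, length, pad_id=0):
    """Hypothetical helper: right-pad (or truncate) every row to `length`."""
    return [(row + [pad_id] * length)[:length] for row in rows]


# Each batch was padded to its own longest sequence, so the widths differ.
x_width = len(x_rows[0])
y_width = len(y_rows[0])

# Pad both sides to one shared width.
shared = max(x_width, y_width)
x_padded = pad_to_length(x_rows, shared)
y_padded = pad_to_length(y_rows, shared)

print(x_width, y_width, len(x_padded[0]), len(y_padded[0]))
```

With the tokenizer itself, passing `padding = 'max_length'` together with `max_length` in every call would likely be the analogous way to get a fixed width on both sides.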

Error

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:758 train_step
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/compile_utils.py:387 update_state
        self.build(y_pred, y_true)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/compile_utils.py:318 build
        self._metrics, y_true, y_pred)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:1163 map_structure_up_to
        **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:1245 map_structure_with_tuple_paths_up_to
        expand_composites=expand_composites)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:878 assert_shallow_structure
        input_length=len(input_tree), shallow_length=len(shallow_tree)))

    ValueError: The two structures don't have the same sequence length. Input structure has length 3, while shallow structure has length 2.
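
The traceback ends in `assert_shallow_structure`, which is reached from the metrics `build` step and compares the lengths of two nested structures. The sketch below is a simplified stand-in for that length check, not TensorFlow's implementation: the function name `check_same_length` and the placeholder tuples are hypothetical. It only illustrates what the message means. Which objects Keras actually compared in this run (model outputs, labels, or metrics) cannot be determined from the traceback alone.

```python
def check_same_length(input_tree, shallow_tree):
    """Simplified stand-in for the length check; not TensorFlow's actual code."""
    if len(input_tree) != len(shallow_tree):
        raise ValueError(
            "The two structures don't have the same sequence length. "
            f"Input structure has length {len(input_tree)}, "
            f"while shallow structure has length {len(shallow_tree)}."
        )


# Hypothetical placeholders: three entries on one side, two on the other.
input_tree = ("a", "b", "c")
shallow_tree = ("x", "y")

try:
    check_same_length(input_tree, shallow_tree)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```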

Please help me find the cause of this error and fix it. I would also like to know whether this is the right approach for my task or whether I am missing something. Thanks.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patrickvonplaten commented 3 years ago

Hey @rajgar114,

Sorry to answer so late! It's quite difficult to reproduce an error that occurs in a full training loop. Could you by chance provide a minimal reproducible code snippet? Ideally one that doesn't require any dataset but just a TensorFlow dummy tensor?

rajgar114 commented 3 years ago

Sorry, I didn't get that. I have pasted the complete code. You can reproduce it with even a 2-3 line dataset like the one shown in Output 1:

English          Hinglish
How are you ?    Tum kaise ho ?
I am fine.       Main theek hoon

These rows can easily be converted to tensors with the code above. After tokenization, they can serve as the dummy tensors.
