Suddenly unable to create BERT encodings

melissafeeney commented 6 months ago

System Info

Google Colab: Python 3.10.12, transformers 4.38.2, tensorflow 2.15.0, datasets 2.18.0

Who can help?

@gante and @Rocketknight1

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

I am trying to fine-tune TFBert for an NLI task, adding dense layers on top of the BERT encodings. My dataset is a dataframe with a column of premises, a column of hypotheses, and a label column (either entailment or contradiction).

In the past I have used similar code with TFBert to encode my dataset and fine-tune an NLI model without issue. The line that raises the error is embeddings = bert.layers[0](input_ids, attention_mask, token_type_ids), and I cannot figure out why, since I have used this same line in similar code without any problems (one difference being that my previous code converted the dataset to Dataset format after tokenizing, not before). The code was working last week, and transformers 4.38.2 was released about 5 days ago. Could this issue be related to that new release?

Here is my complete code:

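##### 0. Imports used by the rest of the script
import numpy as np
import pandas as pd
import tensorflow as tf
from datasets import Dataset

##### 1. Load tokenizer
from transformers import BertTokenizer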
tranformersPreTrainedModelName = 'bert-base-uncased'
bert_tokenizer = BertTokenizer.from_pretrained(tranformersPreTrainedModelName)

#### 2. Load model
from transformers import TFBertModel
bert = TFBertModel.from_pretrained(tranformersPreTrainedModelName, output_hidden_states = True)

data = pd.read_excel('/content/nli_fine_tuning.xlsx', sheet_name = 'fine tuning 4')

# Split into train, validate and test datasets
train, val, test = np.split(data.sample(frac = 1, random_state = 123), [int(.6*len(data)), int(.8*len(data))])

# Clean up datasets, convert to Dataset format
train_dataset = Dataset.from_pandas(train)
train_dataset = train_dataset.remove_columns(["__index_level_0__"])
val_dataset = Dataset.from_pandas(val)
val_dataset = val_dataset.remove_columns(["__index_level_0__"])
test_dataset = Dataset.from_pandas(test)
test_dataset = test_dataset.remove_columns(["__index_level_0__"])

# Tokenize the datasets
def tokenize_data(data, tokenizer):
  # Use the tokenizer passed in as an argument rather than the global one
  encoded_data = tokenizer(data['premise'], data['hypothesis'],
                           max_length = 100, 
                           truncation = True, 
                           padding = 'max_length', 
                           add_special_tokens = True, 
                           return_token_type_ids = True, 
                           return_attention_mask = True, 
                           return_tensors = 'tf')

  labels = np.array(pd.get_dummies(data['label']))
  return encoded_data, labels

# Apply tokenizer to train, val and test
train_encoded_data, train_labels = tokenize_data(train_dataset, bert_tokenizer)
val_encoded_data, val_labels = tokenize_data(val_dataset, bert_tokenizer)
test_encoded_data, test_labels = tokenize_data(test_dataset, bert_tokenizer)

# Model setup
input_ids = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'input_ids')
attention_mask = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'attention_mask')
token_type_ids = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'token_type_ids')

# create BERT embeddings
embeddings = bert.layers[0](input_ids, attention_mask, token_type_ids)
last_hidden_states = embeddings.last_hidden_state # extract the last hidden state
X = tf.keras.layers.Dense(32, activation = 'relu')(last_hidden_states) # Dense layers for classification
y = tf.keras.layers.Dense(2, activation = 'softmax')(X)

finetuned_bert_model = tf.keras.Model(inputs = [input_ids, attention_mask, token_type_ids], outputs = y)

# Freeze Bert layer
finetuned_bert_model.layers[3].trainable = False

# Compile
finetuned_bert_model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

bert_hist = finetuned_bert_model.fit(train_encoded_data, train_labels,
                           validation_data = [val_encoded_data, val_labels], 
                           epochs = 5)
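
The 60/20/20 split in the script above depends on the two cut indices passed to np.split. A minimal sketch of that arithmetic in plain Python, assuming the same fractions as the script; the helper names split_points and split_sizes are hypothetical and only illustrate how many rows land in each part:

```python
def split_points(n_rows, fractions=(0.6, 0.8)):
    """Cut indices computed the same way as in the script: int(fraction * n_rows)."""
    return [int(f * n_rows) for f in fractions]


def split_sizes(n_rows, fractions=(0.6, 0.8)):
    """Number of rows in each consecutive slice between the cut indices."""
    cuts = [0] + split_points(n_rows, fractions) + [n_rows]
    return [end - start for start, end in zip(cuts, cuts[1:])]


# For a 20-row dataframe the cut indices are 12 and 16.
print(split_points(20))  # [12, 16]
print(split_sizes(20))   # [12, 4, 4]
```

Because int() truncates, the parts are not always exactly 60/20/20 for small row counts, but they always add up to the full dataframe.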

And here is the printout of the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-75f3b98408a6> in <cell line: 6>()
      4 
      5 # form BERT embeddings
----> 6 embeddings = bert.layers[0](input_ids, attention_mask, token_type_ids)
      7 # then extract the last hidden state
      8 last_hidden_states = embeddings.last_hidden_state

6 frames
/usr/local/lib/python3.10/dist-packages/tf_keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68             # To get the full stack trace, call:
     69             # `tf.debugging.disable_traceback_filtering()`
---> 70             raise e.with_traceback(filtered_tb) from None
     71         finally:
     72             del filtered_tb

/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_utils.py in run_call_with_unpacked_inputs(self, *args, **kwargs)
    426 
    427         unpacked_inputs = input_processing(func, config, **fn_args_and_kwargs)
--> 428         return func(self, **unpacked_inputs)
    429 
    430     # Keras enforces the first layer argument to be passed, and checks it through `inspect.getfullargspec()`. This

/usr/local/lib/python3.10/dist-packages/transformers/models/bert/modeling_tf_bert.py in call(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict, training)
    910             token_type_ids = tf.fill(dims=input_shape, value=0)
    911 
--> 912         embedding_output = self.embeddings(
    913             input_ids=input_ids,
    914             position_ids=position_ids,

/usr/local/lib/python3.10/dist-packages/transformers/models/bert/modeling_tf_bert.py in call(self, input_ids, position_ids, token_type_ids, inputs_embeds, past_key_values_length, training)
    204 
    205         if input_ids is not None:
--> 206             check_embeddings_within_bounds(input_ids, self.config.vocab_size)
    207             inputs_embeds = tf.gather(params=self.weight, indices=input_ids)
    208 

/usr/local/lib/python3.10/dist-packages/transformers/tf_utils.py in check_embeddings_within_bounds(tensor, embed_dim, tensor_name)
    161         tensor_name (`str`, *optional*): The name of the tensor to use in the error message.
    162     """
--> 163     tf.debugging.assert_less(
    164         tensor,
    165         tf.cast(embed_dim, dtype=tensor.dtype),

/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/tf_op_layer.py in handle(self, op, args, kwargs)
    117             for x in tf.nest.flatten([args, kwargs])
    118         ):
--> 119             return TFOpLambda(op)(*args, **kwargs)
    120         else:
    121             return self.NOT_SUPPORTED

/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68             # To get the full stack trace, call:
     69             # `tf.debugging.disable_traceback_filtering()`
---> 70             raise e.with_traceback(filtered_tb) from None
     71         finally:
     72             del filtered_tb

TypeError: Exception encountered when calling layer 'embeddings' (type TFBertEmbeddings).

Could not build a TypeSpec for name: "tf.debugging.assert_less/assert_less/Assert/Assert"
op: "Assert"
input: "tf.debugging.assert_less/assert_less/All"
input: "tf.debugging.assert_less/assert_less/Assert/Assert/data_0"
input: "tf.debugging.assert_less/assert_less/Assert/Assert/data_1"
input: "tf.debugging.assert_less/assert_less/Assert/Assert/data_2"
input: "Placeholder"
input: "tf.debugging.assert_less/assert_less/Assert/Assert/data_4"
input: "tf.debugging.assert_less/assert_less/y"
attr {
  key: "T"
  value {
    list {
      type: DT_STRING
      type: DT_STRING
      type: DT_STRING
      type: DT_INT32
      type: DT_STRING
      type: DT_INT32
    }
  }
}
attr {
  key: "summarize"
  value {
    i: 3
  }
}
 of unsupported type <class 'tensorflow.python.framework.ops.Operation'>.

Call arguments received by layer 'embeddings' (type TFBertEmbeddings):
  • input_ids=<KerasTensor: shape=(None, 100) dtype=int32 (created by layer 'input_ids')>
  • position_ids=None
  • token_type_ids=<KerasTensor: shape=(None, 100) dtype=int32 (created by layer 'token_type_ids')>
  • inputs_embeds=None
  • past_key_values_length=0
  • training=False

Expected behavior

I would expect bert.layers[0](input_ids, attention_mask, token_type_ids) to return the BERT encodings, on top of which I could add other types of layers to accomplish my NLI task.

gante commented 6 months ago

Hi @melissafeeney 👋

The script you shared requires a local file ('/content/nli_fine_tuning.xlsx'), and has a few assumptions about the data. Would you be able to share a short stand-alone script? 🤗

melissafeeney commented 6 months ago

Hi @gante, sure! Please see below, using an excerpt from my dataset:

premises = pd.DataFrame(['These jeans are made from a durable, dark wash denim and have a relaxed fit through the leg. They feature a classic five-pocket design and a straight leg silhouette.',
       'These jeans are made from a durable, dark wash denim and have a relaxed fit through the leg. They feature a classic five-pocket design and a straight leg silhouette.',
       'This t-shirt is made from a soft, breathable cotton fabric and features a crew neck and short sleeves. It has a graphic print on the front and a relaxed fit.',
       'This t-shirt is made from a soft, breathable cotton fabric and features a crew neck and short sleeves. It has a graphic print on the front and a relaxed fit.',
       'These boxer briefs are made from a moisture-wicking fabric and have a comfortable, contoured pouch. They feature a tag-free waistband and a seamless construction.',
       'These boxer briefs are made from a moisture-wicking fabric and have a comfortable, contoured pouch. They feature a tag-free waistband and a seamless construction.',
       'This dress is made from a flowy, floral print fabric and has a flattering A-line silhouette. It features a v-neckline and adjustable straps.',
       'This dress is made from a flowy, floral print fabric and has a flattering A-line silhouette. It features a v-neckline and adjustable straps.',
       'These leggings are made from a high-waisted, stretchy fabric and have a slimming fit. They feature a moisture-wicking material and a seamless design.',
       'These leggings are made from a high-waisted, stretchy fabric and have a slimming fit. They feature a moisture-wicking material and a seamless design.',
       'This blouse is made from a lightweight, silk blend fabric and features a button-up front and a relaxed fit. It has delicate lace detailing on the collar and cuffs.',
       'This blouse is made from a lightweight, silk blend fabric and features a button-up front and a relaxed fit. It has delicate lace detailing on the collar and cuffs.',
       'These jeans are made from a high-quality, sustainable denim fabric and feature a skinny fit. They have a high-waisted design and a classic five-pocket style.',
       'These jeans are made from a high-quality, sustainable denim fabric and feature a skinny fit. They have a high-waisted design and a classic five-pocket style.',
       'This activewear set is made from a sweat-wicking, stretchy fabric and features a sports bra and leggings. The bra has a supportive design and adjustable straps, while the leggings have a high-waisted fit and a seamless construction.',
       'This activewear set is made from a sweat-wicking, stretchy fabric and features a sports bra and leggings. The bra has a supportive design and adjustable straps, while the leggings have a high-waisted fit and a seamless construction.',
       'This nightgown is made from a soft, lightweight cotton fabric and features a relaxed fit and a V-neckline. It has a delicate floral print and falls below the knee.',
       'This nightgown is made from a soft, lightweight cotton fabric and features a relaxed fit and a V-neckline. It has a delicate floral print and falls below the knee.',
       'This swimsuit is made from a chlorine-resistant, quick-drying fabric and features a flattering one-piece design with adjustable straps. It has a built-in bra for support and a moderate leg cut.',
       'This swimsuit is made from a chlorine-resistant, quick-drying fabric and features a flattering one-piece design with adjustable straps. It has a built-in bra for support and a moderate leg cut.'])

hypotheses = pd.DataFrame(['These jeans will provide a comfortable fit with some room to move.',
       'The dark wash denim makes these jeans suitable for more casual occasions.',
       'The soft cotton fabric makes this t-shirt comfortable for everyday wear.',
       'The graphic print adds a touch of personality to this t-shirt.',
       'The moisture-wicking fabric helps keep you cool and dry throughout the day.',
       'The seamless construction prevents chafing and irritation.',
       'The flowy fabric will drape nicely and flatter most body types.',
       'The floral print adds a feminine touch to this dress.',
       'The high-waisted design provides a comfortable and secure fit.',
       'The stretchy fabric allows for a wide range of motion.',
       'This blouse is made from a rough and scratchy fabric that is uncomfortable to wear.',
       'The lace detailing on this blouse is itchy and irritating to the skin.',
       'These jeans are not made from sustainable materials and contribute to environmental harm.',
       'These jeans are not true to size and run much smaller than advertised.',
       'This set is not made from breathable fabric and traps heat during exercise.',
       'The leggings in this set are see-through and not suitable for wearing in public.',
       'This nightgown is made from a thick and warm fabric that is not suitable for sleeping in hot weather.',
       'This nightgown is shorter than expected and does not provide adequate coverage.',
       'The chlorine-resistant fabric fades quickly and loses its color after a few uses.',
       'The adjustable straps on this swimsuit are difficult to adjust and do not stay in place.'])

nli_labels = pd.DataFrame(['entailment', 'entailment', 'entailment', 'entailment',
       'entailment', 'entailment', 'entailment', 'entailment',
       'entailment', 'entailment', 'contradiction', 'contradiction',
       'contradiction', 'contradiction', 'contradiction', 'contradiction',
       'contradiction', 'contradiction', 'contradiction', 'contradiction'])

data = pd.concat([premises, hypotheses, nli_labels], axis = 1)
data.columns = ['premise', 'hypothesis', 'label']

# Split into train, validate and test
train, val, test = np.split(data.sample(frac = 1, random_state = 123), [int(.6*len(data)), int(.8*len(data))])

# Clean up datasets, convert to Dataset format
train_dataset = Dataset.from_pandas(train)
train_dataset = train_dataset.remove_columns(["__index_level_0__"])

val_dataset = Dataset.from_pandas(val)
val_dataset = val_dataset.remove_columns(["__index_level_0__"])

test_dataset = Dataset.from_pandas(test)
test_dataset = test_dataset.remove_columns(["__index_level_0__"])

# Tokenize the datasets
def tokenize_data(data, tokenizer):
  # Use the tokenizer passed in as an argument rather than the global one
  encoded_data = tokenizer(data['premise'], data['hypothesis'],
                           max_length = 100,
                           truncation = True,
                           padding = 'max_length',
                           add_special_tokens = True,
                           return_token_type_ids = True,
                           return_attention_mask = True,
                           return_tensors = 'tf')

  labels = np.array(pd.get_dummies(data['label']))
  return encoded_data, labels

# Apply to all segments
train_encoded_data, train_labels = tokenize_data(train_dataset, bert_tokenizer)
val_encoded_data, val_labels = tokenize_data(val_dataset, bert_tokenizer)
test_encoded_data, test_labels = tokenize_data(test_dataset, bert_tokenizer)

input_ids = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'input_ids')
attention_mask = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'attention_mask')
token_type_ids = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'token_type_ids')

# form BERT embeddings
embeddings = bert.layers[0](input_ids, attention_mask, token_type_ids)
# then extract the last hidden state
last_hidden_states = embeddings.last_hidden_state
# Dense layers for classification
X = tf.keras.layers.Dense(32, activation = 'relu')(last_hidden_states)
y = tf.keras.layers.Dense(2, activation = 'softmax')(X)

finetuned_bert_model = tf.keras.Model(inputs = [input_ids, attention_mask, token_type_ids], outputs = y)

# Freeze Bert layer
finetuned_bert_model.layers[3].trainable = False

# Compile
finetuned_bert_model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

bert_hist = finetuned_bert_model.fit(train_encoded_data, train_labels,
                           validation_data = [val_encoded_data, val_labels],
                           epochs = 5)
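
A note on the labels built in tokenize_data: to my understanding, pd.get_dummies orders its output columns by sorted label value, so the one-hot column order here is alphabetical rather than the order in which labels first appear. The sketch below reproduces that layout in plain Python; one_hot is a hypothetical helper for illustration, not part of pandas or transformers.

```python
def one_hot(labels):
    """Return (classes, rows) with classes sorted alphabetically,
    mirroring the column order assumed for pd.get_dummies."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    rows = [
        [1 if index[label] == col else 0 for col in range(len(classes))]
        for label in labels
    ]
    return classes, rows


classes, rows = one_hot(["entailment", "contradiction", "entailment"])
print(classes)  # ['contradiction', 'entailment']
print(rows)     # [[0, 1], [1, 0], [0, 1]]
```

Under that assumption, index 0 of the 2-unit softmax output corresponds to contradiction and index 1 to entailment.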

ShubhamModi77 commented 6 months ago

I am facing the same error.

Rocketknight1 commented 6 months ago

This may be related to compatibility changes we made for Keras 3! @gante let me know if you need me to handle it instead. @ShubhamModi77 and @melissafeeney can you also let us know what version of Keras you have installed? (import keras; keras.__version__)
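
For anyone else landing here with the same error, the version information requested above can be collected in one go. This is a minimal sketch using only the standard library; the package list is an assumption based on the packages mentioned in this thread, and installed_versions is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "tensorflow", "keras", "tf-keras", "datasets")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

This reads the metadata of what is installed on disk, so it also reports tf-keras, which import keras; keras.__version__ alone would not show.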

melissafeeney commented 6 months ago

@Rocketknight1 I am using Keras 2.15.0

ShubhamModi77 commented 6 months ago

I am also using Keras version = 2.15.0

gante commented 6 months ago

@Rocketknight1 if you suspect that it is Keras 3-related, then yes please have a look (I am out of the loop wrt the changes :D)

Rocketknight1 commented 6 months ago

It was just a guess! Also @melissafeeney, sorry to annoy you again, but the second script is incomplete - can you merge it all together into one full script I can run without needing any of your data files or any extra inputs/arguments? It'll help me reproduce the error more quickly

melissafeeney commented 6 months ago

All good, I really appreciate your help! @Rocketknight1 this script should not require any local files- should run ok as is. The line that breaks is the creation of the BERT layer, embeddings = bert.layers[0](input_ids, attention_mask, token_type_ids)


import pandas as pd
import numpy as np
import tensorflow as tf
from datasets import Dataset

from transformers import BertTokenizer
tranformersPreTrainedModelName = 'bert-base-uncased'
bert_tokenizer = BertTokenizer.from_pretrained(tranformersPreTrainedModelName)

from transformers import TFBertModel
bert = TFBertModel.from_pretrained(tranformersPreTrainedModelName, output_hidden_states = True)

premises = pd.DataFrame(['These jeans are made from a durable, dark wash denim and have a relaxed fit through the leg. They feature a classic five-pocket design and a straight leg silhouette.',
       'These jeans are made from a durable, dark wash denim and have a relaxed fit through the leg. They feature a classic five-pocket design and a straight leg silhouette.',
       'This t-shirt is made from a soft, breathable cotton fabric and features a crew neck and short sleeves. It has a graphic print on the front and a relaxed fit.',
       'This t-shirt is made from a soft, breathable cotton fabric and features a crew neck and short sleeves. It has a graphic print on the front and a relaxed fit.',
       'These boxer briefs are made from a moisture-wicking fabric and have a comfortable, contoured pouch. They feature a tag-free waistband and a seamless construction.',
       'These boxer briefs are made from a moisture-wicking fabric and have a comfortable, contoured pouch. They feature a tag-free waistband and a seamless construction.',
       'This dress is made from a flowy, floral print fabric and has a flattering A-line silhouette. It features a v-neckline and adjustable straps.',
       'This dress is made from a flowy, floral print fabric and has a flattering A-line silhouette. It features a v-neckline and adjustable straps.',
       'These leggings are made from a high-waisted, stretchy fabric and have a slimming fit. They feature a moisture-wicking material and a seamless design.',
       'These leggings are made from a high-waisted, stretchy fabric and have a slimming fit. They feature a moisture-wicking material and a seamless design.',
       'This blouse is made from a lightweight, silk blend fabric and features a button-up front and a relaxed fit. It has delicate lace detailing on the collar and cuffs.',
       'This blouse is made from a lightweight, silk blend fabric and features a button-up front and a relaxed fit. It has delicate lace detailing on the collar and cuffs.',
       'These jeans are made from a high-quality, sustainable denim fabric and feature a skinny fit. They have a high-waisted design and a classic five-pocket style.',
       'These jeans are made from a high-quality, sustainable denim fabric and feature a skinny fit. They have a high-waisted design and a classic five-pocket style.',
       'This activewear set is made from a sweat-wicking, stretchy fabric and features a sports bra and leggings. The bra has a supportive design and adjustable straps, while the leggings have a high-waisted fit and a seamless construction.',
       'This activewear set is made from a sweat-wicking, stretchy fabric and features a sports bra and leggings. The bra has a supportive design and adjustable straps, while the leggings have a high-waisted fit and a seamless construction.',
       'This nightgown is made from a soft, lightweight cotton fabric and features a relaxed fit and a V-neckline. It has a delicate floral print and falls below the knee.',
       'This nightgown is made from a soft, lightweight cotton fabric and features a relaxed fit and a V-neckline. It has a delicate floral print and falls below the knee.',
       'This swimsuit is made from a chlorine-resistant, quick-drying fabric and features a flattering one-piece design with adjustable straps. It has a built-in bra for support and a moderate leg cut.',
       'This swimsuit is made from a chlorine-resistant, quick-drying fabric and features a flattering one-piece design with adjustable straps. It has a built-in bra for support and a moderate leg cut.'])

hypotheses = pd.DataFrame(['These jeans will provide a comfortable fit with some room to move.',
       'The dark wash denim makes these jeans suitable for more casual occasions.',
       'The soft cotton fabric makes this t-shirt comfortable for everyday wear.',
       'The graphic print adds a touch of personality to this t-shirt.',
       'The moisture-wicking fabric helps keep you cool and dry throughout the day.',
       'The seamless construction prevents chafing and irritation.',
       'The flowy fabric will drape nicely and flatter most body types.',
       'The floral print adds a feminine touch to this dress.',
       'The high-waisted design provides a comfortable and secure fit.',
       'The stretchy fabric allows for a wide range of motion.',
       'This blouse is made from a rough and scratchy fabric that is uncomfortable to wear.',
       'The lace detailing on this blouse is itchy and irritating to the skin.',
       'These jeans are not made from sustainable materials and contribute to environmental harm.',
       'These jeans are not true to size and run much smaller than advertised.',
       'This set is not made from breathable fabric and traps heat during exercise.',
       'The leggings in this set are see-through and not suitable for wearing in public.',
       'This nightgown is made from a thick and warm fabric that is not suitable for sleeping in hot weather.',
       'This nightgown is shorter than expected and does not provide adequate coverage.',
       'The chlorine-resistant fabric fades quickly and loses its color after a few uses.',
       'The adjustable straps on this swimsuit are difficult to adjust and do not stay in place.'])

nli_labels = pd.DataFrame(['entailment', 'entailment', 'entailment', 'entailment',
       'entailment', 'entailment', 'entailment', 'entailment',
       'entailment', 'entailment', 'contradiction', 'contradiction',
       'contradiction', 'contradiction', 'contradiction', 'contradiction',
       'contradiction', 'contradiction', 'contradiction', 'contradiction'])

data = pd.concat([premises, hypotheses, nli_labels], axis = 1)
data.columns = ['premise', 'hypothesis', 'label']

train, val, test = np.split(data.sample(frac = 1, random_state = 123), [int(.6*len(data)), int(.8*len(data))])

train_dataset = Dataset.from_pandas(train)
train_dataset = train_dataset.remove_columns(["__index_level_0__"])

val_dataset = Dataset.from_pandas(val)
val_dataset = val_dataset.remove_columns(["__index_level_0__"])

test_dataset = Dataset.from_pandas(test)
test_dataset = test_dataset.remove_columns(["__index_level_0__"])

def tokenize_data(data, tokenizer):
  # Use the tokenizer passed in as an argument rather than the global one
  encoded_data = tokenizer(data['premise'], data['hypothesis'],
                           max_length = 100,
                           truncation = True,
                           padding = 'max_length',
                           add_special_tokens = True,
                           return_token_type_ids = True,
                           return_attention_mask = True,
                           return_tensors = 'tf')

  labels = np.array(pd.get_dummies(data['label']))
  return encoded_data, labels

train_encoded_data, train_labels = tokenize_data(train_dataset, bert_tokenizer)
val_encoded_data, val_labels = tokenize_data(val_dataset, bert_tokenizer)
test_encoded_data, test_labels = tokenize_data(test_dataset, bert_tokenizer)

input_ids = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'input_ids')
attention_mask = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'attention_mask')
token_type_ids = tf.keras.Input(shape = (100,),dtype=tf.int32, name = 'token_type_ids')

embeddings = bert.layers[0](input_ids, attention_mask, token_type_ids)
last_hidden_states = embeddings.last_hidden_state
X = tf.keras.layers.Dense(32, activation = 'relu')(last_hidden_states)
y = tf.keras.layers.Dense(2, activation = 'softmax')(X)

finetuned_bert_model = tf.keras.Model(inputs = [input_ids, attention_mask, token_type_ids], outputs = y)
finetuned_bert_model.layers[3].trainable = False
finetuned_bert_model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

bert_hist = finetuned_bert_model.fit(train_encoded_data, train_labels,
                           validation_data = [val_encoded_data, val_labels],
                           epochs = 5)

Rocketknight1 commented 6 months ago

I think I know what's happening here - the problem is indeed caused by measures we took for Keras 3 compatibility. To confirm, can you pip uninstall tf-keras and let me know if the issue is resolved? If that temporary fix works, I'll try to figure out a more permanent solution.
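
One observation consistent with this hypothesis: the traceback in the original report contains frames from both tf_keras and keras. The sketch below lists the top-level packages named in a traceback's file paths, which makes that kind of mix easy to spot. It is only a diagnostic aid under the assumption that packages live under dist-packages or site-packages; packages_in_traceback is a hypothetical helper, and the sample paths are copied from the traceback above.

```python
import re


def packages_in_traceback(text):
    """Return distinct top-level package names found in
    dist-packages/site-packages paths, in order of first appearance."""
    pattern = re.compile(r"(?:dist|site)-packages/([A-Za-z0-9_]+)/")
    seen = []
    for name in pattern.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


sample = """
/usr/local/lib/python3.10/dist-packages/tf_keras/src/utils/traceback_utils.py
/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_utils.py
/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/tf_op_layer.py
"""
print(packages_in_traceback(sample))  # ['tf_keras', 'transformers', 'keras']
```

Seeing both names does not prove the cause by itself, but it shows that two separate Keras packages took part in the failing call.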

melissafeeney commented 6 months ago

@Rocketknight1 unfortunately that did not appear to solve the issue

Rocketknight1 commented 6 months ago

That's strange - I can reproduce your issue with tf-keras installed, but not without it! One other test - can you try pip install transformers==4.37.2? That was the last version before the Keras 3 compatibility fix.
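
To check quickly whether an environment is on a release from before the Keras 3 compatibility change, a version string can be compared against the first release that includes it. A minimal sketch: the default boundary of 4.38.0 is an assumption inferred from the statement above that 4.37.2 was the last version before the change, and both function names are hypothetical.

```python
def parse_release(version):
    """Parse the leading numeric components of a version string,
    e.g. '4.39.0.dev0' -> (4, 39, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def predates_keras3_compat(version, first_changed=(4, 38, 0)):
    """True if the release is older than the assumed first changed release."""
    return parse_release(version) < first_changed


print(predates_keras3_compat("4.37.2"))  # True
print(predates_keras3_compat("4.38.2"))  # False
```

The boundary is a parameter so it can be adjusted if the change turns out to have landed in a different release.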

melissafeeney commented 6 months ago

That worked! Once it did, I realized my code had other, unrelated issues (ha!) with data formatting, which I've since fixed. Is there anything else I need to do besides making sure that I am using transformers v4.37.2? Will this incompatibility be solved in the future?

ShubhamModi77 commented 6 months ago

@Rocketknight1 and @gante Thank you for helping out. It works with transformers 4.37.2

Rocketknight1 commented 6 months ago

Hi @melissafeeney @ShubhamModi77 - yes, this does confirm the issue is caused by our Keras 3 compatibility fixes. I'll see if I can figure out a solution so that you don't have to stay on an older version.

melissafeeney commented 6 months ago

@Rocketknight1 that sounds great, thank you for your help here!

Rocketknight1 commented 6 months ago

Hi @melissafeeney @ShubhamModi77 we have a preliminary fix now at #29598. Can you please try it out and let me know if it helps? You can install the code from the fixed branch with pip install git+https://github.com/huggingface/transformers.git@keras_3_compat_fix

Rocketknight1 commented 6 months ago

Hi @melissafeeney @ShubhamModi77 the fix has now been merged, so you can now use it just by installing from main with pip install git+https://github.com/huggingface/transformers.git. If you encounter any further issues, please feel free to comment here and reopen the issue!
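
When installing from main inside a notebook, keep in mind that a module imported before the pip install stays loaded until the runtime is restarted. The sketch below compares the version of the module currently loaded in the interpreter with the version installed on disk. It uses only the standard library, the function names are hypothetical, and it assumes the package exposes a __version__ attribute (transformers does).

```python
import sys
from importlib import metadata


def loaded_and_installed(module_name, dist_name=None):
    """Return (version of the already-imported module, version installed on disk).
    Either value is None when it cannot be determined."""
    module = sys.modules.get(module_name)
    loaded = getattr(module, "__version__", None) if module else None
    try:
        installed = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


def needs_restart(loaded, installed):
    """True when an imported module is older or newer than what is on disk."""
    return loaded is not None and installed is not None and loaded != installed


loaded, installed = loaded_and_installed("transformers")
print(f"loaded={loaded} installed={installed} restart needed={needs_restart(loaded, installed)}")
```

If the two versions differ, restarting the runtime and re-importing should make the interpreter pick up the newly installed build.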

melissafeeney commented 6 months ago

Hmm now it doesn't work with this version of transformers, but I get a different error. For what it's worth, I also upgraded to Keras 3 to see if it would help. I am using transformers 4.38.2, Keras 3.0.5, and tensorflow 2.15.0.

import numpy as np
import pandas as pd
import tensorflow as tf

from transformers import BertTokenizer
tranformersPreTrainedModelName = 'bert-base-uncased'
bert_tokenizer = BertTokenizer.from_pretrained(tranformersPreTrainedModelName)

from transformers import TFBertModel
bert = TFBertModel.from_pretrained(tranformersPreTrainedModelName, output_hidden_states = True)

premises = pd.DataFrame(['These jeans are made from a durable, dark wash denim and have a relaxed fit through the leg. They feature a classic five-pocket design and a straight leg silhouette.',
       'These jeans are made from a durable, dark wash denim and have a relaxed fit through the leg. They feature a classic five-pocket design and a straight leg silhouette.',
       'This t-shirt is made from a soft, breathable cotton fabric and features a crew neck and short sleeves. It has a graphic print on the front and a relaxed fit.',
       'This nightgown is made from a soft, lightweight cotton fabric and features a relaxed fit and a V-neckline. It has a delicate floral print and falls below the knee.',
       'This swimsuit is made from a chlorine-resistant, quick-drying fabric and features a flattering one-piece design with adjustable straps. It has a built-in bra for support and a moderate leg cut.',
       'This swimsuit is made from a chlorine-resistant, quick-drying fabric and features a flattering one-piece design with adjustable straps. It has a built-in bra for support and a moderate leg cut.'])

hypotheses = pd.DataFrame(['These jeans will provide a comfortable fit with some room to move.',
       'The dark wash denim makes these jeans suitable for more casual occasions.',
       'The soft cotton fabric makes this t-shirt comfortable for everyday wear.',
       'This nightgown is shorter than expected and does not provide adequate coverage.',
       'The chlorine-resistant fabric fades quickly and loses its color after a few uses.',
       'The adjustable straps on this swimsuit are difficult to adjust and do not stay in place.'])

# One label per premise/hypothesis pair (six pairs above)
nli_labels = pd.DataFrame(['entailment', 'entailment', 'entailment', 'contradiction', 'contradiction', 'contradiction'])

data = pd.concat([premises, hypotheses, nli_labels], axis = 1)
data.columns = ['premise', 'hypothesis', 'label']

label_mapping = {'entailment': 0, 'contradiction': 1}
data['label'] = data['label'].map(label_mapping)

target_columns = list(data.columns)[2:]
labels = data[target_columns].values

tokens = bert_tokenizer.batch_encode_plus(data[['premise', 'hypothesis']].values.tolist(),
                                          max_length = 100,
                                          truncation = True,
                                          padding = 'max_length',
                                          add_special_tokens = True,
                                          return_token_type_ids = True,
                                          return_attention_mask = True,
                                          return_tensors = 'tf')

input_ids = np.array(tokens['input_ids'], dtype = int)
attention_mask = np.array(tokens['attention_mask'], dtype = int)
token_type_ids = np.array(tokens['token_type_ids'], dtype = int)

dataset = tf.data.Dataset.from_tensor_slices({
    'input_ids': input_ids,
    'attention_mask': attention_mask,
    'token_type_ids': token_type_ids,
    'labels': labels,
})

def map_func(data):
  return (tf.cast(data['input_ids'], dtype=tf.int32),
          tf.cast(data['attention_mask'], dtype=tf.int32),
          tf.cast(data['token_type_ids'], dtype=tf.int32)), data['labels']

dataset = dataset.map(map_func)

dataset_size = tf.data.experimental.cardinality(dataset).numpy()
train_size = int(0.8 * dataset_size)
val_size = int(0.1 * dataset_size)
test_size = dataset_size - train_size - val_size
tf.random.set_seed(1234)  # set the global seed before shuffling so the split is reproducible
# Shuffle once only; reshuffling on every iteration would let the take/skip splits overlap
dataset = dataset.shuffle(buffer_size=dataset_size, reshuffle_each_iteration=False)
train_dataset = dataset.take(train_size)
val_dataset = dataset.skip(train_size).take(val_size)
test_dataset = dataset.skip(train_size + val_size)

input_ids_k_tensor = tf.keras.Input(shape=(100,), dtype=tf.int32, name='input_ids')
attention_mask_k_tensor = tf.keras.Input(shape=(100,), dtype=tf.int32, name='attention_mask')
token_type_ids_k_tensor = tf.keras.Input(shape=(100,), dtype=tf.int32, name='token_type_ids')

embeddings = bert.layers[0](input_ids_k_tensor, attention_mask_k_tensor, token_type_ids_k_tensor)['last_hidden_state']

This generates the following error, which appears to be related to the input tensor type:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-5299b84ca93f> in <cell line: 5>()
      3 token_type_ids_k_tensor = tf.keras.Input(shape=(100,), dtype=tf.int32, name='token_type_ids')
      4 
----> 5 embeddings = bert.layers[0](input_ids_k_tensor, attention_mask_k_tensor, token_type_ids_k_tensor)['last_hidden_state']

2 frames
/usr/local/lib/python3.10/dist-packages/tf_keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68             # To get the full stack trace, call:
     69             # `tf.debugging.disable_traceback_filtering()`
---> 70             raise e.with_traceback(filtered_tb) from None
     71         finally:
     72             del filtered_tb

/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_utils.py in run_call_with_unpacked_inputs(self, *args, **kwargs)
    425             config = self.config
    426 
--> 427         unpacked_inputs = input_processing(func, config, **fn_args_and_kwargs)
    428         return func(self, **unpacked_inputs)
    429 

/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_utils.py in input_processing(func, config, **kwargs)
    502             output[k] = v
    503         else:
--> 504             raise ValueError(f"Data of type {type(v)} is not allowed only {allowed_types} is accepted for {k}.")
    505 
    506     if isinstance(main_input, (tuple, list)):

ValueError: Exception encountered when calling layer 'bert' (type TFBertMainLayer).

Data of type <class 'keras.src.backend.common.keras_tensor.KerasTensor'> is not allowed only (<class 'tensorflow.python.framework.tensor.Tensor'>, <class 'bool'>, <class 'int'>, <class 'transformers.utils.generic.ModelOutput'>, <class 'tuple'>, <class 'list'>, <class 'dict'>, <class 'numpy.ndarray'>) is accepted for attention_mask.

Call arguments received by layer 'bert' (type TFBertMainLayer):
  • input_ids=<KerasTensor shape=(None, 100), dtype=int32, sparse=None, name=input_ids>
  • attention_mask=<KerasTensor shape=(None, 100), dtype=int32, sparse=None, name=attention_mask>
  • token_type_ids=<KerasTensor shape=(None, 100), dtype=int32, sparse=None, name=token_type_ids>
  • position_ids=None
  • head_mask=None
  • inputs_embeds=None
  • encoder_hidden_states=None
  • encoder_attention_mask=None
  • past_key_values=None
  • use_cache=None
  • output_attentions=None
  • output_hidden_states=None
  • return_dict=None
  • training=False

According to the TensorFlow documentation, TFBertModel is a tf.keras.Model, so it should accept tf.keras.Input tensors as input, which is what I pass here. However, the error suggests that TFBertModel does not accept this type of input?

ShubhamModi77 commented 6 months ago

@melissafeeney I think it's a TensorFlow version error. Keras 3 is compatible with TensorFlow 2.16.

Rocketknight1 commented 6 months ago

Any time you see an error like this:

Data of type <class 'keras.src.backend.common.keras_tensor.KerasTensor'> is not allowed only (<class 'tensorflow.python.framework.tensor.Tensor'>,

It is a sign that a Keras 3 object is being passed to a Keras 2 class that doesn't understand it. As a result of TensorFlow's somewhat chaotic transition, these mixups will happen! We've tried to update transformers so that it correctly sets TF to use Keras 2, but it's still possible to create Keras 3 objects depending on your workflow.

Here's what I suggest as a general solution to these kinds of problems:

First, try installing transformers from main with pip install git+https://github.com/huggingface/transformers.git to see if it fixes the problem. Note that the version on main is newer than the latest release version, 4.38.2. These fixes have not been included in a released version yet!
If it still doesn't fix the problem, try pip install tf-keras
If that still doesn't fix the problem, try setting the environment variable TF_USE_LEGACY_KERAS=1
If that still doesn't fix the problem, check your code and make sure you're using tf.keras and not directly using Keras 3 like import keras or from keras import x
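Before working through these steps, it can help to see which of the relevant packages are actually installed. The following is only a minimal diagnostic sketch: the helper names `installed_versions` and `keras_major` are hypothetical, and the distribution names (`tensorflow`, `keras`, `tf-keras`, `transformers`) are assumed to match how the packages were installed with pip.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "keras", "tf-keras", "transformers")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def keras_major(versions):
    """Major version of the standalone keras package, or None if not installed."""
    version = versions.get("keras")
    return int(version.split(".")[0]) if version else None


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    if keras_major(versions) == 3 and versions["tf-keras"] is None:
        # Matches the second step above: Keras 3 is present but the
        # Keras 2 backward compatibility package is missing.
        print("Keras 3 found without tf-keras; consider `pip install tf-keras`.")
```

Because it reads package metadata rather than importing the libraries, this check does not depend on whether TensorFlow itself imports cleanly.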

melissafeeney commented 6 months ago

@Rocketknight1 I still couldn't get it to work with the new transformers update from main, but what I did that appears to work:

pip install transformers==4.37.2
using tf_keras 2.15.1
replacing any direct keras imports using from keras import xyz with from tensorflow.keras import xyz

Rocketknight1 commented 6 months ago

@melissafeeney my guess is that the last one of those was the solution! Basically:

Transformers will always use Keras 2 objects, unless you're using an old version that doesn't know about Keras 3 yet.
If you've installed Keras 3, you can still get Keras 2 objects, either by importing them from tf_keras or by setting TF_USE_LEGACY_KERAS=1 and importing them from tf.keras
If you import from keras (not tf.keras), you will always get the current version of Keras you have installed, which will probably be Keras 3 for people using TF 2.15 / TF 2.16!

Therefore, the most likely cause of this issue is either importing directly from keras, or importing from tf.keras without setting TF_USE_LEGACY_KERAS=1. Since tf_keras is a backward compatibility package, it will always have Keras 2 objects, so it should always be safe to import from when it's installed.
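As a rough sketch of that advice, the symbolic inputs from the original report could be built from `tf_keras` when it is installed, falling back to `tf.keras` with the legacy flag set. The helper name `make_bert_inputs` is hypothetical, and the sketch assumes the flag has to be set before TensorFlow is first imported.

```python
import os

# Assumption: the flag is read when TensorFlow is first imported, so set it first.
os.environ["TF_USE_LEGACY_KERAS"] = "1"


def make_bert_inputs(seq_len=100):
    """Build input_ids, attention_mask and token_type_ids as Keras 2 inputs."""
    try:
        # Backward compatibility package: provides Keras 2 objects.
        import tf_keras as keras
    except ImportError:
        # Fallback: tf.keras, relying on the legacy flag set above.
        import tensorflow as tf
        keras = tf.keras
    names = ("input_ids", "attention_mask", "token_type_ids")
    return [keras.Input(shape=(seq_len,), dtype="int32", name=name) for name in names]
```

The imports are kept inside the function so that the environment variable is set before TensorFlow is loaded; the returned tensors can then be passed to `bert.layers[0](...)` as in the original code.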

dipanjanS commented 5 months ago

I can confirm this. I faced the same issue in one of my implementations using the TF models from Hugging Face.

Setting TF_USE_LEGACY_KERAS=1 is definitely the simplest fix, with no need to downgrade or change any packages. Thanks for this!

In case anyone is interested, just do this:

import os

# Set this before importing tensorflow or transformers so that it takes effect
os.environ['TF_USE_LEGACY_KERAS'] = '1'

HoseinNekouei commented 1 month ago

Hi, it worked for me with tensorflow==2.15 and transformers==4.37.2
