huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

When I use the model's `generate` method within `@tf.function`, it raises an error even though my logic is fine #33229

Closed HelloWorldU closed 1 month ago

HelloWorldU commented 2 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

```python
def train_generator_step(self, input_ids, attention_mask, labels, styles, max_len, step,
                         accumulation_steps=4, lambda_rec=1.0, lambda_lm=1.0,
                         lambda_adv=1.0, lambda_kl=1.0, gamma=1.0):

    max_len = tf.constant(max_len, dtype=tf.int32)
    max_len_value = max_len
    seq_len = input_ids.shape[2]
    # Generation budget: leave 10 tokens of headroom, clamped to at least 1.
    max_new_tokens = tf.maximum(max_len_value - seq_len - 10, 1)
    max_new_tokens = tf.cast(max_new_tokens, tf.int32)

    @tf.function
    def step_fn(input_ids=input_ids, attention_mask=attention_mask, labels=labels, styles=styles, accumulation_steps=accumulation_steps, 
                lambda_rec=lambda_rec, lambda_lm=lambda_lm, lambda_adv=lambda_adv, lambda_kl=lambda_kl, gamma=gamma):
        with tf.GradientTape() as tape:
            tf.debugging.enable_check_numerics()

            accumulation_steps, lambda_rec, lambda_lm, lambda_adv, lambda_kl, gamma = pr.conv_tensor_to_float(accumulation_steps, lambda_rec, lambda_lm, lambda_adv, lambda_kl, gamma)

            epsilon = 1e-6  # quick fix

            """
            we firstly to reshape the input
            """
            actual_shape = tf.shape(input_ids)
            input_ids = tf.reshape(input_ids, (actual_shape[0] * actual_shape[1], actual_shape[2]))
            attention_mask = tf.reshape(attention_mask, (actual_shape[0] * actual_shape[1], actual_shape[2]))

            """
            then, we repeat styles and labels
            """
            styles = tf.repeat(styles, repeats=actual_shape[0])
            labels = tf.repeat(labels, repeats=actual_shape[0], axis=0)

            # Embed the style labels
            style_embeddings = self.embedding(styles) # [num_devices * batch_size, n_embd]
            print("Style embeddings shape:", style_embeddings.shape)  # Debug info

            # Embed the input IDs into the same embedding space
            input_embeddings = self.gen.transformer.wte(input_ids) # [num_devices * batch_size, seq_len, n_embd]
            print("Input embeddings shape:", input_embeddings.shape)  # Debug info

            extended_input_embeddings = input_embeddings + tf.expand_dims(style_embeddings, axis=1)
            print("Extended embeddings shape:", extended_input_embeddings.shape)  # Debug info

            input_ids, attention_mask, labels, styles = dis.convert_tensor(input_ids, attention_mask, labels, styles)

            outputs = self.gen(input_ids=input_ids, attention_mask=attention_mask, training=True)
            logits = outputs.logits
            print("Logits shape:", logits.shape)  # Debug info
            print("Logits dtype:", logits.dtype)  # Debug info
            print("labels shape:", labels.shape)  # Debug info

            loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
            mask = tf.cast(labels != -100, logits.dtype)
            print("Mask shape:", mask.shape)  # Debug info
            print("Mask dtype:", mask.dtype)  # Debug info

            # Check for NaN or Inf in logits
            tf.debugging.check_numerics(logits, "Logits contain NaN or Inf")

            # Individual losses
            rec_loss = loss_fn(tf.where(labels == -100, tf.zeros_like(labels, dtype=logits.dtype), tf.cast(labels, logits.dtype)), logits)
            rec_loss = tf.reduce_sum(rec_loss * mask) / (tf.reduce_sum(mask) + epsilon)
            rec_loss = tf.cast(rec_loss, tf.float32)
            print("Reconstruction loss:", rec_loss)  # Debug info

            for var in self.gen.trainable_variables:
                tf.debugging.check_numerics(var, message="Model weight check")

            print("Input shapes:", 
                    "input_ids:", input_ids.shape, 
                    "input_ids dtype:", input_ids.dtype,
                    "attention_mask:", attention_mask.shape, 
                    "labels:", labels.shape, 
                    "styles:", styles.shape,
                    "max_len_value:", max_len_value)

            new_shape = tf.shape(input_ids)
            print("New shape:", new_shape)  # Debug info
            print("Seq len:", seq_len)  # Debug info

            # max_new_tokens = tf.maximum(max_new_tokens, 1)
            print("Max length:", max_len_value)  # Debug info
            print("Max new tokens:", max_new_tokens)  # Debug info
            if isinstance(max_new_tokens, tf.Tensor):
                print("Max new tokens:", tf.get_static_value(max_new_tokens))  # Debug info
            batch_size = new_shape[0]

            # Extend input_ids: pad with zeros of shape [batch_size, max_new_tokens].
            padding = tf.zeros((batch_size, max_new_tokens), dtype=input_ids.dtype)
            print("Padding shape:", padding.shape)  # Debug info

            extended_input_ids = tf.concat([input_ids, padding], axis=1)
            extended_attention_mask1 = tf.concat([attention_mask, tf.zeros((tf.shape(attention_mask)[0],
                                        max_new_tokens), dtype=attention_mask.dtype)], axis=1)

            extended_input_ids = tf.cast(extended_input_ids, tf.int32)
            extended_attention_mask1 = tf.cast(extended_attention_mask1, tf.float32)

            print(f"Extended input_ids shape: {extended_input_ids.shape}")
            print(f"Extended attention_mask shape: {extended_attention_mask1.shape}")

            # Ensure the maximum length is greater than the minimum length
            max_length = max_len_value + max_new_tokens
            min_length = 1
            # tf.print("max_length:", max_length)
            # tf.print("min_length:", min_length)

            tf.debugging.assert_greater(max_length, min_length, message=f"max_length ({max_length}) must be greater than min_length ({min_length})")

            pad_token_id = int(self.tokenizer.pad_token_id)
            eos_token_id = int(self.tokenizer.eos_token_id)
            bos_token_id = int(self.tokenizer.bos_token_id)

            try:                    
                generated_ids = self.gen.generate(
                    extended_input_ids, 
                    attention_mask=extended_attention_mask1, 
                    max_new_tokens=max_new_tokens,
                    pad_token_id=pad_token_id,
                    eos_token_id=eos_token_id,
                    bos_token_id=bos_token_id,
                    # use_cache=True,
                    # num_beams=1,  # greedy search
                    do_sample=False,  # no sampling
                    # temperature=1.0,  # reduce randomness
                )
                print("Generation successful. Generated IDs shape:", generated_ids.shape)

            except Exception as e:
                print(f"Error during generation: {e}")
                print(f"input_ids shape: {input_ids.shape}")
                print(f"attention_mask shape: {attention_mask.shape}")
                print(f"max_len_value: {max_len_value}")
                raise

```
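To isolate this from my training loop, here is a minimal sketch that, I believe, hits the same failure (the model and tokenizer names are placeholders; my real model is a fine-tuned GPT-2):

```python
import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")
inputs = tokenizer("hello world", return_tensors="tf")

@tf.function
def gen_step(input_ids, attention_mask, max_new_tokens):
    # max_new_tokens is a symbolic Tensor while this function is traced.
    return model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=False,
    )

# Expected to raise during tracing with the same validation error,
# even though the runtime value is positive:
gen_step(inputs["input_ids"], inputs["attention_mask"], tf.constant(43))

# Hard-coding max_new_tokens=43 as a plain Python int inside the same
# @tf.function works fine, matching what I observe in my training code.
```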

""" Relevant Message """

```
Training gen
Style embeddings shape: (4, 768)
Input embeddings shape: (4, 72, 768)
Extended embeddings shape: (4, 72, 768)
Logits shape: (4, 72, 21128)
Logits dtype: <dtype: 'float16'>
labels shape: (4, 72)
Mask shape: (4, 72)
Mask dtype: <dtype: 'float16'>
Reconstruction loss: Tensor("Cast_3:0", shape=(), dtype=float32)
Input shapes: input_ids: (4, 72) input_ids dtype: <dtype: 'int32'> attention_mask: (4, 72) labels: (4, 72) styles: (4,) max_len_value: tf.Tensor(125, shape=(), dtype=int32)
New shape: Tensor("Shape_1:0", shape=(2,), dtype=int32)
Seq len: 72
Max length: tf.Tensor(125, shape=(), dtype=int32)
Max new tokens: tf.Tensor(43, shape=(), dtype=int32)
Max new tokens: 43
Padding shape: (4, 43)
Extended input_ids shape: (4, 115)
Extended attention_mask shape: (4, 115)
/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:377: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )
  return py_builtins.overload_of(f)(*args)
Error during generation: `max_new_tokens` must be greater than 0, but is 43.
input_ids shape: (4, 72)
attention_mask shape: (4, 72)
max_len_value: 125
Traceback (most recent call last):
  File "train.py", line 530, in <module>
    train_model.train(train_tf_dataset_X, train_tf_dataset_Y, valid_tf_dataset_X, valid_tf_dataset_Y, trainconfig.epochs)
  File "train.py", line 306, in train
    rec_loss, lm_loss, adv_loss, kl_loss, current_lr, accuracy, total_gen_loss = self.distributed_train_generator_step(
  File "train.py", line 138, in distributed_train_generator_step
    loss, rec_loss, lm_loss, adv_loss, kl_loss, current_lr, accuracy = self.strategy.run(
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1316, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2892, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 677, in _call_for_each_replica
    return mirrored_run.call_for_each_replica(
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 104, in call_for_each_replica
    return _call_for_each_replica(strategy, fn, args, kwargs)
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 246, in _call_for_each_replica
    coord.join(threads)
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 346, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 601, in wrapper
    return func(*args, **kwargs)
  File "train.py", line 133, in generator_step
    loss, rec_loss, lm_loss, adv_loss, kl_loss, current_lr, accuracy, gradients = self.model.train_generator_step(*args, **kwargs)
  File "/root/autodl-tmp/model/model.py", line 302, in train_generator_step
    step_total_loss, step_rec_loss, step_lm_loss, step_adv_loss, step_kl_loss, step_gradients, step_accuracy = step_fn(
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "/root/autodl-tmp/model/model.py", line 220, in step_fn  *
        generated_ids = self.gen.generate(
    File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/transformers/generation/tf_utils.py", line 738, in generate  *
        model_kwargs = generation_config.update(**kwargs)  # All unused kwargs must be model kwargs
    File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/transformers/generation/configuration_utils.py", line 1207, in update  *
        self.validate()
    File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/transformers/generation/configuration_utils.py", line 544, in validate  *
        raise ValueError(f"`max_new_tokens` must be greater than 0, but is {self.max_new_tokens}.")

    ValueError: `max_new_tokens` must be greater than 0, but is 43.
```

""" Definetely, l occured this mistake within @tf.function, and there is no logical mistake when l dubug my code under eager-excution model, similarly, when i use max_length and min_length paramters, it would be occured to "ValueError: max_length must be greater than min_length, 1 is larger than 128.", like this. But, when l set the paramter"max_new_tokens" as a constant value like 50, it would be fine, l donno what leads this, and debug this for at least 20 times. """

Expected behavior

Of course, the value of my variable is dynamic, but I defined it outside the graph and passed it in as a parameter. The expected behavior is for generation to run with `max_new_tokens=43`; instead it raises an error.
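For what it's worth, the only thing that reliably unblocks me is keeping the budget out of the graph entirely. A sketch of that workaround, assuming the lengths are available eagerly in the driver code (the variable values here are just my observed numbers):

```python
# Workaround sketch: keep the generation budget as a plain Python int,
# computed before any tracing, instead of wrapping it in tf.constant().
max_len = 125                                    # per-batch max length (int)
seq_len = 72                                     # static sequence length
max_new_tokens = max(max_len - seq_len - 10, 1)  # Python int, not a tf.Tensor

# generate(..., max_new_tokens=max_new_tokens) now sees a static value, so
# the `> 0` check in GenerationConfig.validate() evaluates normally; the
# trade-off is one retrace of step_fn per distinct max_new_tokens value.
```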

HelloWorldU commented 2 months ago

Sorry, let me provide some additional information: I dynamically obtain the maximum sentence length from each batch of the dataset.
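Concretely, it looks roughly like this (a sketch; the field names and the call into the training step are simplified placeholders for my actual driver code):

```python
# Per-batch max sentence length, computed eagerly while iterating the
# tf.data.Dataset, before any graph tracing happens.
for batch in train_dataset:
    lengths = tf.reduce_sum(batch["attention_mask"], axis=-1)  # real tokens per row
    max_len = int(tf.reduce_max(lengths).numpy())              # plain Python int
    self.distributed_train_generator_step(batch, max_len=max_len)
```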

HelloWorldU commented 2 months ago

@SunMarc @qubvel @ArthurZucker could you take a look at my issue? Thanks a lot.

HelloWorldU commented 2 months ago

@LysandreJik please take a look at this, thank you. I guess it may be an issue with the static graph, but I can't be sure.

LysandreJik commented 2 months ago

cc @gante in case you have the bandwidth to take a look at this generate issue

HelloWorldU commented 2 months ago

😭 I guess it's a TensorFlow fault.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.