Closed SOVIETIC-BOSS88 closed 1 year ago
Hi,
Thank you for reporting the issue! My apologies for the issues you encountered with our framework.
Token indices sequence length is longer than the specified maximum sequence length for this model (1614 > 1024). Running this sequence through the model will result in indexing errors
Yes, many of the popular Transformer-based models are trained with a maximum sequence length of 1024.
Is there a way to pass the seq = seq[:512] parameter?
Truncating the sequence this way would cause a different problem: GReaT would no longer be able to synthesize data for all features.
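Before picking a workaround, it may help to measure how long one encoded row actually is. A minimal sketch, assuming GReaT's textual "column is value" row encoding (the exact formatting may differ slightly):
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# GReaT encodes each row roughly as "col1 is v1, col2 is v2, ...";
# with 107 features this can easily exceed GPT-2's 1024-position limit.
row = df.iloc[0]  # `df` is your training DataFrame (assumption)
row_text = ", ".join(f"{col} is {val}" for col, val in row.items())
print(len(tok(row_text)["input_ids"]))  # values above 1024 trigger the warning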
In my opinion, there are two possible ways to deal with this problem:
(1) You can change the n_positions hyperparameter (the maximum number of token positions), see https://huggingface.co/transformers/v2.10.0/model_doc/gpt2.html
However, in this case you will have a mismatch in some of the weights, which is okay because you still need to fine-tune the model anyway.
In order to change n_positions, you need to adjust the GReaT code. Please add the following hyperparameters to great.py (at line 63); a possible location of the file is:
/usr/local/lib/python3.9/dist-packages/be_great/great.py
self.model = AutoModelForCausalLM.from_pretrained(self.llm, n_positions=2048, ignore_mismatched_sizes=True)
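For intuition, the same effect can be reproduced outside GReaT. A minimal sketch (the shape check at the end is only a sanity check; attribute names follow the Hugging Face GPT-2 implementation):
from transformers import AutoModelForCausalLM

# Overriding n_positions re-initializes the position-embedding matrix (wpe)
# at the new size; this is the weight mismatch mentioned above, and it is
# why the model must be fine-tuned afterwards.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", n_positions=2048, ignore_mismatched_sizes=True
)
print(model.config.n_positions)            # 2048
print(model.transformer.wpe.weight.shape)  # torch.Size([2048, 768])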
After the fine-tuning step, you need to pass the max_length parameter to the sample function:
synthetic_data = model.sample(n_samples=100, max_length=1700)
Disclaimer: we haven't tested this, so we cannot guarantee that it will work.
(2) You can also adjust your dataset to make the input sequence shorter after the tokenization step, for example as sketched below.
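A minimal sketch of two such adjustments (the column name is hypothetical, and we haven't measured the exact savings for your data):
# 1. Abbreviate long column names: every name is spelled out in every
#    encoded row, so shorter names directly cut the token count.
df = df.rename(columns={"median_house_value": "mhv"})  # hypothetical name

# 2. Round floats: fewer digits means fewer tokens per value.
df = df.round(3)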
Please let me know if this helps! We will also adjust our framework for long sequences in the next release.
Thank you very much for the suggestions, and apologies for my late reply. Yesterday I tried both suggestions: I modified the great.py source file and also made the input sequences shorter. I additionally used df.astype('float16') and reduced the batch size to 4; otherwise I ran into CUDA out-of-memory errors like the following one:
OutOfMemoryError: CUDA out of memory. Tried to allocate 342.00 MiB (GPU 0; 14.76 GiB total capacity; 13.33 GiB already allocated; 113.75 MiB free; 13.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
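Roughly, the setup that let training start looks like this (a sketch; the llm and epochs values are illustrative, and max_split_size_mb follows the error message's suggestion with an arbitrary value of 128):
import os

# Must be set before CUDA memory is first allocated.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

from be_great import GReaT

df = df.astype("float16")   # lower precision also shortens the textual encoding
model = GReaT(llm="distilgpt2", batch_size=4, epochs=20)  # reduced batch size
model.fit(df)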
With these settings I was able to start training the model. I have not completed the training yet, but tomorrow I will be able to confirm 100%.
Today I forked the library in order to downgrade the packages and test whether it is possible to use it with Python 3.7. So far I have not been able to. Please ignore the erroneous pull request.
Cheers.
Great!
It's really interesting to hear about your results. I would appreciate an update!
Some updates regarding the model training.
1) I tried to produce samples using the saved trained model and the modified forked library. The model loaded correctly, but when I called the sample function it got stuck; after 19 minutes I stopped the execution. Here is how I called the method:
synthetic_data = model.sample(n_samples=20)
2) For this reason I started experimenting and trained 4 different models with the 2 versions of the library (in 2 different environments). I used the California housing dataset and a reduced version of the previous personal dataset (209 rows × 20 columns).
I instantiated the model the following way:
model = GReaT(llm='distilgpt2', batch_size=32, epochs=20)
2.1) The California housing dataset: it trained and produced samples without any issues, using both the original library and the modified one.
2.2) Reduced personal dataset.
2.2.a) Trained using the modified library: it does train with no errors, but as before gets stuck during inference.
2.2.b) Trained using the original library: it does train without errors, but at inference time I got the following error. I am puzzled, since the dataset I am using is only 209 rows × 20 columns. Here is the full trace:
IndexError                                Traceback (most recent call last)
Cell In[18], line 1
----> 1 synthetic_data = model.sample(n_samples=20)

File ~/anaconda3/envs/env/lib/python3.9/site-packages/be_great/great.py:162, in GReaT.sample(self, n_samples, start_col, start_col_dist, temperature, k, max_length, device)
    160 # Convert tokens back to tabular data
    161 text_data = _convert_tokens_to_text(tokens, self.tokenizer)
--> 162 df_gen = _convert_text_to_tabular_data(text_data, df_gen)
    164 # Remove rows with flawed numerical values
    165 for i_num_cols in self.num_cols:

File ~/anaconda3/envs/env/lib/python3.9/site-packages/be_great/great_utils.py:91, in _convert_text_to_tabular_data(text, df_gen)
     89 values = f.strip().split(" is ")
     90 if values[0] in columns and not td[values[0]]:
---> 91 td[values[0]] = [values[1]]
     93 df_gen = pd.concat([df_gen, pd.DataFrame(td)], ignore_index=True, axis=0)
     94 return df_gen

IndexError: list index out of range
Hey, both issues can happen if the model was not fine-tuned long enough.
I updated the code recently to handle the index error; you can pull the newest version to fix this.
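For reference, the guard is presumably something along these lines (a sketch, not the verbatim upstream fix):
# In _convert_text_to_tabular_data: skip malformed "<column> is <value>"
# fragments instead of indexing values[1] unconditionally.
values = f.strip().split(" is ")
if len(values) >= 2 and values[0] in columns and not td[values[0]]:
    td[values[0]] = [values[1]]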
But this does not fix the underlying issue: it seems to me that the model has not yet learned to generate all of the 20 columns correctly. In the sample method there are some sanity checks that remove rows with corrupted or missing values (this usually affects significantly less than 5% of the generated data). But if the model is not able to generate all columns correctly, it can result in an endless loop.
Maybe you can have a look at the textual output (text_data) to understand your problem further.
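One way to inspect it, assuming the model and tokenizer attributes used in great.py above (the prompt column "age" is a placeholder for one of your columns):
import torch

tokenizer, lm = model.tokenizer, model.model
inputs = tokenizer("age is", return_tensors="pt")
with torch.no_grad():
    out = lm.generate(**inputs, max_length=500, do_sample=True,
                      temperature=0.7, pad_token_id=tokenizer.eos_token_id)
# Check whether all expected "<column> is <value>" pairs appear in the text.
print(tokenizer.decode(out[0]))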
Hi, thank you for the update. I ran some experiments during these past weeks and wanted to be 100% sure before replying.
First, I started with the dataset composed of 20 columns and increased the number of training epochs from 50 up to 300, in 50-epoch steps. I am still getting the index error.
Second, I ran experiments where I kept the number of epochs constant but increased the number of features passed. I fine-tuned the model with 4 up to 9 features, using only 50 epochs, and the model produced samples that were plausible.
The problems started again when I increased the number to 10 features: I started with 50 epochs and went up to 300, in 50-epoch steps, but the index problem persisted. I tried the same experiments with another dataset of similar size, with no success either.
Will keep experimenting and will update you on my progress.
Hi, I am having the following problem when using the library on my dataset. The dataset has only 209 samples, but 107 features. The values in the set are floats and ints.
This is the call I am making:
model = GReaT(llm='gpt2', epochs=50, batch_size=32)
model.fit(df)
This is what I assume is the reason behind the error: Token indices sequence length is longer than the specified maximum sequence length for this model (1614 > 1024). Running this sequence through the model will result in indexing errors
From what I can gather it seems to be a Hugging Face issue. Is there a way to pass the seq = seq[:512] parameter?
Do you know a solution to this problem?
Any help would be much appreciated.
Here is the full trace: