IndexError: list index out of range | ATEPC English training on Tshirt dataset

hitz02 commented 3 years ago

Out-of-range error while training ATEPC model - english on T-shirt dataset.

... config.model = ATEPCModelList.LCFS_ATEPC config.evaluate_begin = 5 config.num_epoch = 6 config.log_step = 100 tshirt = ABSADatasetList.TShirt

aspect_extractor = Trainer(config=config, dataset=tshirt, checkpoint_save_mode=1, auto_device=True )

Traceback - >

TShirt dataset is not found locally, search at https://github.com/yangheng95/ABSADatasets Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']

This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Using bos_token, but it is not set yet. Using eos_token, but it is not set yet. 59%|█████▊ | 1098/1870 [00:10<00:07, 100.65it/s, convert examples to features]

IndexError Traceback (most recent call last)
in () 3 # from_checkpoint=checkpoint_path, 4 checkpoint_save_mode=1, ----> 5 auto_device=True 6 )

7 frames /usr/local/lib/python3.7/dist-packages/pyabsa/functional/trainer/trainer.py in init(self, config, dataset, from_checkpoint, checkpoint_save_mode, auto_device) 92 config.model_path_to_save = None 93 ---> 94 self.train() 95 96 def train(self):

/usr/local/lib/python3.7/dist-packages/pyabsa/functional/trainer/trainer.py in train(self) 103 self.config.seed = s 104 if self.checkpoint_save_mode: --> 105 model_path.append(self.train_func(self.config, self.from_checkpoint, self.logger)) 106 else: 107 # always return the last trained model if dont save trained model

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/training/atepc_trainer.py in train4atepc(opt, from_checkpoint_path, logger) 352 while not trainer: 353 try: --> 354 trainer = Instructor(opt, logger) 355 if from_checkpoint_path: 356 model_path = find_files(from_checkpoint_path, '.model')

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/training/atepc_trainer.py in init(self, opt, logger) 70 len(self.train_examples) / self.opt.batch_size / self.opt.gradient_accumulation_steps) * self.opt.num_epoch 71 train_features = convert_examples_to_features(self.train_examples, self.label_list, self.opt.max_seq_len, ---> 72 self.tokenizer, self.opt) 73 all_spc_input_ids = torch.tensor([f.input_ids_spc for f in train_features], dtype=torch.long) 74 all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long)

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/dataset_utils/data_utils_for_training.py in convert_examples_to_features(examples, label_list, max_seq_len, tokenizer, opt) 188 text_right = '' 189 aspect = '' --> 190 prepared_inputs = prepare_input_for_atepc(opt, tokenizer, text_left, text_right, aspect) 191 lcf_cdm_vec = prepared_inputs['lcf_cdm_vec'] 192 lcf_cdw_vec = prepared_inputs['lcf_cdw_vec']

/usr/local/lib/python3.7/dist-packages/pyabsa/core/atepc/dataset_utils/atepc_utils.py in prepare_input_for_atepc(opt, tokenizer, text_left, text_right, aspect) 60 61 if 'lcfs' in opt.model_name or opt.use_syntax_based_SRD: ---> 62 syntacticaldist, = get_syntax_distance(text_raw, aspect, tokenizer, opt) 63 else: 64 syntactical_dist = None

/usr/local/lib/python3.7/dist-packages/pyabsa/core/apc/dataset_utils/apc_utils.py in get_syntax_distance(text_raw, aspect, tokenizer, opt) 240 # the following two functions are both designed to calculate syntax-based distances 241 if opt.srd_alignment: --> 242 syntactical_dist = syntax_distance_alignment(raw_tokens, dist, opt.max_seq_len, tokenizer) 243 else: 244 syntactical_dist = pad_syntax_based_srd(raw_tokens, dist, tokenizer, opt)[1]

/usr/local/lib/python3.7/dist-packages/pyabsa/core/apc/dataset_utils/apc_utils.py in syntax_distance_alignment(tokens, dist, max_seq_len, tokenizer) 38 if bert_tokens != text: 39 while text or bert_tokens: ---> 40 if text[0] == ' ' or text[0] == '\xa0': # bad case handle 41 text = text[1:] 42 dep_dist = dep_dist[1:]

IndexError: list index out of range

yangheng95 commented 3 years ago

I encounter same error in colab, you are probably using1.1.7, still. I think you should restart colab and clear the environ. You may not really update pyabsa Until you see installation of transformers.

XuMayi commented 3 years ago

This problem is conducted by Colab environment, we recommend that you restart Colab by ending the current session.

yangheng95 / PyABSA

IndexError: list index out of range | ATEPC English training on Tshirt dataset #59

aspect_extractor = Trainer(config=config, dataset=tshirt, checkpoint_save_mode=1, auto_device=True )

Traceback - >