yourh / AttentionXML

Implementation for "AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification"

AttentionXML in production #24

Closed · HMM2021 closed 3 years ago

HMM2021 commented 3 years ago

Hi,

I have a question about using the AttentionXML model in production. Do we have to use the same tokenizer that was used for training in the POC, or can we create another tokenizer and embedding matrix in production?

Thank you in advance

yourh commented 3 years ago

You can create another tokenizer and embedding matrix for your own datasets.
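For a new dataset this means rebuilding the vocabulary and the embedding matrix from your own corpus before training. A minimal sketch of that step, assuming GloVe-style pretrained vectors and whitespace tokenization as placeholders (the helper name, file paths, and `<PAD>`/`<UNK>` conventions are illustrative, not the repo's `preprocess.py`):

```python
# Illustrative sketch (not the repo's preprocess.py): build a vocabulary and an
# embedding matrix for a new corpus from pretrained GloVe-style vectors.
import numpy as np

def build_vocab_and_embeddings(texts, glove_path, emb_size=300, max_vocab=500000):
    # Count tokens in the corpus (whitespace tokenization as a placeholder;
    # use the same tokenizer you will use again at prediction time).
    counts = {}
    for line in texts:
        for token in line.split():
            counts[token] = counts.get(token, 0) + 1
    vocab = ['<PAD>', '<UNK>'] + sorted(counts, key=counts.get, reverse=True)[:max_vocab]
    word2id = {w: i for i, w in enumerate(vocab)}

    # Initialize the embedding matrix; rows for words missing from GloVe stay random.
    emb = np.random.uniform(-0.25, 0.25, (len(vocab), emb_size)).astype(np.float32)
    emb[0] = 0.0  # padding row
    with open(glove_path, encoding='utf-8') as f:
        for row in f:
            parts = row.rstrip().split(' ')
            if parts[0] in word2id and len(parts) == emb_size + 1:
                emb[word2id[parts[0]]] = np.asarray(parts[1:], dtype=np.float32)
    return word2id, emb
```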

HMM2021 commented 3 years ago

Thank you for your answer. However, when I create another tokenizer and embedding matrix, I get the error below:

RuntimeError: Error(s) in loading state_dict for AttentionRNN: size mismatch for emb.emb.weight: copying a param with shape torch.Size([697040, 300]) from checkpoint, the shape in current model is torch.Size([81651, 300]).

HMM2021 commented 3 years ago

This is the whole error:

```
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      3 model1 = FastAttentionXML(labels_num, data_cnf, model_cnf, '')
      4 start_time = time.time()
----> 5 scores, labels_pred = model1.predict(x)
      6 finish_time = time.time()
      7 print('Predicting finished')

/home/hmouzoun/patent_classification/AttentionXML/deepxml/tree.py in predict(self, test_x)
    209
    210     def predict(self, test_x):
--> 211         return self.predict_level(self.level - 1, test_x, self.model_cnf['predict'].get('k', 100), self.labels_num)

/home/hmouzoun/patent_classification/AttentionXML/deepxml/tree.py in predict_level(self, level, test_x, k, labels_num)
    181         else:
    182             groups = self.get_inter_groups(labels_num)
--> 183             group_scores, group_labels = self.predict_level(level - 1, test_x, self.top, len(groups))
    184             torch.cuda.empty_cache()
    185             logger.info(F'Predicting Level-{level}, Top: {k}')

/home/hmouzoun/patent_classification/AttentionXML/deepxml/tree.py in predict_level(self, level, test_x, k, labels_num)
    181         else:
    182             groups = self.get_inter_groups(labels_num)
--> 183             group_scores, group_labels = self.predict_level(level - 1, test_x, self.top, len(groups))
    184             torch.cuda.empty_cache()
    185             logger.info(F'Predicting Level-{level}, Top: {k}')

/home/hmouzoun/patent_classification/AttentionXML/deepxml/tree.py in predict_level(self, level, test_x, k, labels_num)
    181         else:
    182             groups = self.get_inter_groups(labels_num)
--> 183             group_scores, group_labels = self.predict_level(level - 1, test_x, self.top, len(groups))
    184             torch.cuda.empty_cache()
    185             logger.info(F'Predicting Level-{level}, Top: {k}')

/home/hmouzoun/patent_classification/AttentionXML/deepxml/tree.py in predict_level(self, level, test_x, k, labels_num)
    175             test_loader = DataLoader(MultiLabelDataset(test_x), model_cnf['predict']['batch_size'],
    176                                      num_workers=4)
--> 177             return model.predict(test_loader, k=k)
    178         else:
    179             if level == self.level - 1:

/home/hmouzoun/patent_classification/AttentionXML/deepxml/models.py in predict(self, data_loader, k, desc, **kwargs)
     88
     89     def predict(self, data_loader: DataLoader, k=100, desc='Predict', **kwargs):
---> 90         self.load_model()
     91         scores_list, labels_list = zip(*(self.predict_step(data_x, k)
     92                                          for data_x in tqdm(data_loader, desc=desc, leave=False)))

/home/hmouzoun/patent_classification/AttentionXML/deepxml/models.py in load_model(self)
     97
     98     def load_model(self):
---> 99         self.model.module.load_state_dict(torch.load(self.model_path))
    100
    101     def clip_gradient(self):

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1049
   1050         if len(error_msgs) > 0:
-> 1051             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   1052                 self.__class__.__name__, "\n\t".join(error_msgs)))
   1053         return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for AttentionRNN:
	size mismatch for emb.emb.weight: copying a param with shape torch.Size([697040, 300]) from checkpoint, the shape in current model is torch.Size([81651, 300]).
```
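The mismatch can be confirmed before loading by comparing the embedding shape stored in the checkpoint with the size of the vocabulary the current model was built from. A minimal check along these lines (the file paths are assumptions; the `emb.emb.weight` key comes from the error message above):

```python
# Illustrative check (paths are assumptions): compare the vocabulary size
# baked into the checkpoint with the vocabulary used to build the current model.
import numpy as np
import torch

state = torch.load('models/FastAttentionXML-MyData-Level-0', map_location='cpu')
ckpt_vocab, emb_dim = state['emb.emb.weight'].shape            # e.g. (697040, 300)
current_vocab = np.load('data/vocab.npy', allow_pickle=True).shape[0]  # e.g. 81651

print(f'checkpoint vocab = {ckpt_vocab}, current vocab = {current_vocab}')
if ckpt_vocab != current_vocab:
    print('Tokenizer/embedding changed since training: retrain, or reuse the original vocab.')
```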
yourh commented 3 years ago

I think maybe I misunderstood your question. The tokenizer and embedding matrix must be consistent between training and prediction. You need to retrain the model with your own tokenizer and embedding matrix.
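In practice this means the vocabulary built at preprocessing time has to be saved and reused at prediction time, so the embedding layer is created with exactly the shape the checkpoint expects. A minimal sketch of that pattern (names, paths, and the `encode` helper are illustrative, not the repo's API; `build_vocab_and_embeddings` refers to the earlier sketch):

```python
# Minimal sketch of keeping tokenization consistent between training and
# prediction (names and paths are illustrative, not the repo's API).
import numpy as np

# --- at training / preprocessing time ---
# word2id, emb = build_vocab_and_embeddings(train_texts, 'glove.840B.300d.txt')
# np.save('vocab.npy', np.array(list(word2id), dtype=object))
# np.save('emb_init.npy', emb)
# ... train AttentionXML with emb_init and save the checkpoint ...

# --- at prediction time ---
vocab = np.load('vocab.npy', allow_pickle=True)
word2id = {w: i for i, w in enumerate(vocab)}

def encode(text, max_len=500, unk=1, pad=0):
    # Map tokens to the *training* vocabulary; unseen words fall back to <UNK>.
    ids = [word2id.get(tok, unk) for tok in text.split()][:max_len]
    return ids + [pad] * (max_len - len(ids))

# The model is then rebuilt with len(vocab) rows in emb.emb.weight, so
# load_state_dict no longer reports a size mismatch.
```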

HMM2021 commented 3 years ago

OK, I got it. Thank you so much.