yangheng95 / LCF-ATEPC

codes for paper A Multi-task Learning Model for Chinese-oriented Aspect Polarity Classification and Aspect Term Extraction
MIT License

Possible error in the project code? #4

Closed Ulitochka closed 4 years ago

Ulitochka commented 4 years ago

Hello.

Thanks for your work.

In your article (https://arxiv.org/pdf/1912.07976v1.pdf) you state that the two models take different inputs (Figure 5).

But in fact there is only one input: [CLS] token_0 token_1 ... token_i [SEP] aspect tokens [SEP] (https://github.com/yangheng95/LCF-ATEPC/blob/master/utils/data_utils.py#L183), because in that method only text_a is used.
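To make this concrete, here is a minimal sketch of that single input format (the helper name is mine, not from the repo):

```python
# Minimal sketch of the single input described above; the helper name is
# illustrative, the real construction lives in utils/data_utils.py.
def build_spc_style_input(tokens, aspect_tokens):
    # [CLS] token_0 token_1 ... token_i [SEP] aspect tokens [SEP]
    return ["[CLS]"] + tokens + ["[SEP]"] + aspect_tokens + ["[SEP]"]

print(build_spc_style_input(
    ["The", "bread", "is", "top", "notch", "as", "well", "."], ["bread"]))
# ['[CLS]', 'The', 'bread', 'is', 'top', 'notch', 'as', 'well', '.', '[SEP]', 'bread', '[SEP]']
```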

This is very strange: it turns out that you pass to the ATE model information about the tokens that carry the target classes: [CLS] token_0 token_1 ... token_i [SEP] --> aspect tokens [SEP] <-- this is the part I am talking about.

There is no masking either: https://github.com/yangheng95/LCF-ATEPC/blob/master/model/lcf_atepc.py#L126 attention_mask is input_mask, which only marks non-padding positions with 1.
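In other words, a padding-only mask looks roughly like this (function name and toy ids are mine):

```python
# Sketch of a padding-only mask like input_mask: every real token, including
# the appended aspect tokens, gets a 1; only padding positions get a 0.
def padding_only_mask(token_ids, max_seq_len, pad_id=0):
    pad_len = max_seq_len - len(token_ids)
    mask = [1] * len(token_ids) + [0] * pad_len
    ids = token_ids + [pad_id] * pad_len
    return ids, mask

ids, mask = padding_only_mask([101, 200, 300, 102, 300, 102], 10)  # toy ids
print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0] -- the aspect span is not hidden
```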

Thus it becomes clear why a linear classifier on top of the BERT outputs gives such high quality.

We can go further. Suppose we want to apply a trained ATE model to new data. We do not know the terms, so we simply duplicate the sequence itself. Example: [CLS] The bread is top notch as well . [SEP] The bread is top notch as well . [SEP] If you do this on the test data, you will see that the quality is not 99% :(
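That workaround, as a small sketch (the function name is illustrative, not from the repo):

```python
# Sketch of the inference-time workaround described above: the aspect is
# unknown, so the sentence itself is duplicated into the second segment.
def build_inference_input(tokens):
    return ["[CLS]"] + tokens + ["[SEP]"] + tokens + ["[SEP]"]

print(" ".join(build_inference_input("The bread is top notch as well .".split())))
# [CLS] The bread is top notch as well . [SEP] The bread is top notch as well . [SEP]
```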

Is it possible that I'm wrong?

yangheng95 commented 4 years ago

Hello, questions are welcome.

First of all, the model does use two types of input: one is formatted for the BERT-BASE model, and the other is in the format of the BERT-SPC model. The former is used only to extract local context information, while the latter is used to extract global context information and for the ATE task. The BERT-BASE input sequence is obtained by truncating the BERT-SPC input in the function get_ids_for_local_context_extractor(self, text_indices).
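As a rough sketch of that truncation idea (illustrative only, not the exact implementation in the repo):

```python
# The BERT-BASE style input is the BERT-SPC style input cut off after the
# first [SEP]; the actual logic is in get_ids_for_local_context_extractor.
def truncate_spc_to_base(spc_tokens):
    first_sep = spc_tokens.index("[SEP]")
    return spc_tokens[: first_sep + 1]  # keep [CLS] ... [SEP]

spc = ["[CLS]", "The", "bread", "is", "top", "notch", ".", "[SEP]", "bread", "[SEP]"]
print(truncate_spc_to_base(spc))
# ['[CLS]', 'The', 'bread', 'is', 'top', 'notch', '.', '[SEP]']
```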

As for the attention mask, I studied several repositories for the named entity recognition (NER) task, such as BERT-NER (because NER is very similar to the ATE task), and I think there is no need to mask any aspect during training, whereas model evaluation does need it.
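For example, masking the appended aspect span at evaluation time could look roughly like this (purely illustrative, not code from this repo or from BERT-NER):

```python
# Zero the attention mask beyond the first [SEP], so the appended aspect
# tokens are hidden from the model at evaluation time.
def mask_after_first_sep(attention_mask, tokens):
    first_sep = tokens.index("[SEP]")
    return [m if i <= first_sep else 0 for i, m in enumerate(attention_mask)]

tokens = ["[CLS]", "The", "bread", "is", "good", "[SEP]", "bread", "[SEP]"]
print(mask_after_first_sep([1] * len(tokens), tokens))
# [1, 1, 1, 1, 1, 1, 0, 0]
```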

That is my answer to your question. If it does not resolve the issue, please feel free to contact me.

Ulitochka commented 4 years ago

Thanks for your response.

But in the config file https://github.com/yangheng95/LCF-ATEPC/blob/master/exp-batch.json#L5 the parameter that controls truncation is False, which means there is no truncation.

The format for the BERT-SPC model is: [CLS] token_0 token_1 ... token_i [SEP] aspect tokens [SEP], and this model is used to extract global context information and for the ATE task. ATE means aspect term extraction, so with this format you give the model that is supposed to extract the terms information about those very target terms in its input data.

yangheng95 commented 4 years ago

We designed the "bert-base" parameter just to implement the BERT-BASE model. When this parameter is True and takes effect, the BERT-SPC input is truncated, and with local_context_focus="None" the model reduces to the BERT-BASE model. Otherwise the input keeps the BERT-SPC format, while the local context feature extractor always takes the BERT-BASE input. These inputs are aimed at the global context and the local context, respectively.
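In other words, as a rough sketch (the parameter names are paraphrased, not necessarily the exact keys in exp-batch.json):

```python
# Sketch of how the global-context input is chosen according to the
# explanation above; illustrative only.
def select_global_input(spc_tokens, use_bert_base, local_context_focus):
    if use_bert_base and local_context_focus == "None":
        # reduces to plain BERT-BASE: truncate the SPC input at the first [SEP]
        return spc_tokens[: spc_tokens.index("[SEP]") + 1]
    # otherwise the global-context branch keeps the full BERT-SPC input;
    # the local context extractor always uses the truncated form regardless
    return spc_tokens
```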

Ulitochka commented 4 years ago

To repeat my earlier point: the format for the BERT-SPC model is [CLS] token_0 token_1 ... token_i [SEP] aspect tokens [SEP], and this model is used to extract global context information and for the ATE task. ATE means aspect term extraction, so with this format you give the model that is supposed to extract the terms information about those very target terms in its input data.

yangheng95 commented 4 years ago

Hello. It seems the ATE part of the code needs a fix. Have you tried to implement the BERT-BASE model for the ATE task?

When I conducted the ATE task based on the BERT-NER code, the F1 score on the Laptop test set only reached about 71-72.

yangheng95 commented 4 years ago

The model now temporarily blocks the BERT-SPC input format by default to keep the ATE performance reasonable. However, the BERT-SPC input could still be used to improve the APC subtask. I will redesign the code and update the paper later.