Ulitochka closed this issue 4 years ago
Hello, questions are welcome.
First of all, the model does use two types of input: one formatted for the BERT-BASE model, and the other in the format of the BERT-SPC model. The former is used to extract local context information only, while the latter is used to extract global context information and for the ATE task. The input sequence for BERT-BASE is obtained by truncating the BERT-SPC input via the function get_ids_for_local_context_extractor(self, text_indices).
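A minimal sketch of that truncation, assuming the standard BERT special-token ids (101 = [CLS], 102 = [SEP], 0 = [PAD]); truncate_spc_to_base is a hypothetical name for illustration, not the repository function:

```python
# Sketch (not the repository code): derive a BERT-BASE style input from a
# BERT-SPC style input by cutting at the first [SEP] and re-padding.
def truncate_spc_to_base(text_indices, sep_id=102, max_len=None):
    """Keep tokens up to and including the first [SEP], then pad back."""
    out = []
    for tok in text_indices:
        out.append(tok)
        if tok == sep_id:
            break
    if max_len is not None:
        out += [0] * (max_len - len(out))  # 0 = [PAD]
    return out

# BERT-SPC input: [CLS] the bread is top notch [SEP] bread [SEP] [PAD] [PAD]
spc = [101, 1996, 7852, 2003, 2327, 18624, 102, 7852, 102, 0, 0]
base = truncate_spc_to_base(spc, max_len=len(spc))
# base keeps only the first segment ([CLS] ... [SEP]) plus padding
```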
As for the attention mask: I studied several repositories for the named entity recognition (NER) task, such as BERT-NER (NER is very similar to the ATE task), and I think there is no need to mask any aspect during training, although model evaluation does need it.
That is my answer to your question. If it does not solve your problem, please feel free to contact me.
Thanks for your response.
But in the config file https://github.com/yangheng95/LCF-ATEPC/blob/master/exp-batch.json#L5, the parameter that controls truncation is False, which means no truncation is performed.
The input format for the BERT-SPC model is: [CLS] token_0 token_1 ... token_i [SEP] aspect tokens [SEP]. This model is used to extract global context information and for the ATE task (ATE = aspect term extraction). So you give the model, which is supposed to extract the terms, information about those target terms in the input data, according to this format.
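In token form, that layout can be sketched like this (build_spc_input is a hypothetical helper, just to illustrate the format):

```python
# Hedged sketch of the BERT-SPC input layout: the aspect tokens are
# appended after the context as a second segment.
def build_spc_input(context_tokens, aspect_tokens):
    return ["[CLS]"] + context_tokens + ["[SEP]"] + aspect_tokens + ["[SEP]"]

build_spc_input(["the", "bread", "is", "top", "notch"], ["bread"])
# → ['[CLS]', 'the', 'bread', 'is', 'top', 'notch', '[SEP]', 'bread', '[SEP]']
```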
We designed the "bert-base" parameter just to implement the BERT-BASE model. When this parameter is True and takes effect, the BERT-SPC input is truncated, and with local_context_focus="None" the model reduces to the BERT-BASE model. Otherwise, the input represents the BERT-SPC format, and the local context feature extractor always takes the BERT-BASE input. These two inputs target the global context and the local context, respectively.
Hello. It seems the ATE part of the code needs a fix. Have you tried to implement the BERT-BASE model for the ATE task?
When I tried to conduct the ATE task based on the BERT-NER code, the F1 score on the Laptop test set only reached about 71-72.
The model temporarily blocks the BERT-SPC input format by default to keep the ATE performance reasonable. However, the BERT-SPC input can still be used to improve the APC subtask. I will redesign the code and update the paper later.
Hello.
Thanks for your work.
In your article (https://arxiv.org/pdf/1912.07976v1.pdf) you say that you have different inputs (Figure 5) for the two models.
But in fact there is only one input: [CLS] token_0 token_1 ... token_i [SEP] aspect tokens [SEP] (https://github.com/yangheng95/LCF-ATEPC/blob/master/utils/data_utils.py#L183), because only text_a is used in this method.
This is very strange: it turns out that you pass to the ATE model information about the tokens that carry the target classes: [CLS] token_0 token_1 ... token_i [SEP] --> aspect tokens [SEP] <-- I am talking about this part.
There is no masking at all: https://github.com/yangheng95/LCF-ATEPC/blob/master/model/lcf_atepc.py#L126. attention_mask is input_mask, which marks non-padding positions with 1.
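To illustrate the point: a padding-only attention mask like input_mask leaves the aspect segment fully visible to the model (padding_attention_mask is a made-up name, not the repository code):

```python
# Sketch: this kind of attention mask only separates real tokens from
# padding; it does not hide the aspect tokens after the first [SEP].
def padding_attention_mask(input_ids, pad_id=0):
    return [0 if tok == pad_id else 1 for tok in input_ids]

# [CLS] the bread [SEP] bread [SEP] [PAD] [PAD]
ids = [101, 1996, 7852, 102, 7852, 102, 0, 0]
padding_attention_mask(ids)
# → [1, 1, 1, 1, 1, 1, 0, 0]: the aspect span (positions 4-5) stays visible
```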
Thus, it becomes clear why just a linear classifier on top of the BERT outputs gives such high quality.
We can go further. Suppose we want to apply a trained ATE model to new data. We do not know the terms, so we simply duplicate the sequence itself. Example: [CLS] The bread is top notch as well . [SEP] The bread is top notch as well . [SEP]. If you do this on test data, you will see that the quality is not 99% :(
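A sketch of that inference-time workaround, assuming no gold aspect terms are available and the sentence is simply repeated in the second segment (build_inference_input is a hypothetical name for illustration):

```python
# With unknown aspects, the only way to fill the second segment of the
# BERT-SPC format is to duplicate the sentence itself.
def build_inference_input(tokens):
    return ["[CLS]"] + tokens + ["[SEP]"] + tokens + ["[SEP]"]

sent = ["The", "bread", "is", "top", "notch", "as", "well", "."]
build_inference_input(sent)
# Here the model never sees which tokens are aspects, unlike during training.
```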
Is it possible that I'm wrong?