yangheng95 / LCF-ATEPC

Code for the paper "A Multi-task Learning Model for Chinese-oriented Aspect Polarity Classification and Aspect Term Extraction"
MIT License

AE affects APC #26

Open yassmine-lam opened 3 years ago

yassmine-lam commented 3 years ago

Hi,

Thank you for sharing your code with us. As I understand it, the APC results are affected by the AE results, aren't they? You use the extracted aspect terms, rather than the gold terms, to identify the sentiment polarities. But what if the AE results are very poor and they severely affect APC performance?

Thank you.

yangheng95 commented 3 years ago

Yes, but the impact on APC should be limited. This is an empirical conclusion, and you can run experiments to verify it if you want.
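One way to run such an experiment is to score aspect extraction, polarity classification on the correctly extracted aspects, and the joint result separately. The sketch below is not taken from this repo. It assumes, for illustration only, that gold and predicted annotations are available as one dict per sentence mapping an aspect span to a polarity label; the function name `evaluate_pipeline` and the metric names are hypothetical.

```python
def evaluate_pipeline(gold, pred):
    """Score ATE, APC on matched aspects, and the joint task.

    gold, pred: lists with one dict per sentence, mapping an aspect span
    (for example a (start, end) tuple) to a polarity label.
    This data layout is an assumption made for the sketch.
    """
    n_gold = n_pred = n_span_hit = n_joint_hit = 0
    for gold_sent, pred_sent in zip(gold, pred):
        n_gold += len(gold_sent)
        n_pred += len(pred_sent)
        for span, polarity in pred_sent.items():
            if span in gold_sent:
                n_span_hit += 1              # aspect term extracted correctly
                if gold_sent[span] == polarity:
                    n_joint_hit += 1         # term and polarity both correct

    def f1(hits):
        precision = hits / n_pred if n_pred else 0.0
        recall = hits / n_gold if n_gold else 0.0
        total = precision + recall
        return 2 * precision * recall / total if total else 0.0

    return {
        "ate_f1": f1(n_span_hit),
        "apc_acc_on_matched": n_joint_hit / n_span_hit if n_span_hit else 0.0,
        "joint_f1": f1(n_joint_hit),
    }
```

If `apc_acc_on_matched` stays high while `joint_f1` drops, the loss comes mainly from extraction errors rather than from the polarity classifier.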

yassmine-lam commented 3 years ago

Thank you for your reply.

I tested this model on a dataset in a language other than English and Chinese. With the multilingual BERT model I achieved high results, but with a monolingual model I obtained very low results (F1-score = 0 for the ATE task!), which is very strange. Normally monolingual models are better than multilingual models because they have a larger vocabulary, don't they? Do you have any idea why?

Thank you.

yangheng95 commented 3 years ago

Which pretrained model do you use, and can you share anything that illustrates this problem (e.g., a code block)?

yangheng95 commented 3 years ago

Note that this repo is hard-coded to use BERTPretrainedModel and the BERT tokenizer; you may need to change it to use AutoModel and AutoTokenizer instead.
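When switching to the Auto classes, the tokenizer and the model should be loaded from the same checkpoint name. A minimal sanity check, sketched here with a hypothetical helper `vocab_mismatch`, compares the tokenizer size with the model's embedding size. It only relies on `len(tokenizer)` and `model.config.vocab_size`, which Hugging Face tokenizers and models provide. Passing the check is necessary but not sufficient: two different vocabularies can have the same size.

```python
def vocab_mismatch(tokenizer, model):
    """Return a description of the mismatch, or None if the sizes are consistent."""
    n_tokens = len(tokenizer)                # vocabulary size, including added tokens
    n_embeddings = model.config.vocab_size   # rows in the model's embedding table
    if n_tokens > n_embeddings:
        return (f"tokenizer has {n_tokens} tokens "
                f"but the model only embeds {n_embeddings}")
    return None


# Usage with Hugging Face transformers (downloads the checkpoint, so it is
# left commented out; the checkpoint name is only an example):
# from transformers import AutoModel, AutoTokenizer
# name = "bert-base-multilingual-cased"
# tokenizer = AutoTokenizer.from_pretrained(name)
# model = AutoModel.from_pretrained(name)
# print(vocab_mismatch(tokenizer, model))
```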

yassmine-lam commented 3 years ago

Hi,

I replaced the multilingual BERT model with aubmindlab/bert-base-arabertv01, and I also used AutoModel and AutoTokenizer in your code.

As I said, it gave me 0 for ATE and a low accuracy for APC.

[Screenshot: Screen Shot 2021-08-06 at 8 18 30 AM]

Thank you.

yangheng95 commented 3 years ago

I don't have the dataset to debug. Did you prepare the dataset in the provided format? I received a similar report that was caused by mis-annotation and incorrect label usage.
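Annotation and label problems of this kind can often be caught with a small script before training. The sketch below is only an illustration and makes assumptions that should be checked against the sample files shipped with the repo: one token per line in three whitespace-separated columns (token, tag, polarity), blank lines between sentences, the tag set `O`/`B-ASP`/`I-ASP`, and polarity `-1` on every non-aspect token. The helper name `find_format_errors` is hypothetical.

```python
VALID_TAGS = {"O", "B-ASP", "I-ASP"}  # assumed tag set, verify against your data


def find_format_errors(lines):
    """Return (line_number, message) pairs for lines that break the assumed format."""
    errors = []
    prev_tag = "O"
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:                      # blank line ends the sentence
            prev_tag = "O"
            continue
        parts = line.split()
        if len(parts) != 3:
            errors.append((lineno, f"expected 3 columns, found {len(parts)}"))
            prev_tag = "O"
            continue
        _token, tag, polarity = parts
        if tag not in VALID_TAGS:
            errors.append((lineno, f"unknown tag {tag!r}"))
        elif tag == "I-ASP" and prev_tag == "O":
            errors.append((lineno, "I-ASP does not follow B-ASP or I-ASP"))
        # Assumption: non-aspect tokens carry polarity -1, aspect tokens do not.
        if (tag == "O") != (polarity == "-1"):
            errors.append((lineno, "polarity and tag disagree"))
        prev_tag = tag if tag in VALID_TAGS else "O"
    return errors
```

Running it over each dataset file, for example `find_format_errors(open(path, encoding="utf-8"))`, lists the offending line numbers.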

yassmine-lam commented 3 years ago

Yes, you were right; there was a problem with the data format. I fixed it, but the accuracy is still very low with the monolingual BERT model compared to the multilingual one.

I really cannot understand that, because monolingual models are generally better than multilingual ones.

Do you have any idea why? Thank you.

yangheng95 commented 3 years ago

Hi, I suggest you share your code on GitHub so I can review it. Otherwise I may have no idea where the problem comes from.

yassmine-lam commented 3 years ago

Thank you for your effort to help us fix errors. I am working on Google Colab, so I shared the notebook and the code folder with you (my email address: yasmineamine934@gmail.com) so that you can reproduce the results.

Thank you again for your effort.

Astudnew commented 3 years ago

Did you solve the problem?

yangheng95 commented 3 years ago

Hi, unfortunately I am working on improving PyABSA, and this repo is more or less out of maintenance. You can try PyABSA, which solves some dataset-related problems. Or you can provide me with a sample of your dataset so I can analyze it.

yangheng95 commented 3 years ago

I clicked the close button accidentally. I look forward to your reply.

yassmine-lam commented 3 years ago

@Phd-Student2018 No, not yet. Have you?

yangheng95 commented 3 years ago

No known error was found in your data. Maybe you can debug with PyCharm or a similar tool to see what happens during tokenization (I suspect the problem is in tokenization, or in using an incompatible tokenizer and model).
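Two quick checks can make a tokenization problem visible without a debugger. The sketch below assumes a tokenizer object with a `tokenize(text)` method and an `unk_token` attribute, which Hugging Face tokenizers provide; the helper names are hypothetical. A high unknown-token rate suggests the tokenizer does not match the text or the model. The second helper shows one common way to spread word-level tags over subword pieces, which is not necessarily how this repo does it.

```python
def unknown_token_rate(tokenizer, words):
    """Fraction of subword pieces that the tokenizer maps to its unknown token."""
    pieces = [piece for word in words for piece in tokenizer.tokenize(word)]
    if not pieces:
        return 0.0
    n_unknown = sum(piece == tokenizer.unk_token for piece in pieces)
    return n_unknown / len(pieces)


def align_tags(tokenizer, words, tags):
    """Expand word-level tags to subword pieces (one common scheme, assumed here)."""
    pieces, piece_tags = [], []
    for word, tag in zip(words, tags):
        sub = tokenizer.tokenize(word)
        if not sub:
            continue                      # tokenizer dropped the word entirely
        pieces.extend(sub)
        # Continuation pieces of a B-ASP word become I-ASP,
        # so each aspect keeps exactly one B-ASP piece.
        continuation = "I-ASP" if tag == "B-ASP" else tag
        piece_tags.extend([tag] + [continuation] * (len(sub) - 1))
    return pieces, piece_tags
```

Printing `pieces` next to `piece_tags` for a few training sentences shows whether aspect tags still line up with the text after tokenization.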